WaveFake: A Data Set to Facilitate Audio DeepFake Detection

This is the code repository for our NeurIPS 2021 (Track on Datasets and Benchmarks) paper WaveFake.

Deep generative modeling has the potential to cause significant harm to society. Recognizing this threat, a multitude of research into detecting so-called "Deepfakes" has emerged. This research most often focuses on the image domain, while studies exploring generated audio signals have, so far, been neglected. In this paper, we aim to narrow this gap. We present a novel data set, for which we collected ten sample sets from six different network architectures, spanning two languages. We analyze the frequency statistics comprehensively, discovering subtle differences between the architectures, specifically among the higher frequencies. Additionally, to facilitate further development of detection methods, we implemented three different classifiers adopted from the signal processing community to give practitioners a baseline to compare against. In a first evaluation, we discovered significant trade-offs between the different approaches: neural network-based approaches performed better on average, but more traditional models proved to be more robust.

Dataset & Pre-trained Models

You can find our data set on Zenodo; we also provide pre-trained models.

Setup

You can install all needed dependencies by running:

pip install -r requirements.txt
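
If you prefer to keep the dependencies isolated, the usual virtual-environment workflow works as well (the environment name below is just an example):

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt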

RawNet2 Model

For consistency, we use the RawNet2 model provided by the ASVspoof 2021 challenge. Please download the model specification here and place it under dfadetect/models as raw_net2.py.
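
Assuming the downloaded file is saved locally as model.py (a placeholder name), placing it could look like this:

mkdir -p dfadetect/models
cp /path/to/downloaded/model.py dfadetect/models/raw_net2.py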

Statistics & Plots

To recreate the plots/statistics of the paper, use:

python statistics.py -h

usage: statistics.py [-h] [--amount AMOUNT] [--no-stats] [DATASETS ...]

positional arguments:
  DATASETS              Path to datasets. The first entry is assumed to be the reference one. Specified as follows: <path,name>
optional arguments:
  -h, --help            show this help message and exit
  --amount AMOUNT, -a AMOUNT
                        Amount of files to consider.
  --no-stats, -s        Do not compute stats, only plots.

Example

python statistics.py /path/to/reference/data,ReferenceDataName /path/to/generated/data,GeneratedDataName -a 10000
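
If you only need the plots, the --no-stats flag documented above skips the statistics computation:

python statistics.py /path/to/reference/data,ReferenceDataName /path/to/generated/data,GeneratedDataName -a 10000 --no-stats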

Training models

You can use the training script as follows:

python train_models.py -h

usage: train_models.py [-h] [--amount AMOUNT] [--clusters CLUSTERS] [--batch_size BATCH_SIZE] [--epochs EPOCHS] [--retraining RETRAINING] [--ckpt CKPT] [--use_em] [--raw_net] [--cuda] [--lfcc] [--debug] [--verbose] REAL FAKE

positional arguments:
  REAL                  Directory containing real data.
  FAKE                  Directory containing fake data.

optional arguments:
  -h, --help            show this help message and exit
  --amount AMOUNT, -a AMOUNT
                        Amount of files to load from each directory (default: None - all).
  --clusters CLUSTERS, -k CLUSTERS
                        The amount of clusters to learn (default: 128).
  --batch_size BATCH_SIZE, -b BATCH_SIZE
                        Batch size (default: 8).
  --epochs EPOCHS, -e EPOCHS
                        Epochs (default: 5).
  --retraining RETRAINING, -r RETRAINING
                        Retraining tries (default: 10).
  --ckpt CKPT           Checkpoint directory (default: trained_models).
  --use_em              Use EM version?
  --raw_net             Train raw net version?
  --cuda, -c            Use cuda?
  --lfcc, -l            Use LFCC instead of MFCC?
  --debug, -d           Only use minimal amount of files?
  --verbose, -v         Display debug information?

Example

To train all EM-GMMs, use:

python train_models.py /data/LJSpeech-1.1/wavs /data/generated_audio -k 128 -v --use_em --epochs 100
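
Similarly, the RawNet2 baseline can be trained with the flags documented above; the batch size and epoch count here are only illustrative values:

python train_models.py /data/LJSpeech-1.1/wavs /data/generated_audio --raw_net --cuda --batch_size 32 --epochs 10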

Evaluation

For evaluation, you can use the evaluate_models script:

python evaluate_models.py -h

usage: evaluate_models.py [-h] [--output OUTPUT] [--clusters CLUSTERS] [--amount AMOUNT] [--raw_net] [--debug] [--cuda] REAL FAKE MODELS

positional arguments:
  REAL                  Directory containing real data.
  FAKE                  Directory containing fake data.
  MODELS                Directory containing model checkpoints.

optional arguments:
  -h, --help            show this help message and exit
  --output OUTPUT, -o OUTPUT
                        Output file name.
  --clusters CLUSTERS, -k CLUSTERS
                        The amount of clusters to learn (default: 128).
  --amount AMOUNT, -a AMOUNT
                        Amount of files to load from each directory (default: None - all).
  --raw_net, -r         RawNet models?
  --debug, -d           Only use minimal amount of files?
  --cuda, -c            Use cuda?

Example

python evaluate_models.py /data/LJSpeech-1.1/wavs /data/generated_audio trained_models/lfcc/em

Make sure to move the out-of-distribution models to a separate directory first!
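
The neural network checkpoints can be evaluated with the same script; the checkpoint directory and output file name below are placeholders:

python evaluate_models.py /data/LJSpeech-1.1/wavs /data/generated_audio path/to/rawnet/checkpoints --raw_net --cuda -o results.txt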

Attribution

We provide a script to compute attributions for the GMM models:

python attribute.py -h

usage: attribute.py [-h] [--clusters CLUSTERS] [--steps STEPS] [--blur] FILE REAL_MODEL FAKE_MODEL

positional arguments:
  FILE                  Audio sample to attribute.
  REAL_MODEL            Real model to attribute.
  FAKE_MODEL            Fake Model to attribute.

optional arguments:
  -h, --help            show this help message and exit
  --clusters CLUSTERS, -k CLUSTERS
                        The amount of clusters to learn (default: 128).
  --steps STEPS, -m STEPS
                        Amount of steps for integrated gradients.
  --blur, -b            Compute BlurIG instead.

Example

python attribute.py /data/LJSpeech-1.1/wavs/LJ008-0217.wav path/to/real/model.pth path/to/fake/model.pth
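
To compute BlurIG attributions instead of integrated gradients, the documented --blur flag can be added to the same call:

python attribute.py /data/LJSpeech-1.1/wavs/LJ008-0217.wav path/to/real/model.pth path/to/fake/model.pth --blur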

BibTeX

When citing our work, feel free to use the following BibTeX entry:

@inproceedings{
  frank2021wavefake,
  title={{WaveFake: A Data Set to Facilitate Audio Deepfake Detection}},
  author={Joel Frank and Lea Sch{\"o}nherr},
  booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2021},
}