This repository contains the codes for LipGAN. LipGAN was published as a part of the paper titled "Towards Automatic Face-to-Face Translation".

Related tags

Text Data & NLPLipGAN
Overview

LipGAN

Generate realistic talking faces for any human speech and face identity.

[Paper] | [Project Page] | [Demonstration Video]

image

Important Update:

A new, improved work that can produce significantly more accurate and natural results on moving talking face videos is available here: https://github.com/Rudrabha/Wav2Lip


Code without MATLAB dependency is now available in fully_pythonic branch. Note that the models in both the branches are not entirely identical and either one may perform better than the other in several cases. The model used at the time of the paper's publication is with the MATLAB dependency and this is the one that has been extensively tested. Please feel free to experiment with the fully_pythonic branch if you do not want to have the MATLAB dependency. A Google Colab notebook is also available for the fully_pythonic branch. [Credits: Kirill]


Features

  • Can handle in-the-wild face poses and expressions.
  • Can handle speech in any language and is robust to background noise.
  • Paste faces back into the original video with minimal/no artefacts --- can potentially correct lip sync errors in dubbed movies!
  • Complete multi-gpu training code, pre-trained models available.
  • Fast inference code to generate results from the pre-trained models

Prerequisites

  • Python >= 3.5
  • ffmpeg: sudo apt-get install ffmpeg
  • Matlab R2016a (for audio preprocessing, this dependency will be removed in later versions)
  • Install necessary packages using pip install -r requirements.txt
  • Install keras-contrib pip install git+https://www.github.com/keras-team/keras-contrib.git

Getting the weights

Download checkpoints of the folowing models into the logs/ folder

Generating talking face videos using pretrained models (Inference)

LipGAN takes speech features in the form of MFCCs and we need to preprocess our input audio file to get the MFCC features. We use the create_mat.m script to create .mat files for a given audio.

cd matlab
matlab -nodesktop
>> create_mat(input_wav_or_mp4_file, path_to_output.mat) # replace with file paths
>> exit
cd ..

Usage #1: Generating correct lip motion on a random talking face video

Here, we are given an audio input (as .mat MFCC features) and a video of an identity speaking something entirely different. LipGAN can synthesize the correct lip motion for the given audio and overlay it on the given video of the speaking identity (Example #1, #2 in the above image).

python batch_inference.py --checkpoint_path <saved_checkpoint> --face <random_input_video> --fps <fps_of_input_video> --audio <guiding_audio_wav_file> --mat <mat_file_from_above> --results_dir <folder_to_save_generated_video>

The generated result_voice.mp4 will contain the input video lip synced with the given input audio. Note that the FPS parameter is by default 25, make sure you set the FPS correctly for your own input video.

Usage #2: Generating talking video from a single face image

Refer to example #3 in the above picture. Given an audio, LipGAN generates a correct mouth shape (viseme) at each time-step and overlays it on the input image. The sequence of generated mouth shapes yields a talking face video.

python batch_inference.py --checkpoint_path <saved_checkpoint> --face <random_input_face> --audio <guiding_audio_wav_file> --mat <mat_file_from_above> --results_dir <folder_to_save_generated_video>

Please use the --pads argument to correct for inaccurate face detections such as not covering the chin region correctly. This can improve the results further.

More options

python batch_inference.py --help

Training LipGAN

We illustrate the training pipeline using the LRS2 dataset. Adapting for other datasets would involve small modifications to the code.

Preprocess the dataset

We need to do two things: (i) Save the MFCC features from the audio and (ii) extract and save the facial crops of each frame in the video.

LRS2 dataset folder structure
data_root (mvlrs_v1)
├── main, pretrain (we use only main folder in this work)
|	├── list of folders
|	│   ├── five-digit numbered video IDs ending with (.mp4)
Saving the MFCC features

We use MATLAB to save the MFCC files for all the videos present in the dataset. Refer to the fully_pythonic branch if you do not want to use MATLAB.

# Please copy the appropriate LRS2 train split's filelist.txt to the filelists/ folder. The example below is shown for LRS2.
cd matlab
matlab -nodesktop
>> preprocess_mat('../filelists/train.txt', 'mvlrs_v1/main/') # replace with appropriate file paths for other datasets.
>> exit
cd ..
Saving the Face Crops of all Video Frames

We preprocess the video files by detecting faces using a face detector from dlib.

# Please copy the appropriate LRS2 split's filelist.txt to the filelists/ folder. Example below is shown for LRS2. 
python preprocess.py --split [train|pretrain|val] --videos_data_root mvlrs_v1/ --final_data_root <folder_to_store_preprocessed_files>

### More options while preprocessing (like number of workers, image size etc.)
python preprocess.py --help
Final preprocessed folder structure
data_root (mvlrs_v1)
├── main, pretrain (we use only main folder in this work)
|	├── list of folders
|	│   ├── folders with five-digit video IDs 
|	│   |	 ├── 0.jpg, 1.jpg .... (extracted face crops of each frame)
|	│   |	 ├── 0.npz, 1.npz .... (mfcc features corresponding to each frame)

Train the generator only

As training LipGAN is computationally intensive, you can just train the generator alone for quick, decent results.

python train_unet.py --data_root <path_to_preprocessed_dataset>

### Extensive set of training options available. Please run and refer to:
python train_unet.py --help

Train LipGAN

python train.py --data_root <path_to_preprocessed_dataset>

### Extensive set of training options available. Please run and refer to:
python train.py --help

License and Citation

The software is licensed under the MIT License. Please cite the following paper if you have use this code:

@inproceedings{KR:2019:TAF:3343031.3351066,
  author = {K R, Prajwal and Mukhopadhyay, Rudrabha and Philip, Jerin and Jha, Abhishek and Namboodiri, Vinay and Jawahar, C V},
  title = {Towards Automatic Face-to-Face Translation},
  booktitle = {Proceedings of the 27th ACM International Conference on Multimedia}, 
  series = {MM '19}, 
  year = {2019},
  isbn = {978-1-4503-6889-6},
  location = {Nice, France},
   = {1428--1436},
  numpages = {9},
  url = {http://doi.acm.org/10.1145/3343031.3351066},
  doi = {10.1145/3343031.3351066},
  acmid = {3351066},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {cross-language talking face generation, lip synthesis, neural machine translation, speech to speech translation, translation systems, voice transfer},
}

Acknowledgements

Part of the MATLAB code is taken from the an implementation of the Talking Face Generation implementation. We thank the authors for releasing their code.

Diaformer: Automatic Diagnosis via Symptoms Sequence Generation

Diaformer Diaformer: Automatic Diagnosis via Symptoms Sequence Generation (AAAI 2022) Diaformer is an efficient model for automatic diagnosis via symp

Junying Chen 20 Dec 13, 2022
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation This repository is the pytorch implementation of our paper: Hierarchical Cr

44 Jan 06, 2023
MiCECo - Misskey Custom Emoji Counter

MiCECo Misskey Custom Emoji Counter Introduction This little script counts custo

7 Dec 25, 2022
DomainWordsDict, Chinese words dict that contains more than 68 domains, which can be used as text classification、knowledge enhance task

DomainWordsDict, Chinese words dict that contains more than 68 domains, which can be used as text classification、knowledge enhance task。涵盖68个领域、共计916万词的专业词典知识库,可用于文本分类、知识增强、领域词汇库扩充等自然语言处理应用。

liuhuanyong 357 Dec 24, 2022
topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API

NLP Space News Topic Modeling Photos by nasa.gov (1, 2, 3, 4, 5) and extremetech.com Table of Contents Project Idea Data acquisition Primary data sour

edesz 1 Jan 03, 2022
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.

Linear Transformers Are Secretly Fast Weight Programmers This repository contains the code accompanying the paper Linear Transformers Are Secretly Fas

Imanol Schlag 77 Dec 19, 2022
Natural language Understanding Toolkit

Natural language Understanding Toolkit TOC Requirements Installation Documentation CLSCL NER References Requirements To install nut you need: Python 2

Peter Prettenhofer 119 Oct 08, 2022
This project converts your human voice input to its text transcript and to an automated voice too.

Human Voice to Automated Voice & Text Introduction: In this project, whenever you'll speak, it will turn your voice into a robot voice and furthermore

Hassan Shahzad 3 Oct 15, 2021
A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis

WaveGlow A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis Quick Start: Install requirements: pip install

Yuchao Zhang 204 Jul 14, 2022
Search-Engine - 📖 AI based search engine

Search Engine AI based search engine that was trained on 25000 samples, feel free to train on up to 1.2M sample from kaggle dataset, link below StackS

Vladislav Kruglikov 2 Nov 29, 2022
A framework for implementing federated learning

This is partly the reproduction of the paper of [Privacy-Preserving Federated Learning in Fog Computing](DOI: 10.1109/JIOT.2020.2987958. 2020)

DavidChen 46 Sep 23, 2022
C.J. Hutto 3.8k Dec 30, 2022
ADCS cert template modification and ACL enumeration

Purpose This tool is designed to aid an operator in modifying ADCS certificate templates so that a created vulnerable state can be leveraged for privi

Fortalice Solutions, LLC 78 Dec 12, 2022
Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

English|简体中文 ERNIE是百度开创性提出的基于知识增强的持续学习语义理解框架,该框架将大数据预训练与多源丰富知识相结合,通过持续学习技术,不断吸收海量文本数据中词汇、结构、语义等方面的知识,实现模型效果不断进化。ERNIE在累积 40 余个典型 NLP 任务取得 SOTA 效果,并在 G

5.4k Jan 03, 2023
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning

Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning English | 中文 ❗ Now we provide inferencing code and pre-training models

164 Jan 02, 2023
Tools to download and cleanup Common Crawl data

cc_net Tools to download and clean Common Crawl as introduced in our paper CCNet. If you found these resources useful, please consider citing: @inproc

Meta Research 483 Jan 02, 2023
A python package to fine-tune transformer-based models for named entity recognition (NER).

nerblackbox A python package to fine-tune transformer-based language models for named entity recognition (NER). Resources Source Code: https://github.

Felix Stollenwerk 13 Jul 30, 2022
Stand-alone language identification system

langid.py readme Introduction langid.py is a standalone Language Identification (LangID) tool. The design principles are as follows: Fast Pre-trained

2k Jan 04, 2023
p-tuning for few-shot NLU task

p-tuning_NLU Overview 这个小项目是受乐于分享的苏剑林大佬这篇p-tuning 文章启发,也实现了个使用P-tuning进行NLU分类的任务, 思路是一样的,prompt实现方式有不同,这里是将[unused*]的embeddings参数抽取出用于初始化prompt_embed后

3 Dec 29, 2022