AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

Last update: Dec 28, 2022

Overview

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data [WIP]

Unofficial PyTorch implementation of AdaSpeech 2.

Requirements :

All code is written in Python 3.6.2.

Install PyTorch

Before installing PyTorch, check your CUDA version by running the following command:

nvcc --version

pip install torch torchvision

This repo uses PyTorch 1.6.0 for the torch.bucketize function, which is not available in earlier versions of PyTorch.
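Since torch.bucketize needs PyTorch 1.6.0 or newer, it can help to check the installed version before running anything. The snippet below is only a minimal sketch and is not part of this repo: `version_tuple` and `supports_bucketize` are hypothetical helper names, and the parsing assumes a version string shaped like `1.6.0` or `1.6.0+cu101`.

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.6.0+cu101' into (1, 6, 0)."""
    # Drop the local build suffix (e.g. '+cu101'), then collect the numeric parts.
    numbers = re.findall(r"\d+", version.split("+")[0])
    # Pad with zeros so that '1.13' becomes (1, 13, 0).
    numbers = (numbers + ["0", "0", "0"])[:3]
    return tuple(int(n) for n in numbers)


def supports_bucketize(version):
    """True if the version is at least 1.6.0 (the requirement stated above)."""
    return version_tuple(version) >= (1, 6, 0)


print(supports_bucketize("1.6.0+cu101"))
print(supports_bucketize("1.5.1"))
```

With PyTorch installed, the same check can be run as `supports_bucketize(torch.__version__)`.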

Installing other requirements :

pip install -r requirements.txt

To use TensorBoard, install tensorboard version 1.14.0 separately, along with a supported TensorFlow version (1.14.0).

For Preprocessing :

The filelists folder contains LJSpeech dataset files processed with MFA (Montreal Forced Aligner), so you don't need to align text with audio (to extract durations) for the LJSpeech dataset. For other datasets, follow the instructions here. For the remaining pre-processing, run the following command:

python nvidia_preprocessing.py -d path_of_wavs

To find the min and max of F0 and energy:

python compute_statistics.py
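What compute_statistics.py does internally is not shown here, so the following is only a rough sketch of the idea: scan every utterance's F0 or energy contour and keep the global minimum and maximum. The function name `min_max`, the `ignore_zeros` flag, and the sample values are all hypothetical, and treating an F0 of 0 as an unvoiced frame is an assumption.

```python
def min_max(contours, ignore_zeros=False):
    """Return (min, max) over all values in a list of per-utterance contours."""
    lo, hi = float("inf"), float("-inf")
    for contour in contours:
        for value in contour:
            # Assumption: a value of 0 marks an unvoiced frame in an F0 contour.
            if ignore_zeros and value == 0.0:
                continue
            lo = min(lo, value)
            hi = max(hi, value)
    if lo > hi:
        raise ValueError("no usable values found")
    return lo, hi


# Made-up F0 contours (Hz) for two utterances.
f0_contours = [[0.0, 110.5, 120.0], [98.2, 0.0, 305.7]]
p_min, p_max = min_max(f0_contours, ignore_zeros=True)
print(p_min, p_max)
```

The same helper could be called on energy contours to obtain candidate values for e_min and e_max.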

Update the following values in hparams.py with the computed min and max of F0 and energy:

p_min = Min F0/pitch
p_max = Max F0
e_min = Min energy
e_max = Max energy
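These bounds are typically needed to quantize continuous pitch and energy values into discrete bins, which is what torch.bucketize is used for. The sketch below illustrates that step in plain Python and is not taken from this repo: it assumes evenly spaced bin boundaries, uses hypothetical helper names (`linear_boundaries`, `quantize`), and relies on `bisect_left`, which follows the same convention as `torch.bucketize` with `right=False`.

```python
from bisect import bisect_left


def linear_boundaries(v_min, v_max, n_bins):
    """Return n_bins - 1 evenly spaced boundaries from v_min to v_max."""
    step = (v_max - v_min) / (n_bins - 2)
    return [v_min + i * step for i in range(n_bins - 1)]


def quantize(values, boundaries):
    """Map each value to the index of the bin it falls into."""
    # bisect_left returns i such that boundaries[i-1] < v <= boundaries[i].
    return [bisect_left(boundaries, v) for v in values]


# Toy numbers: 4 bins over the range [0.0, 4.0].
boundaries = linear_boundaries(0.0, 4.0, 4)
print(boundaries)
print(quantize([-1.0, 1.0, 3.0, 5.0], boundaries))
```

Values below the minimum fall into the first bin and values above the maximum into the last, so a too-narrow p_min/p_max or e_min/e_max range collapses outliers into the edge bins.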

Training :

[WIP]

Citations :

@misc{chen2021adaspeech,
      title={AdaSpeech: Adaptive Text to Speech for Custom Voice}, 
      author={Mingjian Chen and Xu Tan and Bohan Li and Yanqing Liu and Tao Qin and Sheng Zhao and Tie-Yan Liu},
      year={2021},
      eprint={2103.00993},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

@misc{yan2021adaspeech,
      title={AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data}, 
      author={Yuzi Yan and Xu Tan and Bohan Li and Tao Qin and Sheng Zhao and Yuan Shen and Tie-Yan Liu},
      year={2021},
      eprint={2104.09715},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

Owner

Rishikesh (ऋषिकेश)
