Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning

Last update: Jan 02, 2023

❗ Inference code and pre-trained models are now available, so you can synthesize speech for any input text.

⭐ The model is trained only on a neutral-emotion corpus; no strongly emotional speech is used.

⭐ Out-of-domain style transfer remains very challenging. Limited by the training corpus, speaker-embedding and unsupervised style-learning methods (such as GST) have difficulty imitating unseen data.

⭐ With the help of a U-Net architecture and AdaIN layers, our proposed algorithm has strong speaker and style transfer capabilities.
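
For readers unfamiliar with AdaIN (adaptive instance normalization), the sketch below shows the generic operation in NumPy: content features are normalized per channel and then rescaled with the per-channel mean and standard deviation of a style feature map. This illustrates the general technique only. The function name, the array shapes, and the choice to compute style statistics directly from a reference feature map are assumptions for illustration, and the layers in this repository may obtain those statistics differently.

```python
import numpy as np


def adaptive_instance_norm(content, style, eps=1e-5):
    """Generic AdaIN sketch (hypothetical helper, not the repository's layer).

    content: array of shape (channels, content_frames)
    style:   array of shape (channels, style_frames)
    """
    # Per-channel statistics over the time axis.
    c_mean = content.mean(axis=-1, keepdims=True)
    c_std = content.std(axis=-1, keepdims=True)
    s_mean = style.mean(axis=-1, keepdims=True)
    s_std = style.std(axis=-1, keepdims=True)

    # Remove the content statistics, then apply the style statistics.
    normalized = (content - c_mean) / (c_std + eps)
    return normalized * s_std + s_mean


# Toy data standing in for encoder feature maps.
rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(4, 50))
style = rng.normal(3.0, 2.0, size=(4, 80))

out = adaptive_instance_norm(content, style)
print(out.shape)
```

After the call, each channel of `out` keeps the temporal pattern of `content` but carries the mean and spread of the corresponding `style` channel, which is the intuition behind using AdaIN for speaker and style transfer.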

Inference code or Colab notebook

Demo results

Paper link

😄 The authors are preparing a simple, clear, and well-documented training process for Unet-TTS based on Aishell3. It will include:

MFA-based duration alignment
Multi-speaker TTS with speaker-embedding instance normalization; this model provides the pre-trained Content Encoder.
Unet-TTS training
One-shot Voice cloning inference
C++ inference

Stay tuned!

Install Requirements

Install the TensorFlow and tensorflow-addons versions that match your CUDA version.
The default is TensorFlow 2.6 and tensorflow-addons 0.14.0.

pip install TensorFlowTTS
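
As a quick sanity check after installation, the following sketch reports whether the installed packages match the default versions mentioned above. It uses only the Python standard library. The `EXPECTED` table and the `check_versions` helper are hypothetical names introduced here, and the distribution names are assumed to be `tensorflow` and `tensorflow-addons`; a differently named build will be reported as not installed.

```python
from importlib import metadata

# Default versions stated in this README; adjust to match your CUDA setup.
EXPECTED = {"tensorflow": "2.6", "tensorflow-addons": "0.14.0"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Compare only as many version components as the expectation lists,
        # so "2.6" accepts 2.6.0, 2.6.1, and so on.
        wanted_parts = wanted.split(".")
        matches = (
            installed is not None
            and installed.split(".")[: len(wanted_parts)] == wanted_parts
        )
        report[package] = (installed, matches)
    return report


for package, (installed, matches) in check_versions().items():
    status = "ok" if matches else "check"
    print(f"{package}: installed={installed} expected={EXPECTED[package]} [{status}]")
```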

Usage

See UnetTTS_syn.py or the notebook.

CUDA_VISIBLE_DEVICES=0 python UnetTTS_syn.py

from UnetTTS_syn import UnetTTS

models_and_params = {"duration_param": "train/configs/unetts_duration.yaml",
                    "duration_model": "models/duration4k.h5",
                    "acous_param": "train/configs/unetts_acous.yaml",
                    "acous_model": "models/acous12k.h5",
                    "vocoder_param": "train/configs/multiband_melgan.yaml",
                    "vocoder_model": "models/vocoder800k.h5"}

feats_yaml = "train/configs/unetts_preprocess.yaml"

text2id_mapper = "models/unetts_mapper.json"

Tts_handel = UnetTTS(models_and_params, text2id_mapper, feats_yaml)

# text: input text to synthesize
# src_audio: reference audio
# dur_stat: phoneme duration statistics used to control the speaking rate
syn_audio, _, _ = Tts_handel.one_shot_TTS(text, src_audio, dur_stat)
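
To listen to the result, the synthesized waveform can be written to a WAV file. The sketch below is one way to do that with NumPy and the standard `wave` module. It assumes `syn_audio` is a one-dimensional float array in the range [-1, 1] and that the sample rate is 16 kHz; both are assumptions, so check the feature configuration before relying on them. The `save_wav` helper is a hypothetical name, and a generated tone stands in for `syn_audio` so the example runs on its own.

```python
import os
import tempfile
import wave

import numpy as np


def save_wav(path, audio, sample_rate=16000):
    """Write a 1-D float waveform in [-1, 1] as 16-bit mono PCM.

    Hypothetical helper; sample_rate=16000 is an assumption.
    """
    audio = np.clip(np.asarray(audio, dtype=np.float32).reshape(-1), -1.0, 1.0)
    pcm = (audio * 32767.0).astype("<i2")  # little-endian 16-bit samples
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)


# Stand-in for syn_audio: one second of a 440 Hz tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
demo_audio = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)

out_path = os.path.join(tempfile.mkdtemp(), "syn_audio.wav")
num_samples = save_wav(out_path, demo_audio, sample_rate)
print(out_path, num_samples)
```

In real use, replace `demo_audio` with the `syn_audio` returned by `one_shot_TTS`.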

Reference

https://github.com/TensorSpeech/TensorFlowTTS

https://github.com/CorentinJ/Real-Time-Voice-Cloning
