SASE: Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

Last update: Nov 20, 2021

Overview

We propose SASE, a speech enhancement model with an adaptive noise distribution, which achieves state-of-the-art results on the VoiceBank+DEMAND dataset.
We simulate a real-world federated learning setting and verify the robustness of the proposed SASE noise reduction model in that setting through experiments and visualization.
The proposed SASE model operates in the complex domain: the TF-GA block extracts richer information about the speech and noise distributions, while the SA-GOEA and SA-GUEA blocks adaptively learn a mask of the noise distribution.
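Working in the complex domain means a mask can correct both the magnitude and the phase of each time-frequency bin of the noisy STFT. The minimal sketch below shows only that final masking step, using the generic complex ratio mask (cRM) idea on a few hypothetical STFT bins; it is not the SASE architecture, the TF-GA, SA-GOEA and SA-GUEA blocks that would predict the mask are not modeled, and SASE's noise mask may be applied differently.

```python
from typing import List


def apply_complex_mask(noisy: List[complex], mask: List[complex]) -> List[complex]:
    """Multiply each noisy STFT bin by its complex mask value.

    A complex product scales the magnitude by |mask| and rotates the phase
    by angle(mask), so magnitude and phase are corrected together.
    """
    if len(noisy) != len(mask):
        raise ValueError("noisy and mask must have the same number of bins")
    return [y * m for y, m in zip(noisy, mask)]


def ideal_complex_ratio_mask(clean: List[complex], noisy: List[complex],
                             eps: float = 1e-12) -> List[complex]:
    """Oracle mask M = S / Y: the kind of target a network is trained to predict."""
    return [s / y if abs(y) > eps else 0j for s, y in zip(clean, noisy)]


# Hypothetical STFT bins of clean speech S and additive noise N, with Y = S + N.
clean = [1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j]
noise = [0.2 - 0.1j, -0.4 + 0.3j, 0.1 + 0.1j]
noisy = [s + n for s, n in zip(clean, noise)]

mask = ideal_complex_ratio_mask(clean, noisy)
enhanced = apply_complex_mask(noisy, mask)
# Reconstruction error per bin; only floating-point round-off remains.
print([round(abs(e - s), 9) for e, s in zip(enhanced, clean)])
```

With the oracle mask the clean bins are recovered exactly up to round-off; a trained model only approximates this mask, which is where the quality of the learned noise representation matters.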
In this paper, we also propose a weighting strategy for model aggregation that is better suited to speech enhancement tasks based on federated learning (FL).
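The README does not spell out the proposed weighting strategy. As a point of reference only, the sketch below implements standard FedAvg aggregation, where each client's parameters are weighted by its number of training samples; a task-specific scheme such as the one proposed here would change how the weights are computed. Parameters are represented as hypothetical flat lists of floats rather than real model tensors.

```python
from typing import List, Sequence


def weighted_average(client_params: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Aggregate flat parameter vectors as sum_k w_k * theta_k / sum_k w_k."""
    if not client_params or len(client_params) != len(weights):
        raise ValueError("need one weight per client and at least one client")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must send parameter vectors of the same size")
    return [sum(w * p[i] for w, p in zip(weights, client_params)) / total
            for i in range(dim)]


def fedavg(client_params: Sequence[Sequence[float]],
           num_samples: Sequence[int]) -> List[float]:
    """Standard FedAvg: weight each client by its local sample count."""
    return weighted_average(client_params, num_samples)


# Three hypothetical silos holding different amounts of data.
params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
samples = [100, 300, 600]
print(fedavg(params, samples))  # → [4.0, 5.0]
```

Keeping `weighted_average` separate from `fedavg` makes it easy to swap in a different weighting rule while reusing the same aggregation step.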

Dependencies

python >=3.6 (3.8.5 was used in the experiments)
PyTorch == 1.10.0+cu113
flwr == 2.0.1

How to run the code

1. Prepare data

VoiceBank+DEMAND can be accessed from the dataset page "Noisy speech database for training speech enhancement algorithms and TTS models" (the host now marks this dataset as superseded).
CommonVoice (Chinese) + Noise92: the Noise92 noise data can be obtained from the NOISEX page (cmu.edu).
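The README does not describe how the CommonVoice (Chinese) clips and Noise92 noises are combined. A common way to build noisy training pairs is to scale the noise so that the mixture reaches a chosen signal-to-noise ratio, with SNR_dB = 10 log10(P_speech / P_noise). The sketch assumes mono float signals of equal length; `mix_at_snr` and the synthetic signals are hypothetical and not part of the repository.

```python
import math
from typing import List, Sequence


def power(x: Sequence[float]) -> float:
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Return speech + g * noise, with gain g chosen so the mixture has SNR snr_db."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech, p_noise = power(speech), power(noise)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    # Solve 10 * log10(p_speech / (g^2 * p_noise)) = snr_db for g.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def snr_db(speech: Sequence[float], noise_component: Sequence[float]) -> float:
    """SNR in dB between a clean signal and the noise actually added to it."""
    return 10 * math.log10(power(speech) / power(noise_component))


# Hypothetical signals: one second of a 440 Hz tone at 16 kHz and a deterministic "noise".
sr = 16000
speech = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
noise = [((t * 7919) % 200 - 100) / 100 for t in range(sr)]
mixture = mix_at_snr(speech, noise, snr_db=5.0)
residual = [m - s for m, s in zip(mixture, speech)]
print(round(snr_db(speech, residual), 6))
```

Drawing `snr_db` per utterance from a range, rather than fixing it, is one way to add further variety to the noisy training set.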

2. Train on the VoiceBank+DEMAND dataset

python main.py

3. Train on the CommonVoice(Chinese)+Noise92 dataset with Federated learning

./run-server.sh
./run-client.sh
- To use a different number of clients, set NUM_CLIENTS accordingly
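The README does not say how the data is divided among the clients. To mimic heterogeneous, cross-silo data, one simple option is to give each client a disjoint subset of noise types, so that every silo sees a different noise distribution. The sketch below assigns noise types round-robin; `partition_by_noise_type`, the utterance ids, and the example noise names are hypothetical, and `NUM_CLIENTS` here is only a local stand-in for the setting of the same name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def partition_by_noise_type(samples: Sequence[Tuple[str, str]],
                            num_clients: int) -> Dict[int, List[str]]:
    """Assign whole noise types to clients round-robin.

    samples: (utterance_id, noise_type) pairs.
    Returns client_id -> list of utterance ids. Each noise type lands on
    exactly one client, so the per-client noise distributions are disjoint.
    """
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")
    noise_types = sorted({noise for _, noise in samples})
    owner = {noise: i % num_clients for i, noise in enumerate(noise_types)}
    clients: Dict[int, List[str]] = defaultdict(list)
    for utt_id, noise in samples:
        clients[owner[noise]].append(utt_id)
    return dict(clients)


NUM_CLIENTS = 2  # hypothetical local value for this example
samples = [("utt0", "babble"), ("utt1", "factory1"), ("utt2", "white"),
           ("utt3", "babble"), ("utt4", "white"), ("utt5", "volvo")]
print(partition_by_noise_type(samples, NUM_CLIENTS))
# → {0: ['utt0', 'utt3', 'utt5'], 1: ['utt1', 'utt2', 'utt4']}
```

With more clients than noise types, some clients receive no data and are absent from the result; a real setup would then need a finer split, for example by speaker as well as noise type.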

4. Generate wav files and evaluate

python main.py -g --resume "model_file" -df "wavs_root"
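The generation step writes enhanced audio to disk. As an illustration of that last stage only, the sketch below saves a float waveform in [-1, 1] as 16-bit PCM WAV using Python's standard `wave` module; the 16 kHz sample rate, the output file name and the `save_wav` helper are assumptions, not the repository's code.

```python
import math
import os
import struct
import tempfile
import wave
from typing import Sequence


def save_wav(path: str, samples: Sequence[float], sample_rate: int = 16000) -> None:
    """Write mono float samples in [-1, 1] as 16-bit PCM WAV."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    frames = struct.pack("<%dh" % len(ints), *ints)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(frames)


# Hypothetical "enhanced" signal: one second of a quiet 440 Hz tone.
enhanced = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
out_path = os.path.join(tempfile.gettempdir(), "enhanced_example.wav")
save_wav(out_path, enhanced)
with wave.open(out_path, "rb") as rf:
    print(rf.getnchannels(), rf.getframerate(), rf.getnframes())
```

Clipping before scaling keeps occasional out-of-range model outputs from overflowing the 16-bit integer range.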

Owner

Tower