Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

Last update: Jan 09, 2023

UniSpeech

The family of UniSpeech:

UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

UniSpeech-SAT (ICASSP 2022 Submission): Universal Speech Representation Learning with Speaker Aware Pre-Training

Pre-trained models

We strongly suggest using our UniSpeech-SAT model for speaker-related tasks, since it performs strongly on a range of speaker-related benchmarks.

Model	Pre-training dataset	Download
UniSpeech Base	1500 hrs CommonVoice	download
UniSpeech Large	1500 hrs CommonVoice	download
UniSpeech-SAT Base	960 hrs LibriSpeech	download
UniSpeech-SAT Base+	60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli	download
UniSpeech-SAT Large	60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli	download
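
As a rough illustration of how the checkpoints compare in pre-training data, the table above can be written as a small lookup. This is only a sketch: the names `PRETRAINING_HOURS`, `total_hours`, and `speaker_models` are hypothetical and not part of the UniSpeech code base, and the hour counts are copied from the table as listed.

```python
# Hypothetical catalogue of the checkpoints listed above.
# Hours per corpus are copied from the table; nothing here is part of the UniSpeech repo.
PRETRAINING_HOURS = {
    "UniSpeech Base": {"CommonVoice": 1_500},
    "UniSpeech Large": {"CommonVoice": 1_500},
    "UniSpeech-SAT Base": {"LibriSpeech": 960},
    "UniSpeech-SAT Base+": {
        "Libri-Light": 60_000,
        "GigaSpeech": 10_000,
        "VoxPopuli": 24_000,
    },
    "UniSpeech-SAT Large": {
        "Libri-Light": 60_000,
        "GigaSpeech": 10_000,
        "VoxPopuli": 24_000,
    },
}


def total_hours(model_name: str) -> int:
    """Sum the pre-training hours over all corpora used for one checkpoint."""
    return sum(PRETRAINING_HOURS[model_name].values())


def speaker_models() -> list[str]:
    """Return the UniSpeech-SAT checkpoints, which are suggested for speaker-related tasks."""
    return [name for name in PRETRAINING_HOURS if name.startswith("UniSpeech-SAT")]


if __name__ == "__main__":
    for name in speaker_models():
        print(f"{name}: {total_hours(name):,} hours of pre-training audio")
```

With these figures, UniSpeech-SAT Base+ and UniSpeech-SAT Large each total 94k hours of pre-training audio (60k + 10k + 24k), compared with 960 hours for UniSpeech-SAT Base.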

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.

Microsoft Open Source Code of Conduct

Contact Information

For help or issues using UniSpeech models, please submit a GitHub issue.

For other communications related to UniSpeech, please contact Yu Wu ([email protected]).

Owner

Microsoft