Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition

Last update: Dec 29, 2022

Overview

Wav2Vec2 STT Python

Beta Software

Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition.

Requirements:

Python 3.7+
Platform: Linux x64 (Windows support is a work in progress; macOS may work; PRs welcome)
Python package requirements: cffi, numpy
wav2vec 2.0 model (must be converted to a compatible format)
- Several are available ready-to-go on this project's releases page and below.
- You can convert your own models by following the instructions here.
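The requirements above can be checked from Python before installing. The following is a minimal sketch using only the standard library; `check_requirements` is a hypothetical helper name, not part of wav2vec2_stt, and treating anything other than Linux x64 as "untested" is an assumption based on the platform note above.

```python
import importlib.util
import platform
import sys


def check_requirements():
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7 or newer is required")
    # Linux x64 is the only platform listed as supported; others are reported, not rejected.
    if platform.system() != "Linux" or platform.machine() not in ("x86_64", "AMD64"):
        problems.append("untested platform: %s %s" % (platform.system(), platform.machine()))
    # find_spec locates a top-level package without importing it.
    for package in ("cffi", "numpy"):
        if importlib.util.find_spec(package) is None:
            problems.append("missing Python package: " + package)
    return problems
```

Calling `check_requirements()` and printing each returned string gives a quick report; pip normally installs cffi and numpy as dependencies, so the package check mainly helps in restricted environments.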

Models:

Model	Download Size
Facebook Wav2Vec2 2.0 Base (960h)	360 MB
Facebook Wav2Vec2 2.0 Large (960h)	1.18 GB
Facebook Wav2Vec2 2.0 Large LV60 (960h)	1.18 GB
Facebook Wav2Vec2 2.0 Large LV60 Self (960h)	1.18 GB

Usage

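import wave

from wav2vec2_stt import Wav2Vec2STT

# Load a converted wav2vec 2.0 model from its directory.
decoder = Wav2Vec2STT('model_dir')

# Read the raw frames of the whole file; the context manager closes it afterwards.
with wave.open('tests/test.wav', 'rb') as wav_file:
    wav_samples = wav_file.readframes(wav_file.getnframes())

assert decoder.decode(wav_samples).strip().lower() == 'it depends on the context'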
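Since the decoder receives raw frames rather than a file, it can help to verify the WAV format before decoding. The sketch below assumes 16 kHz, mono, 16-bit PCM input: wav2vec 2.0 models are trained on 16 kHz audio, but the exact format this library expects is not stated here, so the constants and the `read_wav_samples` helper are illustrative assumptions.

```python
import wave

# Assumed input format (not confirmed by this README): 16 kHz, mono, 16-bit PCM.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample


def read_wav_samples(path):
    """Return the raw PCM frames of a WAV file after checking its format."""
    with wave.open(path, "rb") as wav_file:
        actual = (wav_file.getframerate(), wav_file.getnchannels(), wav_file.getsampwidth())
        expected = (EXPECTED_RATE, EXPECTED_CHANNELS, EXPECTED_SAMPLE_WIDTH)
        if actual != expected:
            raise ValueError(
                "unexpected WAV format (rate, channels, sample width): %r, expected %r"
                % (actual, expected)
            )
        return wav_file.readframes(wav_file.getnframes())
```

The returned bytes can be passed to `decoder.decode(...)` in place of `wav_samples` above.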

The package also includes a simple command-line interface for recognizing WAV files:

$ python -m wav2vec2_stt decode model test.wav
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt decode model test.wav test.wav
IT DEPENDS ON THE CONTEXT
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt -h
usage: python -m wav2vec2_stt [-h] {decode} ...

positional arguments:
  {decode}    sub-command
    decode    decode one or more WAV files

optional arguments:
  -h, --help  show this help message and exit
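The multi-file behavior of the decode sub-command can be mirrored from Python. This is a sketch, not the CLI's actual implementation: `decode_files` is a hypothetical helper, and it relies only on the `decode` method shown in the usage example above.

```python
import wave


def decode_files(decoder, paths):
    """Decode each WAV file in order and return one transcript per file."""
    transcripts = []
    for path in paths:
        # Read the raw frames of the whole file, as in the usage example.
        with wave.open(path, "rb") as wav_file:
            samples = wav_file.readframes(wav_file.getnframes())
        transcripts.append(decoder.decode(samples).strip())
    return transcripts
```

With the library installed, `decode_files(Wav2Vec2STT('model'), ['test.wav', 'test.wav'])` would return two transcripts, one per input file. Creating the decoder once and reusing it avoids reloading the model for every file.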

Installation/Building

Recommended installation via wheel from pip (requires a recent version of pip):

python -m pip install wav2vec2_stt

See setup.py for more details on building it yourself.

Author

David Zurow (@daanzu)

License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE file for details. If this license is problematic for you, please contact me.

Acknowledgments

Contains and uses code from PyTorch and torchaudio, licensed under the BSD 2-Clause License.

Comments

provide API for returning output from intermediate layers

It would be very helpful to have an API for returning output from intermediate layers, for example the layer just before the final layers. This output could be used in speech tasks other than speech recognition.

opened by zhouyong64 1



Releases (v0.2.0)

v0.2.0 (Aug 16, 2021)

models (Aug 2, 2021)

