Implementation of "Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021" in PyTorch

Overview

Auditory Slow-Fast

This repository implements the model proposed in the paper:

Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen, Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021

Project's webpage

arXiv paper

Citing

When using this code, kindly reference:

@ARTICLE{Kazakos2021SlowFastAuditory,
   title={Slow-Fast Auditory Streams For Audio Recognition},
   author={Kazakos, Evangelos and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima},
           journal   = {CoRR},
           volume    = {abs/2103.03516},
           year      = {2021},
           ee        = {https://arxiv.org/abs/2103.03516},
}

Pretrained models

You can download our pretrained models on VGG-Sound and EPIC-KITCHENS-100:

  • Slow-Fast (EPIC-KITCHENS-100) link
  • Slow (EPIC-KITCHENS-100) link
  • Fast (EPIC-KITCHENS-100) link
  • Slow-Fast (VGG-Sound) link
  • Slow (VGG-Sound) link
  • Fast (VGG-Sound) link

Preparation

  • Requirements:
    • PyTorch 1.7.1
    • librosa: conda install -c conda-forge librosa
    • h5py: conda install h5py
    • wandb: pip install wandb
    • fvcore: pip install 'git+https://github.com/facebookresearch/fvcore'
    • simplejson: pip install simplejson
    • psutil: pip install psutil
    • tensorboard: pip install tensorboard
  • Add this repository to $PYTHONPATH.
export PYTHONPATH=/path/to/auditory-slow-fast/slowfast:$PYTHONPATH
  • VGG-Sound:
    1. Download the audio. For instructions see here
    2. Download train.pkl (link) and test.pkl (link). I converted the original train.csv and test.csv (found here) to pickle files with column names for easier use
  • EPIC-KITCHENS:
    1. From the annotation repository of EPIC-KITCHENS-100 (link), download: EPIC_100_train.pkl, EPIC_100_validation.pkl, and EPIC_100_test_timestamps.pkl. EPIC_100_train.pkl and EPIC_100_validation.pkl will be used for training/validation, while EPIC_100_test_timestamps.pkl can be used to obtain the scores to submit in the AR challenge.
    2. Download all the videos of EPIC-KITCHENS-100 using the download scripts found here, where you can also find detailed instructions on using the scripts.
    3. Extract audio from the videos by running:
    python audio_extraction/extract_audio.py /path/to/videos /output/path 
    
    1. Save audio in HDF5 format by running:
    python audio_extraction/wav_to_hdf5.py /path/to/audio /output/hdf5/EPIC-KITCHENS-100_audio.hdf5
    

Training/validation on EPIC-KITCHENS-100

To train the model run (fine-tuning from VGG-Sound pretrained model):

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model

To train from scratch remove TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model.

You can also train the individual streams. For example, for training Slow run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOW_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model

To validate the model run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True 
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

To obtain scores on the test set run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True 
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth 
EPICKITCHENS.TEST_LIST EPIC_100_test_timestamps.pkl EPICKITCHENS.TEST_SPLIT test

Training/validation on VGG-Sound

To train the model run:

python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/output_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset 
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations 

To validate the model run:

python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/experiment_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset 
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True 
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

License

The code is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, found here.

Owner
Evangelos Kazakos
Evangelos Kazakos
A python library for working with praat, textgrids, time aligned audio transcripts, and audio files.

praatIO Questions? Comments? Feedback? A library for working with praat, time aligned audio transcripts, and audio files that comes with batteries inc

Tim 224 Dec 19, 2022
Audio fingerprinting and recognition in Python

dejavu Audio fingerprinting and recognition algorithm implemented in Python, see the explanation here: How it works Dejavu can memorize audio by liste

Will Drevo 6k Jan 06, 2023
Gradient - A Python program designed to create a reactive and ambient music listening experience

Gradient is a Python program designed to create a reactive and ambient music listening experience.

Alexander Vega 2 Jan 24, 2022
An app made in Python using the PyTube and Tkinter libraries to download videos and MP3 audio.

yt-dl (GUI Edition) An app made in Python using the PyTube and Tkinter libraries to download videos and MP3 audio. How do I download this? Windows: Fi

1 Oct 23, 2021
controls volume using hand gestures

controls volume using hand gestures

1 Oct 11, 2021
Sound-Equalizer- This is a Sound Equalizer GUI App Using Python's PyQt5

Sound-Equalizer- This is a Sound Equalizer GUI App Using Python's PyQt5. It gives you the ability to play, pause, and Equalize any one-channel wav audio file and play 3 different instruments.

Mustafa Megahed 1 Jan 10, 2022
[Singing Log] Let your program learn to sing!

[Singing Log] Let your program learn to sing! You must have thought this was changelog when you saw the English title, but it's not, it's chΓ nggΔ“log. What it does is allow your program to print logs

黄巍 22 Sep 03, 2022
PianoPlayer - Automatic fingering generator for piano scores

PianoPlayer - Automatic fingering generator for piano scores

Marco Musy 571 Jan 02, 2023
𝙰 π™Όπšžπšœπš’πšŒ π™±πš˜πš π™²πš›πšŽπšŠπšπšŽπš π™±πš’ πšƒπšŽπšŠπš–π™³πš•πš πŸ’–

TeamDltmusic 𝙰 π™Όπšžπšœπš’πšŒ π™±πš˜πš π™²πš›πšŽπšŠπšπšŽπš π™±πš’ πšƒπšŽπšŠπš–π™³πš•πš πŸ’– Deploy String Session String Click hear you can find string session OR join He

TeamDlt 5 Jan 18, 2022
Open-Source Tools & Data for Music Source Separation: A Pragmatic Guide for the MIR Practitioner

Open-Source Tools & Data for Music Source Separation: A Pragmatic Guide for the MIR Practitioner

IELab@ Korea University 0 Nov 12, 2021
kapre: Keras Audio Preprocessors

Kapre Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU real-time. Tested on Python 3.6 and 3.7 Why Kapre? vs. Pre-co

Keunwoo Choi 867 Dec 29, 2022
A Quick Music Player Made Fully in Python

Quick Music Player Made Fully In Python. Pure Python, cross platform, single function module with no dependencies for playing sounds. Installation & S

1 Dec 24, 2021
ianZiPu is a way to write notation for Guqin (叀琴) music.

PyBetween Wrapper for Between - λΉ„νŠΈμœˆμ„ μœ„ν•œ 파이썬 라이브러리 Legal Disclaimer 였직 ꡐ윑적 λͺ©μ μœΌλ‘œλ§Œ μ‚¬μš©ν• μˆ˜ 있으며, λΉ„νŠΈμœˆμ€ VCNC의 μžμ‚°μž…λ‹ˆλ‹€. μ•…μ˜μ  곡격에 μ΄μš©ν• μ‹œ 처벌 λ°›μ„μˆ˜ μžˆμŠ΅λ‹ˆλ‹€. μ‚¬μš©μ— λ”°λ₯Έ μ±…μž„μ€ μ‚¬μš©μžκ°€

Nancy Yi Liang 8 Nov 25, 2022
Spotifyd - An open source Spotify client running as a UNIX daemon.

Spotifyd An open source Spotify client running as a UNIX daemon. Spotifyd streams music just like the official client, but is more lightweight and sup

8.5k Jan 09, 2023
IDing the songs played on the do you radio show

IDing the songs played on the do you radio show

Rasmus Jones 36 Nov 15, 2022
MusicBrainz Picard

MusicBrainz Picard MusicBrainz Picard is a cross-platform (Linux/Mac OS X/Windows) application written in Python and is the official MusicBrainz tagge

MetaBrainz Foundation 3k Dec 31, 2022
Spotipy - Player de mΓΊsica simples em Python

Spotipy Player de mΓΊsica simples em Python, utilizando a biblioteca Pysimplegui para a interface grΓ‘fica. Este tocador Γ© bastante simples em si, mas p

Adelino Almeida 4 Feb 28, 2022
Bot Music Pintar. Created by Rio

🎢 Rio Music 🎢 Kalo Fork Star Ya Bang Hehehe Requirements πŸ“ FFmpeg NodeJS nodesource.com Python 3.8+ or 3.7 PyTgCalls Generate String Using Replit ‡

RioProjectX 7 Jun 15, 2022
α΄€ ʙᴏᴛ α΄›Κœα΄€α΄› α΄„α΄€Ι΄ α΄˜ΚŸα΄€Κ ᴍᴜꜱΙͺα΄„ ΙͺΙ΄ α΄›α΄‡ΚŸα΄‡Ι’Κ€α΄€α΄ Ι’Κ€α΄α΄œα΄˜ ᴏɴ ᴠᴏΙͺᴄᴇ α΄„α΄€ΚŸΚŸ

GJ516 LOVER'S Δ±Δ±llΔ±llΔ± β™₯️ βž€βƒGᴊ516_ᴍᴜꜱΙͺα΄„_ʙᴏᴛ β™₯️ Δ±llΔ±llΔ± α΄€ ʙᴏᴛ α΄›Κœα΄€α΄› α΄„α΄€Ι΄ α΄˜ΚŸα΄€Κ ᴍᴜꜱΙͺα΄„ ΙͺΙ΄ α΄›α΄‡ΚŸα΄‡Ι’Κ€α΄€α΄ Ι’Κ€α΄α΄œα΄˜ ᴏɴ ᴠᴏΙͺᴄᴇ α΄„α΄€ΚŸΚŸ Requirements πŸ“ FFmpeg NodeJS nodesou

1 Nov 22, 2021
FPGA based USB 2.0 high speed audio interface featuring multiple optical ADAT inputs and outputs

ADAT USB Audio Interface FPGA based USB 2.0 High Speed audio interface featuring multiple optical ADAT inputs and outputs Status / current limitations

Hans Baier 78 Dec 31, 2022