PyVideoAI: Action Recognition Framework

Last update: Dec 29, 2022

Overview

This repository contains the official implementation of:

Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition (Kim et al., 2022) Instruction

PyVideoAI: Action Recognition Framework

The only framework that completes your computer vision, action recognition research environment.

Key features

Supports multi-gpu, multi-node training.
SOTA models such as I3D, Non-local, TSN, TRN, TSM, MVFNet, ..., and even ImageNet training!
Many datasets such as Kinetics-400, EPIC-Kitchens-55, Something-Something-V1/V2, HMDB-51, UCF-101, Diving48, CATER, ...
Supports both video-decoding (straight from .avi/.mp4) and extracted-frame (.jpg/.png) dataloaders, with sparse or dense sampling.
Popular LR schedulers such as Cosine Annealing with Warm Restarts, Step LR, and Reduce LR on Plateau.
Early stopping when training doesn't improve (customise your condition)
Easily add custom model, optimiser, scheduler, loss and dataloader!
Telegram bot reporting experiment status.
TensorBoard reporting stats.
Colour logging
All of the above come with no extra setup. Trust me and try some examples.

Papers implemented

[Official] TC Reordering and GrayST (Kim et al. 2022).
ProSelfLC (CVPR 2021).

This package is motivated by PySlowFast from Facebook AI. PySlowFast is a cool framework, but it depends heavily on its own config system, which made it difficult to add new models (or other code) or to reuse parts of the framework.
This framework, by Kiyoon, replaces that configuration system with plain Python files, which makes it easy to add custom models, LR scheduling, dataloaders, etc.
Just modify the function bodies in the config files!

The differences between the two config systems are described in CONFIG_SYSTEM.md.

Getting Started

Jupyter Notebook examples to run:

HMDB-51 data preparation
Inference on pre-trained model from the model zoo, and visualise model/dataloader/per-class performance.
Training I3D using Kinetics pretrained model
Using image model and ImageNet dataset

are provided in the examples!

Structure

All of the executable files are in tools/.
The dataset_configs/ directory configures datasets: for example, where the dataset is stored, the number of classes, single-label or multi-label training, and dataset-specific visualisation settings (the confusion matrix has different output sizes).
The model_configs/ directory configures model architectures: for example, the model definition and the input preprocessing mean/std.
The exp_configs/ directory configures other training settings such as the optimiser, scheduling, dataloader, and number of input frames. The config file path has to be in exp_configs/[dataset_name]/[model_name]_[experiment_name].py format.
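
The path convention above can be sketched in a few lines of Python. This is only an illustration: the helper exp_config_path and the names "hmdb", "i3d" and "baseline" are hypothetical and are not part of PyVideoAI.

```python
from pathlib import Path


def exp_config_path(dataset_name: str, model_name: str, experiment_name: str,
                    root: str = "exp_configs") -> Path:
    """Build exp_configs/[dataset_name]/[model_name]_[experiment_name].py."""
    return Path(root) / dataset_name / f"{model_name}_{experiment_name}.py"


# Hypothetical names, chosen only for illustration.
path = exp_config_path("hmdb", "i3d", "baseline")
print(path.as_posix())  # exp_configs/hmdb/i3d_baseline.py
```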

Usage

Preparing datasets

This package supports many action recognition datasets such as HMDB-51, EPIC-Kitchens-55, Something-Something-V1, CATER, etc.
Refer to DATASET.md.

Training command

CUDA_VISIBLE_DEVICES=0,1,2,3 python tools/run_train.py -D {dataset_config_name} -M {model_config_name} -E {exp_config_name} --local_world_size {num_GPUs} -e {num_epochs}

--local_world_size denotes the number of GPUs per computing node.
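
If you launch many runs from a script, the command above can be assembled in Python. The sketch below assumes a single node and that you want --local_world_size to equal the number of visible GPUs; build_train_command and the config names "hmdb", "i3d" and "baseline" are hypothetical. Only the flags come from the command shown above.

```python
import os
import shlex


def build_train_command(dataset, model, exp, gpu_ids, num_epochs):
    """Return (env, argv) for tools/run_train.py on a single node."""
    # Restrict the visible GPUs, as CUDA_VISIBLE_DEVICES does on the command line.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(str(g) for g in gpu_ids))
    argv = [
        "python", "tools/run_train.py",
        "-D", dataset,
        "-M", model,
        "-E", exp,
        "--local_world_size", str(len(gpu_ids)),  # GPUs per computing node
        "-e", str(num_epochs),
    ]
    return env, argv


# Hypothetical config names, for illustration only.
env, argv = build_train_command("hmdb", "i3d", "baseline", [0, 1, 2, 3], 50)
print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(argv)}")
# To launch from the repository root:
#   subprocess.run(argv, env=env, check=True)
```

Building the argument list instead of a shell string avoids quoting problems when config names contain unusual characters.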

Telegram Bot

You can preview experiment results using Telegram bots!

If your code raises an exception, the bot reports that to you too.

You can also take a quick look at example video inputs (as GIFs or JPEGs) from the dataloader.
Use tools/visualisations/model_and_dataloader_visualiser.py

Talk to BotFather and make a bot.
Go to your bot and type anything (/start)
Find chat_id at https://api.telegram.org/bot{token}/getUpdates (without braces)
Add your token and chat_id to tools/key.ini.

[Telegram0]
token=
chat_id=
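
To check your key.ini before starting an experiment, something like the following may help. It is a minimal sketch, not the code PyVideoAI uses: the helper names and the placeholder token are made up, and chat_ids_from_updates assumes the usual Telegram Bot API response shape (a "result" list whose entries carry message.chat.id).

```python
import configparser

# Placeholder credentials, for illustration only.
EXAMPLE_KEY_INI = """
[Telegram0]
token=123456:PLACEHOLDER
chat_id=987654321
"""


def load_telegram_credentials(ini_text, section="Telegram0"):
    """Parse the token and chat_id from the text of a key.ini file."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser[section]["token"], parser[section]["chat_id"]


def get_updates_url(token):
    """URL to open in a browser to find your chat_id."""
    return f"https://api.telegram.org/bot{token}/getUpdates"


def chat_ids_from_updates(response):
    """Collect chat ids from a decoded getUpdates JSON response (assumed shape)."""
    updates = response.get("result", [])
    return sorted({u["message"]["chat"]["id"] for u in updates if "message" in u})


token, chat_id = load_telegram_credentials(EXAMPLE_KEY_INI)
print(get_updates_url(token))
```

For a real file, pass the contents of tools/key.ini, for example `load_telegram_credentials(open("tools/key.ini").read())`.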

Model Zoo and Baselines

Refer to MODEL_ZOO.md

Installation

Refer to INSTALL.md.

TL;DR,

conda create -n videoai python=3.8
conda activate videoai
conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=10.2 -c pytorch
### For RTX 30xx GPUs,
#conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=11.1 -c pytorch -c nvidia
 

git clone --recurse-submodules https://github.com/kiyoon/PyVideoAI.git
cd PyVideoAI
git checkout v0.3
git submodule update --recursive
cd submodules/video_datasets_api
pip install -e .
cd ../experiment_utils
pip install -e .
cd ../..
pip install -e .

Experiment outputs

The experiment results (logs, training stats, weights, TensorBoard files, plots, etc.) are saved to data/experiments by default. This can get huge, so it is recommended to make data/experiments a symbolic link to a directory on the storage you actually want to use.

Otherwise, you can change the DEFAULT_EXPERIMENT_ROOT value in pyvideoai/config.py, or set the --experiment_root argument manually when executing.
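
The symbolic link can be made with ln -s, or from Python as sketched below. The helper link_experiment_root and the storage path in the usage comment are hypothetical. The function assumes it is run from the repository root and deliberately does nothing if data/experiments already exists.

```python
from pathlib import Path


def link_experiment_root(storage_dir, link_path="data/experiments"):
    """Make link_path a symlink to storage_dir, unless link_path already exists."""
    storage = Path(storage_dir).expanduser().resolve()
    link = Path(link_path)
    storage.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        return link  # leave anything that is already there untouched
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(storage, target_is_directory=True)
    return link


# Example (hypothetical storage location):
#   link_experiment_root("/mnt/big_disk/pyvideoai_experiments")
```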

Owner

Kiyoon Kim
