This is the official implementation of the paper "ActionCLIP: A New Paradigm for Video Action Recognition"

Last update: Jan 09, 2023

Overview

This is an official PyTorch implementation of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv]

Content

Prerequisites
Data Preparation
Updates
Pretrained Models
- Kinetics-400
- HMDB51 && UCF101
Testing
- Zero-shot
Training
Contributors
Citing ActionCLIP
Acknowledgments

Prerequisites

The code is built with the following libraries:

PyTorch >= 1.8
wandb
RandAugment
pprint
tqdm
dotmap
yaml
csv

For video data pre-processing, you may need ffmpeg.

For more details about the libraries, see INSTALL.md.
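Before installing anything, it can help to see which of these dependencies are already available. The helpers below are a minimal sketch and are not part of the ActionCLIP repo. The function names `missing_commands` and `missing_modules`, and the `PYTHON` override, are hypothetical. The sketch assumes that the Python libraries are imported under the names listed above (for example `yaml` for PyYAML).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks; not part of the ActionCLIP repo.

# Print every executable from the arguments that is not on PATH.
# Returns 1 if anything is missing, 0 otherwise.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd"
      status=1
    fi
  done
  return "$status"
}

# Print every Python module from the arguments that cannot be imported.
# PYTHON (hypothetical override) selects the interpreter; default is python3.
# Returns 1 if anything is missing, 0 otherwise.
missing_modules() {
  local py="${PYTHON:-python3}" mod status=0
  for mod in "$@"; do
    if ! "$py" -c "import ${mod}" >/dev/null 2>&1; then
      echo "missing python module: $mod"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_commands ffmpeg` and `missing_modules torch wandb tqdm dotmap yaml` each print nothing and return 0 when everything is present. This check does not verify the PyTorch >= 1.8 requirement. To see the installed version, run `python3 -c "import torch; print(torch.__version__)"`.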

Data Preparation

Videos first need to be extracted into frames for fast reading. Please refer to the TSN repo for a detailed guide to data pre-processing. We have successfully trained on Kinetics, UCF101, HMDB51 and Charades.
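The frame-extraction step can be sketched with ffmpeg as shown below. This is a minimal sketch and not the TSN tooling. It assumes the common TSN-style convention: one folder per video, frames named `img_00001.jpg`, `img_00002.jpg`, and so on, and list-file lines of the form `<frame dir> <number of frames> <label>`. Check the TSN guide and configs/README.md for the layout this repo actually expects. The function names `extract_frames` and `list_line` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical frame-extraction helpers; not part of the ActionCLIP repo.
# FFMPEG may be overridden (e.g. FFMPEG=echo) to print the command as a dry run.
FFMPEG="${FFMPEG:-ffmpeg}"

# Dump every frame of one video into <out_root>/<video name>/img_00001.jpg, ...
extract_frames() {
  local video="$1" out_root="$2"
  local name out_dir
  name="$(basename "${video%.*}")"   # drop directory and extension
  out_dir="${out_root}/${name}"
  mkdir -p "$out_dir"
  # -q:v 2 requests high JPEG quality; the image2 muxer numbers frames from 1.
  "$FFMPEG" -loglevel error -i "$video" -q:v 2 "${out_dir}/img_%05d.jpg"
}

# Print one assumed TSN-style list line: "<frame dir> <number of frames> <label>".
list_line() {
  local frame_dir="$1" label="$2" n
  # Count only extracted frames, ignoring any other files in the folder.
  n="$(find "$frame_dir" -maxdepth 1 -name 'img_*.jpg' | wc -l | tr -d ' ')"
  echo "${frame_dir} ${n} ${label}"
}
```

A typical loop would call `extract_frames "$v" frames` for each video `v`, then append `list_line frames/<name> <label id>` to a train or val list. Counting frames after extraction, rather than reading the frame rate from the container, keeps the list consistent with what is actually on disk.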

Updates

We now support single-crop validation (including zero-shot) on Kinetics-400, UCF101 and HMDB51. For the pretrained models, see MODEL_ZOO.md.
We now support model training on Kinetics-400, UCF101 and HMDB51 with 8, 16 and 32 frames. For the training configs, see configs/README.md.
We now support model training on your own datasets. For details, see configs/README.md.

Pretrained Models

Training video models is computationally expensive, so we provide some of our pretrained models here. A larger set of trained models is available in the ActionCLIP MODEL_ZOO.md.

Kinetics-400

We experiment with different backbones for ActionCLIP (we choose Transf as our final visual prompt since it obtains the best results) and different input-frame configurations on Kinetics-400. Here is a list of the pretrained models we provide (see Table 6 of the paper).

model	n-frame	top1 Acc(single-crop)	top5 Acc(single-crop)	checkpoint
ViT-B/32	8	78.36%	94.25%	link pwd:8hg2
ViT-B/16	8	81.09%	95.49%	link
ViT-B/16	16	81.68%	95.87%	link
ViT-B/16	32	82.32%	96.20%	link pwd:v7nn

HMDB51 && UCF101

On the HMDB51 and UCF101 datasets, the accuracy (with Kinetics-400 pretraining) is reported under the accurate setting.

HMDB51

model	n-frame	top1 Acc(single-crop)	checkpoint
ViT-B/16	32	76.2%	link

UCF101

model	n-frame	top1 Acc(single-crop)	checkpoint
ViT-B/16	32	97.1%	link

Testing

To test the downloaded pretrained models on Kinetics, HMDB51 or UCF101, run scripts/run_test.sh with the corresponding config. For example:

# test
bash scripts/run_test.sh  ./configs/k400/k400_ft_tem.yaml

Zero-shot

We provide several examples of zero-shot validation on Kinetics-400, UCF101 and HMDB51.

To do zero-shot validation on Kinetics from CLIP pretrained models, you can run:

# zero-shot
bash scripts/run_test.sh  ./configs/k400/k400_ft_zero_shot.yaml

To do zero-shot validation on UCF101 and HMDB51 from Kinetics pretrained models, first prepare the Kinetics-400 pretrained model and then run (HMDB51 shown here):

# zero-shot
bash scripts/run_test.sh  ./configs/hmdb51/hmdb_ft_zero_shot.yaml
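When evaluating several configs in a row, a small wrapper can run each one through the launcher and keep one log per config. This is a minimal sketch and is not part of the repo. The function name `run_configs` and the `LOG_DIR` variable (default `logs`) are hypothetical. The wrapper assumes it is called from the repository root and that each launcher takes a single config path, as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical batch runner; not part of the ActionCLIP repo.
# Usage: run_configs <launcher script> <config.yaml>...
# LOG_DIR (hypothetical, default: logs) receives one <config name>.log per run.
run_configs() {
  local script="$1"
  shift
  local log_dir="${LOG_DIR:-logs}" cfg name log
  # Check every config up front so a typo does not surface after hours of runs.
  for cfg in "$@"; do
    if [ ! -f "$cfg" ]; then
      echo "config not found: $cfg" >&2
      return 1
    fi
  done
  mkdir -p "$log_dir"
  for cfg in "$@"; do
    name="$(basename "${cfg%.yaml}")"
    log="${log_dir}/${name}.log"
    # Send stdout and stderr of each run to its own log; stop at the first failure.
    if bash "$script" "$cfg" >"$log" 2>&1; then
      echo "ok: $cfg (log: $log)"
    else
      echo "failed: $cfg (see $log)" >&2
      return 1
    fi
  done
}
```

For example, `run_configs scripts/run_test.sh ./configs/k400/k400_ft_tem.yaml ./configs/k400/k400_ft_zero_shot.yaml` evaluates both Kinetics-400 configs in sequence. The same helper should work with scripts/run_train.sh. Logs are named after the config file, so two configs with the same file name in different folders would overwrite each other's log.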

Training

We provide several examples of training ActionCLIP with this repo:

To train on Kinetics from CLIP pretrained models, you can run:

# train 
bash scripts/run_train.sh  ./configs/k400/k400_ft_tem_test.yaml

To train on HMDB51 from Kinetics400 pretrained models, you can run:

# train 
bash scripts/run_train.sh  ./configs/hmdb51/hmdb_ft.yaml

To train on UCF101 from Kinetics400 pretrained models, you can run:

# train 
bash scripts/run_train.sh  ./configs/ucf101/ucf_ft.yaml

More training details can be found in configs/README.md.

Contributors

ActionCLIP is written and maintained by Mengmeng Wang and Jiazheng Xing.

Citing ActionCLIP

If you find ActionCLIP useful in your research, please cite it with the following BibTeX entry.

@inproceedings{wang2022ActionCLIP,
  title={ActionCLIP: A New Paradigm for Video Action Recognition},
  author={Mengmeng Wang and Jiazheng Xing and Yong Liu},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}

Acknowledgments

Our code is based on CLIP and STM.
