Official PyTorch Implementation of the paper EAN: Event Adaptive Network for Enhanced Action Recognition

Last update: Nov 07, 2022

Overview

EAN: Event Adaptive Network

PyTorch Implementation of paper:

EAN: Event Adaptive Network for Enhanced Action Recognition

Yuan Tian, Yichao Yan, Xiongkuo Min, Guo Lu, Guangtao Zhai, Guodong Guo, and Zhiyong Gao

[ArXiv]

Main Contribution

Efficiently modeling spatial-temporal information in videos is crucial for action recognition. In this paper, we propose a unified action recognition framework to investigate the dynamic nature of video content by introducing the following designs. First, when extracting local cues, we generate dynamic-scale spatial-temporal kernels to adaptively fit diverse events. Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects with a Transformer, which yields a sparse paradigm. We call the proposed framework Event Adaptive Network (EAN) because both key designs are adaptive to the input video content. To exploit the short-term motions within local segments, we also propose a novel and efficient Latent Motion Code (LMC) module, which further improves the performance of the framework.

Content

Dependencies
Data Preparation
Pretrained Models
- Something-Something-V1
Testing
Training
Other Info

Dependencies

Please make sure the following libraries are installed successfully:

Data Preparation

Following common practice, videos must first be extracted into frames for fast data loading. Please refer to the TSN repo for a detailed guide to data pre-processing. We have successfully trained on the Something-Something-V1 and V2, Kinetics, and Diving48 datasets with this codebase. The processing of video data can be summarized in 3 steps:

Extract frames from videos:
- For Something-Something-V2 dataset, please use data_process/vid2img_sthv2.py
- For Kinetics dataset, please use data_process/vid2img_kinetics.py
- For Diving48 dataset, please use data_process/extract_frames_diving48.py
Generate file lists needed for dataloader:
- Each line of the list file contains a tuple of (extracted video frame folder name, number of video frames, video ground-truth class). A list file looks like this:
```
video_frame_folder 100 10
video_2_frame_folder 150 31
...
```
- Alternatively, you can use the off-the-shelf tools provided in the repo: data_process/gen_label_xxx.py
Edit dataset config information in datasets_video.py
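
The list-file format described in step 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the repo's data_process/gen_label_xxx.py: the helper names `build_file_list` and `parse_file_list_line`, the `.jpg` frame suffix, and the 5-digit frame file names are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def build_file_list(frame_root, labels, frame_suffix=".jpg"):
    """Return lines of the form '<frame folder> <number of frames> <class id>'.

    `labels` maps a frame-folder name to its integer class id (hypothetical input).
    """
    lines = []
    for folder in sorted(Path(frame_root).iterdir()):
        if not folder.is_dir() or folder.name not in labels:
            continue  # skip stray files and videos without a label
        num_frames = sum(1 for f in folder.iterdir() if f.suffix == frame_suffix)
        lines.append(f"{folder.name} {num_frames} {labels[folder.name]}")
    return lines


def parse_file_list_line(line):
    """Split one list-file line back into (folder, num_frames, class_id)."""
    folder, num_frames, class_id = line.strip().rsplit(maxsplit=2)
    return folder, int(num_frames), int(class_id)


# Demo on a throwaway directory filled with empty placeholder "frames".
with tempfile.TemporaryDirectory() as root:
    for name, n in [("video_frame_folder", 3), ("video_2_frame_folder", 5)]:
        folder = Path(root) / name
        folder.mkdir()
        for i in range(1, n + 1):
            (folder / f"{i:05d}{'.jpg'}").touch()
    demo_lines = build_file_list(
        root, {"video_frame_folder": 10, "video_2_frame_folder": 31}
    )

print("\n".join(demo_lines))
```

Splitting from the right keeps folder names intact even if they happen to contain spaces, since only the last two fields are numeric.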

Pretrained Models

Here, we provide the pretrained EAN models for the Something-Something-V1 dataset. Recognizing actions in this dataset requires strong temporal modeling ability. EAN achieves state-of-the-art performance on this dataset. Notably, our method even surpasses optical-flow-based methods while using only RGB frames as input.

Something-Something-V1

Model	Backbone	FLOPs	Val Top1	Val Top5	Checkpoints
EAN_8F(RGB+LMC)	ResNet-50	37G	53.4	81.1	[Jianguo Cloud]
EAN_16(RGB+LMC)		74G	54.7	82.3
EAN_{16+8(RGB+LMC)}		111G	57.2	83.9
EAN_{2 x (16+8)(RGB+LMC)}		222G	57.5	84.3
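
The FLOPs column in the table above appears to be additive: the 16+8 ensemble costs the sum of the 16-frame and 8-frame models, and the "2 x" variant doubles that. The short sketch below only reproduces this arithmetic from the table values. Reading "2 x" as two clips per video is an assumption, and `ensemble_flops` is a hypothetical helper, not part of the repo.

```python
# FLOPs in GFLOPs for the single-clip models, copied from the table above.
FLOPS_8F = 37
FLOPS_16F = 74


def ensemble_flops(per_model_flops, num_clips=1):
    """Total cost of an ensemble, assuming costs simply add up per model and per clip."""
    return num_clips * sum(per_model_flops)


flops_16_plus_8 = ensemble_flops([FLOPS_16F, FLOPS_8F])
flops_2x_16_plus_8 = ensemble_flops([FLOPS_16F, FLOPS_8F], num_clips=2)

print(flops_16_plus_8, flops_2x_16_plus_8)
```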

Testing

For example, to test the EAN models on Something-Something-V1, you can first put the downloaded .pth.tar files into the "pretrained" folder and then run:

# test the EAN model with 8-frame clips
bash scripts/test/sthv1/RGB_LMC_8F.sh

# test the EAN model with 16-frame clips
bash scripts/test/sthv1/RGB_LMC_16F.sh

Training

We provide several scripts to train EAN with this repo; please refer to the "scripts" folder for more details. For example, to train EAN on Something-Something-V1, you can run:

# train the EAN model with 8-frame clips
bash scripts/train/sthv1/RGB_LMC_8F.sh

Note that you should scale the learning rate with the batch size. For example, if you use a batch size of 32, you should set the learning rate to 0.005.
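
As a rough guide, the note above can be turned into a small helper. The README gives only one reference point (batch size 32, learning rate 0.005). The sketch below assumes the learning rate scales linearly with batch size (the common "linear scaling rule"), which the repo does not state explicitly. `scaled_learning_rate` is a hypothetical helper name.

```python
def scaled_learning_rate(batch_size, base_lr=0.005, base_batch_size=32):
    """Scale the learning rate proportionally to the batch size (assumed linear rule)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return base_lr * batch_size / base_batch_size


# Print the suggested learning rate for a few batch sizes.
for bs in (16, 32, 64):
    print(f"batch size {bs}: lr = {scaled_learning_rate(bs):.4f}")
```

Treat the result as a starting point and check it against the training scripts in the "scripts" folder, since other settings there may be tied to the original batch size.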

Other Info

References

This repository is built upon the following baseline implementations for the action recognition task.

Citation

Please [★star] this repo and [cite] the following arXiv paper if you find EAN useful for your research:

@misc{tian2021ean,
      title={EAN: Event Adaptive Network for Enhanced Action Recognition}, 
      author={Yuan Tian and Yichao Yan and Xiongkuo Min and Guo Lu and Guangtao Zhai and Guodong Guo and Zhiyong Gao},
      year={2021},
      eprint={2107.10771},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

For any questions, please feel free to open an issue or contact:

Yuan Tian: [email protected]
