This is the official implementation of Elaborative Rehearsal for Zero-shot Action Recognition (ICCV2021)

Last update: Sep 24, 2022

Elaborative Rehearsal for Zero-shot Action Recognition

This is an official implementation of:

Shizhe Chen and Dong Huang, Elaborative Rehearsal for Zero-shot Action Recognition, ICCV, 2021. An arXiv version is also available.

By elaborating each new concept and relating it to known concepts, zero-shot action recognition models begin to be comparable to supervised models trained on a few samples.

New state-of-the-art results are also achieved on the standard ZSAR benchmarks (Olympics, HMDB51, UCF101), as well as on the first large-scale ZSAR benchmark, which we propose on the Kinetics database.

Installation

git clone https://github.com/DeLightCMU/ElaborativeRehearsal.git
cd ElaborativeRehearsal
export PYTHONPATH=$(pwd):${PYTHONPATH}

pip install -r requirements.txt

# download pretrained models
bash scripts/download_premodels.sh

Zero-shot Action Recognition (ZSAR)

Extract Features in Video

spatial-temporal features

bash scripts/extract_tsm_features.sh '0,1,2'

object features

bash scripts/extract_object_features.sh '0,1,2'
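
Both extraction scripts take the same quoted argument, so they can be driven from one place. The sketch below is a dry run that only prints the two commands. It assumes the argument is a comma-separated list of GPU ids, which this README does not state explicitly; `extract_commands` is a hypothetical helper name, not part of the repository.

```shell
# Dry-run sketch: prints both feature-extraction commands for one device list.
# Assumption: the quoted argument ('0,1,2' above) is a comma-separated list of GPU ids.
# extract_commands is a hypothetical helper, not a script shipped with the repository.

extract_commands() {
  local gpus="$1"
  local script
  for script in extract_tsm_features extract_object_features; do
    echo "bash scripts/${script}.sh '${gpus}'"
  done
}

extract_commands "0,1,2"
```

To actually run the printed commands, pipe the output to `bash` from the repository root.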

ZSAR Training and Inference

Baselines: DEVISE, ALE, SJE, DEM, ESZSL and GCN.

# mtype: devise, ale, sje, dem, eszsl
mtype=devise
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} --is_train
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} --eval_set tst
# evaluate other splits
ksplit=1
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines_eval_splits.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} ${ksplit}

# gcn
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_kgraphs.py zeroshot/configs/zsl_baseline_kgraph_config.yaml --is_train
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_kgraphs.py zeroshot/configs/zsl_baseline_kgraph_config.yaml --eval_set tst
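
The train and test commands for the five embedding baselines differ only in `mtype`, so they can be generated in a loop. The following is a minimal dry-run sketch that prints the commands rather than executing them. The model types and paths are copied from the commands above; `baseline_commands` and its GPU argument are hypothetical additions for illustration.

```shell
# Dry-run sketch: prints the train/eval commands for each baseline instead of running them.
# Model types and paths come from the commands above; baseline_commands is a hypothetical helper.

baseline_commands() {
  local gpu="$1"
  shift
  local mtype config
  for mtype in "$@"; do
    config="zeroshot/configs/zsl_baseline_${mtype}_config.yaml"
    echo "CUDA_VISIBLE_DEVICES=${gpu} python zeroshot/driver/zsl_baselines.py ${config} ${mtype} --is_train"
    echo "CUDA_VISIBLE_DEVICES=${gpu} python zeroshot/driver/zsl_baselines.py ${config} ${mtype} --eval_set tst"
  done
}

baseline_commands 0 devise ale sje dem eszsl
```

Piping the output to `bash` runs the baselines one after another. The GCN baseline uses a different driver and config, so it is left out of the loop.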

ER-ZSAR and ablations:

# TSM + ED class representation + AttnPool (2nd row in Table 4(b))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_vse.py zeroshot/configs/zsl_vse_wordembed_config.yaml --is_train --resume_file datasets/Kinetics/zsl220/word.glove42b.th

# TSM + ED class representation + BERT (last row in Table 4(a) and Table 4(b))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_vse.py zeroshot/configs/zsl_vse_config.yaml --is_train

# Obj + ED class representation + BERT + ER Loss (last row in Table 4(c))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_cptembed.py zeroshot/configs/zsl_cpt_config.yaml --is_train

# ER-ZSAR Full Model
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_ervse.py zeroshot/configs/zsl_ervse_config.yaml --is_train
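
Each variant above pairs one driver with one config file. As a rough sketch, that mapping can be collected in a single lookup that prints the matching training command. The short variant names (`wordembed`, `bert`, `object`, `full`) and the function `ablation_command` are hypothetical labels chosen for this example; only the drivers, configs, and flags come from the commands above.

```shell
# Dry-run sketch: maps a variant label to the training command listed above.
# The labels wordembed/bert/object/full and ablation_command are hypothetical names.

ablation_command() {
  local name="$1"
  local driver config extra=""
  case "${name}" in
    wordembed)
      driver=zsl_vse
      config=zsl_vse_wordembed_config
      extra=" --resume_file datasets/Kinetics/zsl220/word.glove42b.th"
      ;;
    bert)   driver=zsl_vse;      config=zsl_vse_config ;;
    object) driver=zsl_cptembed; config=zsl_cpt_config ;;
    full)   driver=zsl_ervse;    config=zsl_ervse_config ;;
    *)
      echo "unknown variant: ${name}" >&2
      return 1
      ;;
  esac
  echo "CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/${driver}.py zeroshot/configs/${config}.yaml --is_train${extra}"
}

ablation_command full
```

An unknown label returns a non-zero status, so a typo does not silently fall through to a different model.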

Citation

If you find this repository useful, please cite our paper:

@inproceedings{ChenHuang2021ER,
  title={Elaborative Rehearsal for Zero-shot Action Recognition},
  author={Shizhe Chen and Dong Huang},
  booktitle = {ICCV},
  year={2021}
}

