Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.

Overview

WSDEC

This is the official repo for our NeurIPS paper Weakly Supervised Dense Event Captioning in Videos.

Description

Repo directories

  • ./: global config files, training, evaluating scripts;
  • ./data: data dictionary;
  • ./model: our final models used to reproduce the results;
  • ./runs: the default output dictionary used to store our trained model and result files;
  • ./scripts: helper scripts;
  • ./third_party: third party dependency include the official evaluation scripts;
  • ./utils: helper functions;
  • ./train_script: all training scripts;
  • ./eval_script: all evalulating scripts.

Dependency

  • Python 2.7
  • CUDA 9.0(note: you will encounter a bug saying segmentation fault(core dump) if you run our code with CUDA 8.0)
    • But it seems that the bug still exists. See issue
  • [Pytorch 0.3.1](note: 0.3.1 is not compatible with newer version)
  • numpy, hdf5 and other necessary packages(no special requirement)

Usage for reproduction

Before we start

Before the training and testing, we should make sure the data, third party data are prepared, here is the one-by-one steps to make everything prepared.

1. Clone our repo and submodules

git clone --recursive https://github.com/XgDuan/WSDEC

2. Download all the data

  • Download the official C3D features, you can either download the data from the website or from our onedrive cloud.

    • Download from the official website; (Note, after you download the C3D features, you can either place it in the data folder and rename it as anet_v1.3.c3d.hdf5, or create a soft link in the data dictionary as ln -s YOURC3DFeature data/anet_v1.3.c3d.hdf5)
  • Download the dense video captioning data from the official website; (Similar to the C3D feature, you are supposed to place the download data in the data folder and rename it as densecap)

  • Download the data for the official evaluation scripts densevid_eval;

    • run the command sh download.sh scripts in the folder PREFIX/WSDEC/third_party/densevid_eval;
  • [Good News]: we write a shell script for you to download the data, just run the following command:

    cd data
    sh download.sh
    

3. Generate the dictionary for the caption model

python scripts/caption_preprocess.py

Training

There are two steps for model training: pretrain a not so bad caption model; and the second step, train the final/baseline model.

Our pretrained captioning model is trained.

python train_script/train_cg_pretrain.py

train our final model

python train_script/train_final.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --alias MODEL_NAME

train baselines

  1. train the baseline model without classification loss.
python train_script/train_baseline_regressor.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --alias MODEL_NAME
  1. train the baseline model without regression branch.
python train_script/train_final.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --regressor_scale 0 --alias MODEL_NAME

About the arguments

All the arguments we use can be found in the corresponding training scripts. You can also use your own argumnets if you like to do so. But please mind, some arguments are discarded(This is our own reimplementation of our paper, the first version codes are too dirty that no one would like to use it.)

Testing

Testing is easier than training. Firstly, in the process of training, our scripts will call the densevid_eval in a subprocess every time after we run the eval function. From these results, you can have a general grasp about the final performance by just have a look at the eval_results.txt scripts. Secondly, after some epochs, you can run the evaluation scripts:

  1. evaluate the full model or no_regression model:
python eval_script/evaluate.py --checkpoint YOUR_TRAINED_MODEL.ckp
  1. evaluate the no_classification model:
python eval_script/evaluate_baseline_regressor.py --checkpoint YOUR_TRAINED_MODEL.ckp
  1. evaluate the pretrained model with random temporal segment:
python eval_script/evaluate_pretrain.py --checkpoint YOUR_PRETRAIN_CAPTION_MODEL.ckp

Other usages

Besides reproduce our work, there are at least two interesting things you can do with our codes.

Train a supervised sentence localization model

To know what is sentence localization, you can have a look at our paper ABLR. Note that our work at a matter of fact provides an unsupervised solution towards sentence localization, we introduce the usage for the supervised model here. We have written the trainer, you can just run the following command and have a cup of coffee:

python train_script/train_sl.py

Train a supervised video event caption generation model

If you have read our paper, you would find that event captioning is the dual task of the aforementioned sentence localization task. To train such a model, just run the following command:

python train_script/train_cg.py

BUGS

You may encounter a cuda internal bug that says Segmentation fault(core dumped) during training if you are using cuda 8.0. If such things happen, try upgrading your cuda to 9.0.

other

We will add more description about how to use our code. Please feel free to contact us if you have any questions or suggestions.

Trained model and results

Links for our trained model

You can download our pretrained model for evaluation or further usage from our onedrive, which includes a pretrained caption generator(cg_pretrain.ckp), a baseline model without classification loss(baseline_noclass.ckp), a baseline model without regression branch(baseline_noregress.ckp), and our final model(final_model.ckp).

Cite the paper and give us star ⭐️

If you find our paper or code useful, please cite our paper using the following bibtex:

@incollection{NIPS2018_7569,
title = {Weakly Supervised Dense Event Captioning in Videos},
author = {Duan, Xuguang and Huang, Wenbing and Gan, Chuang and Wang, Jingdong and Zhu, Wenwu and Huang, Junzhou},
booktitle = {Advances in Neural Information Processing Systems 31},
editor = {S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett},
pages = {3062--3072},
year = {2018},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/7569-weakly-supervised-dense-event-captioning-in-videos.pdf}
}
Owner
Melon(Xuguang Duan)
Lick the screen
Melon(Xuguang Duan)
[ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links

LinkBERT: A Knowledgeable Language Model Pretrained with Document Links This repo provides the model, code & data of our paper: LinkBERT: Pretraining

Michihiro Yasunaga 264 Jan 01, 2023
This is a tensorflow-based rotation detection benchmark, also called AlphaRotate.

AlphaRotate: A Rotation Detection Benchmark using TensorFlow Abstract AlphaRotate is maintained by Xue Yang with Shanghai Jiao Tong University supervi

yangxue 972 Jan 05, 2023
Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings

Text2Music Emotion Embedding Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings Reference Emotion Embedding Spaces for Matching

Minz Won 50 Dec 05, 2022
Self-Guided Contrastive Learning for BERT Sentence Representations

Self-Guided Contrastive Learning for BERT Sentence Representations This repository is dedicated for releasing the implementation of the models utilize

Taeuk Kim 16 Dec 04, 2022
Official PyTorch implementation for FastDPM, a fast sampling algorithm for diffusion probabilistic models

Official PyTorch implementation for "On Fast Sampling of Diffusion Probabilistic Models". FastDPM generation on CIFAR-10, CelebA, and LSUN datasets. S

Zhifeng Kong 68 Dec 26, 2022
Code for Understanding Pooling in Graph Neural Networks

Select, Reduce, Connect This repository contains the code used for the experiments of: "Understanding Pooling in Graph Neural Networks" Setup Install

Daniele Grattarola 37 Dec 13, 2022
Tensorflow Implementation of ECCV'18 paper: Multimodal Human Motion Synthesis

MT-VAE for Multimodal Human Motion Synthesis This is the code for ECCV 2018 paper MT-VAE: Learning Motion Transformations to Generate Multimodal Human

Xinchen Yan 36 Oct 02, 2022
disentanglement_lib is an open-source library for research on learning disentangled representations.

disentanglement_lib disentanglement_lib is an open-source library for research on learning disentangled representation. It supports a variety of diffe

Google Research 1.3k Dec 28, 2022
HDMapNet: A Local Semantic Map Learning and Evaluation Framework

HDMapNet_devkit Devkit for HDMapNet. HDMapNet: A Local Semantic Map Learning and Evaluation Framework Qi Li, Yue Wang, Yilun Wang, Hang Zhao [Paper] [

Tsinghua MARS Lab 421 Jan 04, 2023
Photo2cartoon - 人像卡通化探索项目 (photo-to-cartoon translation project)

人像卡通化 (Photo to Cartoon) 中文版 | English Version 该项目为小视科技卡通肖像探索项目。您可使用微信扫描下方二维码或搜索“AI卡通秀”小程序体验卡通化效果。

Minivision_AI 3.5k Dec 30, 2022
The official repository for paper ''Domain Generalization for Vision-based Driving Trajectory Generation'' submitted to ICRA 2022

DG-TrajGen The official repository for paper ''Domain Generalization for Vision-based Driving Trajectory Generation'' submitted to ICRA 2022. Our Meth

Wang 25 Sep 26, 2022
A cool little repl-based simulation written in Python

A cool little repl-based simulation written in Python planned to integrate machine-learning into itself to have AI battle to the death before your eye

Em 6 Sep 17, 2022
Python library to receive live stream events like comments and gifts in realtime from TikTok LIVE.

TikTokLive A python library to connect to and read events from TikTok's LIVE service A python library to receive and decode livestream events such as

Isaac Kogan 277 Dec 23, 2022
Latent Network Models to Account for Noisy, Multiply-Reported Social Network Data

VIMuRe Latent Network Models to Account for Noisy, Multiply-Reported Social Network Data. If you use this code please cite this article (preprint). De

6 Dec 15, 2022
CFC-Net: A Critical Feature Capturing Network for Arbitrary-Oriented Object Detection in Remote Sensing Images

CFC-Net This project hosts the official implementation for the paper: CFC-Net: A Critical Feature Capturing Network for Arbitrary-Oriented Object Dete

ming71 55 Dec 12, 2022
Codes for CIKM'21 paper 'Self-Supervised Graph Co-Training for Session-based Recommendation'.

COTREC Codes for CIKM'21 paper 'Self-Supervised Graph Co-Training for Session-based Recommendation'. Requirements: Python 3.7, Pytorch 1.6.0 Best Hype

Xin Xia 42 Dec 09, 2022
Asymmetric metric learning for knowledge transfer

Asymmetric metric learning This is the official code that enables the reproduction of the results from our paper: Asymmetric metric learning for knowl

20 Dec 06, 2022
CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability

This is the official repository of the paper: CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability A private copy of the

Fadi Boutros 33 Dec 31, 2022
scalingscattering

Scaling The Scattering Transform : Deep Hybrid Networks This repository contains the experiments found in the paper: https://arxiv.org/abs/1703.08961

Edouard Oyallon 78 Dec 21, 2022
A mini lib that implements several useful functions binding to PyTorch in C++.

Torch-gather A mini library that implements several useful functions binding to PyTorch in C++. What does gather do? Why do we need it? When dealing w

maxwellzh 8 Sep 07, 2022