The official implementation of the CVPR 2021 paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Last update: Nov 14, 2022

Overview

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

This repository is the official implementation of the CVPR 2021 paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Requirements

TensorFlow 1.15
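Because the code targets the TensorFlow 1.x API, it can help to confirm the installed version before training. The following is a minimal sketch, not part of the repository: the helper name `is_supported_tf` is hypothetical, and the check only compares the major and minor version numbers. It avoids newer language features, since the official TensorFlow 1.15 wheels were built for Python 3.7 and earlier.

```python
def is_supported_tf(version):
    """Return True when a version string such as '1.15.5' is a 1.15.x release."""
    return version.split(".")[:2] == ["1", "15"]


# Read the installed version, if TensorFlow can be imported at all.
try:
    import tensorflow as tf
    installed = tf.__version__
except ImportError:
    installed = None

if installed is None:
    print("TensorFlow is not installed")
elif is_supported_tf(installed):
    print("TensorFlow", installed, "matches the 1.15 requirement")
else:
    print("TensorFlow", installed, "found, but this repository targets 1.15")
```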

Training

To train the NCE model(s) in the paper, run this command:

python train_nce_distill_model.py \
  --region_feat_path=region_features.hdf5 \
  --phrase_feat_path=phrase_features.hdf5 \
  --glove_path=glove.hdf5

To train the NCE+Distill model(s) in the paper, run this command:

python train_nce_distill_model.py \
  --region_feat_path=region_features.hdf5 \
  --phrase_feat_path=phrase_features.hdf5 \
  --glove_path=glove.hdf5 \
  --phrase_to_label_json=phrase_to_label.json
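The two training commands above differ only in the `--phrase_to_label_json` flag. When launching several runs from a script, the argument list can be assembled in one place. This is a sketch under that reading of the commands: `build_train_command` is a hypothetical helper, and only the script name and flags shown above are used.

```python
import shlex


def build_train_command(region_feat_path, phrase_feat_path, glove_path,
                        phrase_to_label_json=None):
    """Build the argv list for train_nce_distill_model.py.

    Flag names are copied from the commands in this README. Passing
    phrase_to_label_json appends the extra flag of the NCE+Distill command.
    """
    cmd = [
        "python", "train_nce_distill_model.py",
        "--region_feat_path=" + region_feat_path,
        "--phrase_feat_path=" + phrase_feat_path,
        "--glove_path=" + glove_path,
    ]
    if phrase_to_label_json is not None:
        cmd.append("--phrase_to_label_json=" + phrase_to_label_json)
    return cmd


# NCE: no phrase-to-label file.
nce_cmd = build_train_command(
    "region_features.hdf5", "phrase_features.hdf5", "glove.hdf5")

# NCE+Distill: same inputs plus the phrase-to-label file.
distill_cmd = build_train_command(
    "region_features.hdf5", "phrase_features.hdf5", "glove.hdf5",
    phrase_to_label_json="phrase_to_label.json")

# Print shell-ready command lines for inspection.
for cmd in (nce_cmd, distill_cmd):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To start a run from Python, pass one of the lists to `subprocess.run(cmd, check=True)`.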

Evaluation

To evaluate the model on Flickr30K, run:

python eval_model.py \
  --region_feat_path=region_features_test.hdf5 \
  --phrase_feat_path=phrase_features_test.hdf5 \
  --glove_path=glove.hdf5 \
  --restore_path=checkpoint.meta

Pre-trained Models

You can download pretrained models using Res101 VG features here:

You can also find the features on Flickr30K test split here.

The pretrained models achieve the following performance on the Flickr30K test split:

Model Name	R@1	R@5	R@10
NCE+Distill	0.5310	0.7394	0.7875
NCE	0.5135	0.7338	0.7833
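To make the scores easier to interpret, here is a minimal sketch of a recall-at-K metric for phrase grounding. It assumes the three score columns report recall at K = 1, 5, and 10, and that a phrase counts as correctly grounded when any of its top-K ranked regions overlaps the ground-truth box with IoU of at least 0.5. Both the threshold and the box format `(x1, y1, x2, y2)` are assumptions for illustration, and the function names are hypothetical; `eval_model.py` may compute its numbers differently.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(ranked_boxes, gt_boxes, k, iou_threshold=0.5):
    """Fraction of phrases with a matching box among the top-k ranked regions.

    ranked_boxes[i] lists the region boxes for phrase i, best score first;
    gt_boxes[i] is the ground-truth box for the same phrase.
    """
    if not gt_boxes:
        return 0.0
    hits = 0
    for ranked, gt in zip(ranked_boxes, gt_boxes):
        if any(iou(box, gt) >= iou_threshold for box in ranked[:k]):
            hits += 1
    return hits / len(gt_boxes)


# Toy data: phrase 0 is matched by its top region, phrase 1 by its second.
predictions = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(0, 0, 10, 10), (20, 20, 40, 40)],
]
ground_truth = [(0, 0, 10, 10), (22, 22, 40, 40)]

print(recall_at_k(predictions, ground_truth, k=1))  # 0.5
print(recall_at_k(predictions, ground_truth, k=2))  # 1.0
```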

Citation

If you use our implementation in your research or wish to refer to the results published in our paper, please use the following BibTeX entry.

@InProceedings{Wang_2021_CVPR,
    author    = {Wang, Liwei and Huang, Jing and Li, Yin and Xu, Kun and Yang, Zhengyuan and Yu, Dong},
    title     = {Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {14090-14100}
}
