Code and data for "Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning" (EMNLP 2021).

Related tags

Deep LearningGD-VCR
Overview

GD-VCR

Code for Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning (EMNLP 2021).

Research Questions and Aims:

  1. How well can a model perform on the images which requires geo-diverse commonsense to understand?
  2. What are the reasons behind performance disparity on Western and non-Western images?
  3. We aim to broaden researchers' vision on a realistic issue existing all over the world, and call upon researchers to consider more inclusive commonsense knowledge and better model transferability on various cultures.

In this repo, GD-VCR dataset and codes about 1) general model evaluation, 2) detailed controlled experiments, and 3) dataset construction are provided.

Repo Structure

GD-VCR
 ├─X_VCR				  --> storing GD-VCR/VCR data
 ├─configs
 │  └─vcr
 │     └─fine-tune-qa.json		  --> part of configs for evaluation
 ├─dataloaders
 │  └─vcr.py			          --> load GD-VCR/VCR data based on configs
 ├─models
 │  └─train.py		                  --> fine-tune/evaluate models
 │
 ├─val.jsonl			          --> GD-VCR dataset
 ├─val_addition_single.jsonl		  --> additional low-order QA pairs

GD-VCR dataset

First download the original VCR dataset to X_VCR:

cd X_VCR
wget https://s3.us-west-2.amazonaws.com/ai2-rowanz/vcr1annots.zip
wget https://s3.us-west-2.amazonaws.com/ai2-rowanz/vcr1images.zip
unzip vcr1annots.zip
unzip vcr1images.zip

Then download the GD-VCR dataset to X_VCR:

cd X_VCR
mv val.jsonl orig_val.jsonl
wget https://gdvcr.s3.us-west-1.amazonaws.com/MC-VCR_sample.zip
unzip MC-VCR_sample.zip

cd ..
mv val.jsonl X_VCR/
mv val_addition_single.jsonl X_VCR/

The detailed items in our GD-VCR dataset are almost the same as VCR. Please refer to VCR website for detailed explanations.

VisualBERT

Prepare Environment

Prepare environment as mentioned in the original repo of VisualBERT.

Fine-tune model on original VCR

Download the task-specific pre-trained checkpoint on original VCR vcr_pre_train.th to GD-VCR/visualbert/trained_models.

Then, use the command to fine-tune:

export PYTHONPATH=$PYTHONPATH:GD-VCR/visualbert/
export PYTHONPATH=$PYTHONPATH:GD-VCR/

cd GD-VCR/visualbert/models

CUDA_VISIBLE_DEVICES=0 python train.py -folder ../trained_models -config ../configs/vcr/fine-tune-qa.json

For convenience, we provide a trained checkpoint [Link] for quick evaluation.

Evaluation on GD-VCR

CUDA_VISIBLE_DEVICES=0 python train.py -folder ../trained_models -config ../configs/vcr/eval.json \
        [-region REGION] \
        [-scene SCENE] \
        [-single_or_multiple SINGLE_OR_MULTIPLE] \
        [-orig_or_new ORIG_OR_NEW] \
	[-addition_annotation_analysis] \
        [-grounding]

Here are the explanations of several important attributions:

  • REGION: One of the regions west, east-asia, south-asia, africa.
  • SCENE: One of the scenario (e.g., wedding).
  • SINGLE_OR_MULTIPLE: Whether studying single(low-order) or multiple(high-order) cognitive questions.
  • addition_annotation_analysis: Whether studying GD-VCR or additional annotated questions. If yes, you can choose to set SINGLE_OR_MULTIPLE to specify which types of questions you want to investigate.
  • ORIG_OR_NEW: Whether studying GD-VCR or original VCR dev set.
  • grounding: Whether analyzing grounding results by visualizing attention weights.

Given our fine-tuned VisualBERT model above, the evaluation results are shown below:

Models Overall West South Asia East Asia Africa
VisualBERT 53.27 **62.91** 52.04 45.39 51.85

ViLBERT

Prepare Environment

Prepare environment as mentioned in the original repo of ViLBERT.

Extract image features

We make use of the docker made for LXMERT. Detailed commands are shown below:

cd GD-VCR
git clone https://github.com/jiasenlu/bottom-up-attention.git
mv generate_tsv.py bottom-up-attention/tools
mv generate_tsv_gt.py bottom-up-attention/tools

docker pull airsplay/bottom-up-attention
docker run --name gd_vcr --runtime=nvidia -it -v /PATH/TO/:/PATH/TO/ airsplay/bottom-up-attention /bin/bash
[Used to enter into the docker]

cd /PATH/TO/GD-VCR/bottom-up-attention
pip install json_lines
pip install jsonlines
pip install python-dateutil==2.5.0

python ./tools/generate_tsv.py --cfg experiments/cfgs/faster_rcnn_end2end_resnet.yml --def models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt --out ../vilbert_beta/feature/VCR/VCR_resnet101_faster_rcnn_genome.tsv --net data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel --total_group 1 --group_id 0 --split VCR
python ./tools/generate_tsv_gt.py --cfg experiments/cfgs/faster_rcnn_end2end_resnet.yml --def models/vg/ResNet-101/faster_rcnn_end2end_final/test_gt.prototxt --out ../vilbert_beta/feature/VCR/VCR_gt_resnet101_faster_rcnn_genome.tsv --net data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel --total_group 1 --group_id 0 --split VCR_gt
[Used to extract features]

Then, exit the dockerfile, and convert extracted features into lmdb form:

cd GD-VCR/vilbert_beta
python script/convert_lmdb_VCR.py
python script/convert_lmdb_VCR_gt.py

Fine-tune model on original VCR

Download the pre-trained checkpoint to GD-VCR/vilbert_beta/save/bert_base_6_layer_6_connect_freeze_0/.

Then, use the command to fine-tune:

cd GD-VCR/vilbert_beta
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin  --config_file config/bert_base_6layer_6conect.json  --learning_rate 2e-5 --num_workers 16 --tasks 1-2 --save_name pretrained

For convenience, we provide a trained checkpoint [Link] for quick evaluation.

Evaluation on GD-VCR

CUDA_VISIBLE_DEVICES=0,1 python eval_tasks.py 
		--bert_model bert-base-uncased 
		--from_pretrained save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/vilbert_best.bin 
		--config_file config/bert_base_6layer_6conect.json --task 1 --split val  --batch_size 16

Note that if you want the results on original VCR dev set, you could directly change the "val_annotations_jsonpath" value of TASK1 to X_VCR/orig_val.jsonl.

Given our fine-tuned ViLBERT model above, the evaluation results are shown below:

Models Overall West South Asia East Asia Africa
ViLBERT 58.47 **65.82** 62.90 46.45 62.04

Dataset Construction

Here we provide dataset construction methods in our paper:

  • similarity.py: Compute the similarity among answer candidates and distribute candidates to each annotated questions.
  • relevance_model.py: Train a model to compute the relevance between question and answer.
  • question_cluster.py: Infer question templates from original VCR dataset as the basis of annotation.

For sake of convenience, we provide the trained relevance computation model [Link].

Acknowledgement

We thank for VisualBERT, ViLBERT, and Detectron authors' implementation. Also, we appreciate the effort of original VCR paper's author, and our work is highly influenced by VCR.

Citation

Please cite our EMNLP paper if this repository inspired your work.

@inproceedings{yin2021broaden,
  title = {Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning},
  author = {Yin, Da and Li, Liunian Harold and Hu, Ziniu and Peng, Nanyun and Chang, Kai-Wei},
  booktitle = {EMNLP},
  year = {2021}
}
Owner
Da Yin
Da Yin
Tensorflow AffordanceNet and AffContext implementations

AffordanceNet and AffContext This is tensorflow AffordanceNet and AffContext implementations. Both are implemented and tested with tensorflow 2.3. The

Beatriz Pérez 6 Dec 01, 2022
【Arxiv】Exploring Separable Attention for Multi-Contrast MR Image Super-Resolution

SANet Exploring Separable Attention for Multi-Contrast MR Image Super-Resolution Dependencies numpy==1.18.5 scikit_image==0.16.2 torchvision==0.8.1 to

36 Jan 05, 2023
Replication Package for "An Empirical Study of the Effectiveness of an Ensemble of Stand-alone Sentiment Detection Tools for Software Engineering Datasets"

Replication Package for "An Empirical Study of the Effectiveness of an Ensemble of Stand-alone Sentiment Detection Tools for Software Engineering Data

2 Oct 06, 2022
Christmas face app for Decathlon xmas coding party!

Christmas Face Application Use this library to create the perfect picture for your christmas cards! Done by Hasib Zunair, Guillaume Brassard and Samue

Hasib Zunair 4 Dec 20, 2021
A geometric deep learning pipeline for predicting protein interface contacts.

A geometric deep learning pipeline for predicting protein interface contacts.

44 Dec 30, 2022
Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision (ICCV 2021)

Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision (ICCV 2021) PyTorch implementation of Learning RAW-to-sRGB Mappings with Inaccurat

Zhilu Zhang 53 Dec 20, 2022
A PaddlePaddle implementation of STGCN with a few modifications in the model architecture in order to forecast traffic jam.

About This repository contains the code of a PaddlePaddle implementation of STGCN based on the paper Spatio-Temporal Graph Convolutional Networks: A D

Tianjian Li 1 Jan 11, 2022
This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.

Introduction This is an official implementation of CvT: Introducing Convolutions to Vision Transformers. We present a new architecture, named Convolut

Microsoft 408 Dec 30, 2022
Time-Optimal Planning for Quadrotor Waypoint Flight

Time-Optimal Planning for Quadrotor Waypoint Flight This is an example implementation of the paper "Time-Optimal Planning for Quadrotor Waypoint Fligh

Robotics and Perception Group 38 Dec 02, 2022
Using modified BiSeNet for face parsing in PyTorch

face-parsing.PyTorch Contents Training Demo References Training Prepare training data: -- download CelebAMask-HQ dataset -- change file path in the pr

zll 1.6k Jan 08, 2023
FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

FairEdit Relevent Publication FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

5 Feb 04, 2022
Learning Neural Painters Fast! using PyTorch and Fast.ai

The Joy of Neural Painting Learning Neural Painters Fast! using PyTorch and Fast.ai Blogpost with more details: The Joy of Neural Painting The impleme

Libre AI 72 Nov 10, 2022
Official implementation of the PICASO: Permutation-Invariant Cascaded Attentional Set Operator

PICASO Official PyTorch implemetation for the paper PICASO:Permutation-Invariant Cascaded Attentive Set Operator. Requirements Python 3 torch = 1.0 n

Samira Zare 0 Dec 23, 2021
[ICCV 2021] Official Tensorflow Implementation for "Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions"

KPAC: Kernel-Sharing Parallel Atrous Convolutional block This repository contains the official Tensorflow implementation of the following paper: Singl

Hyeongseok Son 50 Dec 29, 2022
Mercury: easily convert Python notebook to web app and share with others

Mercury Share your Python notebooks with others Easily convert your Python notebooks into interactive web apps by adding parameters in YAML. Simply ad

MLJAR 2.2k Dec 27, 2022
Prososdy Morph: A python library for manipulating pitch and duration in an algorithmic way, for resynthesizing speech.

ProMo (Prosody Morph) Questions? Comments? Feedback? Chat with us on gitter! A library for manipulating pitch and duration in an algorithmic way, for

Tim 71 Jan 02, 2023
Official repository for ABC-GAN

ABC-GAN The work represented in this repository is the result of a 14 week semesterthesis on photo-realistic image generation using generative adversa

IgorSusmelj 10 Jun 23, 2022
[NeurIPS-2020] Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID.

Self-paced Contrastive Learning (SpCL) The official repository for Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID

Yixiao Ge 286 Dec 21, 2022
Human head pose estimation using Keras over TensorFlow.

RealHePoNet: a robust single-stage ConvNet for head pose estimation in the wild.

Rafael Berral Soler 71 Jan 05, 2023
GB-CosFace: Rethinking Softmax-based Face Recognition from the Perspective of Open Set Classification

GB-CosFace: Rethinking Softmax-based Face Recognition from the Perspective of Open Set Classification This is the official pytorch implementation of t

Alibaba Cloud 5 Nov 14, 2022