VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Related tags

Deep LearningVL-LTR
Overview

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Usage

First, install PyTorch 1.7.1+, torchvision 0.8.2+ and other required packages as follows:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install mmcv==1.3.14

Data preparation

ImageNet-LT

Download and extract ImageNet train and val images from here. The directory structure is the standard layout for the torchvision datasets.ImageFolder, and the training and validation data is expected to be in the train/ folder and val/ folder respectively.

Then download and extract the wiki text into the same directory, and the directory tree of data is expected to be like this:

./data/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
  wiki/
  	desc_1.txt
  ImageNet_LT_test.txt
  ImageNet_LT_train.txt
  ImageNet_LT_val.txt
  labels.txt

After that, download the CLIP's pretrained weight RN50.pt and ViT-B-16.pt into the pretrained directory from https://github.com/openai/CLIP.

Places-LT

Download the places365_standard data from here.

Then download and extract the wiki text into the same directory. The directory tree of data is expected to be like this (almost the same as ImageNet-LT):

./data/places/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
  wiki/
  	desc_1.txt
  Places_LT_test.txt
  Places_LT_train.txt
  Places_LT_val.txt
  labels.txt

iNaturalist 2018

Download the iNaturalist 2018 data from here.

Then download and extract the wiki text into the same directory. The directory tree of data is expected to be like this:

./data/iNat/
  train_val2018/
  wiki/
  	desc_1.txt
  categories.json
  test2018.json
  train2018.json
  val.json

Evaluation

To evaluate VL-LTR with a single GPU run:

  • Pre-training stage
bash eval.sh ${CONFIG_PATH} 1 --eval-pretrain
  • Fine-tuning stage:
bash eval.sh ${CONFIG_PATH} 1

The ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.

Training

To train VL-LTR on a single node with 8 GPUs for:

  • Pre-training stage, run:
bash dist_train_arun.sh ${PARTITION} ${CONFIG_PATH} 8
  • Fine-tuning stage:

    • First, calculate the $\mathcal L_{\text{lin}}$ of each sentence for AnSS method by running this:
    bash eval.sh ${CONFIG_PATH} 1 --eval-pretrain --select
    • then, running this:
    bash dist_train_arun.sh ${PARTITION} ${CONFIG_PATH} 8

The ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.

Results

Below list our model's performance on ImageNet-LT, Places-LT, and iNaturalist 2018.

Dataset Backbone Top-1 Accuracy Download
ImageNet-LT ResNet-50 70.1 Weights
ImageNet-LT ViT-Base-16 77.2 Weights
Places-LT ResNet-50 48.0 Weights
Places-LT ViT-Base-16 50.1 Weights
iNaturalist 2018 ResNet-50 74.6 Weights
iNaturalist 2018 ViT-Base-16 76.8 Weights

For more detailed information, please refer to our paper directly.

Citation

If you are interested in our work, please cite as follows:

@article{tian2021vl,
  title={VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition},
  author={Tian, Changyao and Wang, Wenhai and Zhu, Xizhou and Wang, Xiaogang and Dai, Jifeng and Qiao, Yu},
  journal={arXiv preprint arXiv:2111.13579},
  year={2021}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

You might also like...
Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification (AAAI 2022) Prerequisite PyTorch = 1.2.0 P

Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification"

[AAAI22] Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification We point out the overlooked unbiasedness in long-tailed clas

On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks

On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks We provide the code (in PyTorch) and datasets for our paper "On Size-Orient

Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions

APSIPA-SER-with-A-and-T This code is the implementation of Speech Emotion Recognition (SER) with acoustic and linguistic features. The network model i

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''
A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

Implementation of
Implementation of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition"(CVPR 2021)

Implementation of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition"(CVPR 2021)

Official codes for the paper
Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"

ResDAVEnet-VQ Official PyTorch implementation of Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech What is in this repo? M

[ICCV2021] Official code for
[ICCV2021] Official code for "Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition"

CTR-GCN This repo is the official implementation for Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition. The pap

Official implementation for CVPR 2021 paper: Adaptive Class Suppression Loss for Long-Tail Object Detection
Official implementation for CVPR 2021 paper: Adaptive Class Suppression Loss for Long-Tail Object Detection

Adaptive Class Suppression Loss for Long-Tail Object Detection This repo is the official implementation for CVPR 2021 paper: Adaptive Class Suppressio

Comments
  • Problem about running eval.sh

    Problem about running eval.sh

    """ #!/usr/bin/env bash set -x

    export NCCL_LL_THRESHOLD=0

    CONFIG=$1 GPUS=$1 CPUS=$[GPUS*2] PORT=${PORT:-8886}

    CONFIG_NAME=${CONFIG##/} CONFIG_NAME=${CONFIG_NAME%.}

    OUTPUT_DIR="./checkpoints/eval" if [ ! -d $OUTPUT_DIR ]; then mkdir ${OUTPUT_DIR} fi

    python -u main.py
    --port=$PORT
    --num_workers 4
    --resume "./checkpoints/${CONFIG_NAME}/checkpoint.pth"
    --output-dir ${OUTPUT_DIR}
    --config $CONFIG ${@:3}
    --eval
    2>&1 | tee -a ${OUTPUT_DIR}/train.log """ I have two A100, so set GPUS is 2. All other settings according to ReadME.md but I got a problem when running eval.sh """ File "eval.sh", line 4 export NCCL_LL_THRESHOLD=0 ^ SyntaxError: invalid syntax

    """

    opened by euminds 2
  • Mismatch between code and diagram in paper for the fine-tuning phase

    Mismatch between code and diagram in paper for the fine-tuning phase

    In fig 3, stage 2 from the paper, it looks like value for the attention is calculated based on Vision and language (Q is vision, K is language) and then applied to the language (V). But in the code, the attention is applied to the visual features. Can you verify which one is the correct way? @ChangyaoTian

    opened by rahulvigneswaran 0
  • pre-trained weights with TorchScript?

    pre-trained weights with TorchScript?

    Hello, Thanks for the great work! May I ask if it's possible for you to also provide the checkpoint weight in a TorchScript version?

    It's something like:

    import torch
    import torchvision.models as models
    
    model = models.resnet50()
    traced = torch.jit.trace(model, (torch.rand(4, 3, 224, 224),))
    torch.jit.save(traced, "test.pt")
    
    # load model
    model = torch.jit.load("test.pt")
    
    opened by xinleihe 0
Releases(ECCV-2022-video)
[ICML 2020] "When Does Self-Supervision Help Graph Convolutional Networks?" by Yuning You, Tianlong Chen, Zhangyang Wang, Yang Shen

When Does Self-Supervision Help Graph Convolutional Networks? PyTorch implementation for When Does Self-Supervision Help Graph Convolutional Networks?

Shen Lab at Texas A&M University 106 Nov 11, 2022
A strongly-typed genetic programming framework for Python

monkeys "If an army of monkeys were strumming on typewriters they might write all the books in the British Museum." monkeys is a framework designed to

H. Chase Stevens 115 Nov 27, 2022
Hyperparameter Optimization for TensorFlow, Keras and PyTorch

Hyperparameter Optimization for Keras Talos • Key Features • Examples • Install • Support • Docs • Issues • License • Download Talos radically changes

Autonomio 1.6k Dec 15, 2022
Simple, efficient and flexible vision toolbox for mxnet framework.

MXbox: Simple, efficient and flexible vision toolbox for mxnet framework. MXbox is a toolbox aiming to provide a general and simple interface for visi

Ligeng Zhu 31 Oct 19, 2019
Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP 2021.

The Stem Cell Hypothesis Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP

Emory NLP 5 Jul 08, 2022
A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.

DeepFilterNet A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering. libDF contains Rust code used for dat

Hendrik Schröter 292 Dec 25, 2022
Code for CVPR2021 paper 'Where and What? Examining Interpretable Disentangled Representations'.

PS-SC GAN This repository contains the main code for training a PS-SC GAN (a GAN implemented with the Perceptual Simplicity and Spatial Constriction c

Xinqi/Steven Zhu 40 Dec 16, 2022
Implementation of the paper: "SinGAN: Learning a Generative Model from a Single Natural Image"

SinGAN This is an unofficial implementation of SinGAN from someone who's been sitting right next to SinGAN's creator for almost five years. Please ref

35 Nov 10, 2022
WRENCH: Weak supeRvision bENCHmark

🔧 What is it? Wrench is a benchmark platform containing diverse weak supervision tasks. It also provides a common and easy framework for development

Jieyu Zhang 176 Dec 28, 2022
Finite-temperature variational Monte Carlo calculation of uniform electron gas using neural canonical transformation.

CoulombGas This code implements the neural canonical transformation approach to the thermodynamic properties of uniform electron gas. Building on JAX,

FermiFlow 9 Mar 03, 2022
Simple torch.nn.module implementation of Alias-Free-GAN style filter and resample

Alias-Free-Torch Simple torch module implementation of Alias-Free GAN. This repository including Alias-Free GAN style lowpass sinc filter @filter.py A

이준혁(Junhyeok Lee) 64 Dec 22, 2022
List some popular DeepFake models e.g. DeepFake, FaceSwap-MarekKowal, IPGAN, FaceShifter, FaceSwap-Nirkin, FSGAN, SimSwap, CihaNet, etc.

deepfake-models List some popular DeepFake models e.g. DeepFake, CihaNet, SimSwap, FaceSwap-MarekKowal, IPGAN, FaceShifter, FaceSwap-Nirkin, FSGAN, Si

Mingcan Xiang 100 Dec 17, 2022
Code for ICCV2021 paper PARE: Part Attention Regressor for 3D Human Body Estimation

PARE: Part Attention Regressor for 3D Human Body Estimation [ICCV 2021] PARE: Part Attention Regressor for 3D Human Body Estimation, Muhammed Kocabas,

Muhammed Kocabas 277 Jan 03, 2023
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Introduction PyTorch3D provides efficient, reusable components for 3D Computer Vision research with PyTorch. Key features include: Data structure for

Facebook Research 6.8k Jan 01, 2023
code for our paper "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer"

SHOT++ Code for our TPAMI submission "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer" that is ext

75 Dec 16, 2022
Tensorflow implementation of ID-Unet: Iterative Soft and Hard Deformation for View Synthesis.

ID-Unet: Iterative-view-synthesis(CVPR2021 Oral) Tensorflow implementation of ID-Unet: Iterative Soft and Hard Deformation for View Synthesis. Overvie

17 Aug 23, 2022
Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition (NeurIPS 2019)

MLCR This is the source code for paper Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition. Xuesong Niu, Hu Han, Shiguang

Edson-Niu 60 Nov 29, 2022
Object-Centric Learning with Slot Attention

Slot Attention This is a re-implementation of "Object-Centric Learning with Slot Attention" in PyTorch (https://arxiv.org/abs/2006.15055). Requirement

Untitled AI 72 Jan 02, 2023
High-resolution networks and Segmentation Transformer for Semantic Segmentation

High-resolution networks and Segmentation Transformer for Semantic Segmentation Branches This is the implementation for HRNet + OCR. The PyTroch 1.1 v

HRNet 2.8k Jan 07, 2023
An end-to-end PyTorch framework for image and video classification

What's New: March 2021: Added RegNetZ models November 2020: Vision Transformers now available, with training recipes! 2020-11-20: Classy Vision v0.5 R

Facebook Research 1.5k Dec 31, 2022