Code for Referring Image Segmentation via Cross-Modal Progressive Comprehension, CVPR2020.

Last update: Dec 01, 2022

Overview

CMPC-Refseg

Code of our CVPR 2020 paper Referring Image Segmentation via Cross-Modal Progressive Comprehension.

Shaofei Huang*, Tianrui Hui*, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li (* Equal contribution)

Interpretation of CMPC.

(a) Input referring expression and image.
(b) The model first perceives all the entities described in the expression based on entity words and attribute words, e.g., “man” and “white frisbee” (orange masks and blue outline).
(c) After finding all the candidate entities that may match the input expression, the relational word “holding” can be further exploited to highlight the entity involved in the relationship (green arrow) and suppress the others that are not involved.
(d) Benefiting from the relation-aware reasoning process, the referred entity is found as the final prediction (purple mask).

Experimental Results

We modified the feature concatenation at the end of the CMPC module and achieve higher performance than the results reported in our paper. The new experimental results are summarized in the table below. You can download our trained checkpoints to test on the four datasets. The link to the checkpoints is: Baidu Drive, password: jjsf.

Method	UNC val	UNC testA	UNC testB	UNC+ val	UNC+ testA	UNC+ testB	G-Ref val	ReferIt test
STEP-ICCV19 [1]	60.04	63.46	57.97	48.19	52.33	40.41	46.40	64.13
Ours-CVPR20	61.36	64.53	59.64	49.56	53.44	43.23	49.05	65.53
Ours-Updated	62.47	65.08	60.82	50.25	54.04	43.47	49.89	65.58
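
As a reading aid, the gain of Ours-Updated over Ours-CVPR20 can be computed per split from the two rows above. The following is a minimal sketch that copies those scores verbatim; the variable names are illustrative and not part of the repository.

```python
# Scores copied from the table above, in column order.
splits = ["UNC val", "UNC testA", "UNC testB",
          "UNC+ val", "UNC+ testA", "UNC+ testB",
          "G-Ref val", "ReferIt test"]
ours_cvpr20 = [61.36, 64.53, 59.64, 49.56, 53.44, 43.23, 49.05, 65.53]
ours_updated = [62.47, 65.08, 60.82, 50.25, 54.04, 43.47, 49.89, 65.58]

# Gain of the updated model on each split, rounded to two decimals.
gains = [(s, round(u - c, 2))
         for s, c, u in zip(splits, ours_cvpr20, ours_updated)]

for split, gain in gains:
    print("{}: +{:.2f}".format(split, gain))
```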

Setup

We recommend the following dependencies.

Python 2.7
TensorFlow 1.5
Numpy
pydensecrf

This code is derived from RRN [2]. Please refer to it for further setup details.

Data Preparation

Dataset Preprocessing

We conduct experiments on four referring image segmentation datasets: UNC, UNC+, Gref and ReferIt. After downloading these datasets, you can run the following commands for data preparation:

python build_batches.py -d Gref -t train
python build_batches.py -d Gref -t val
python build_batches.py -d unc -t train
python build_batches.py -d unc -t val
python build_batches.py -d unc -t testA
python build_batches.py -d unc -t testB
python build_batches.py -d unc+ -t train
python build_batches.py -d unc+ -t val
python build_batches.py -d unc+ -t testA
python build_batches.py -d unc+ -t testB
python build_batches.py -d referit -t trainval
python build_batches.py -d referit -t test
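
The twelve commands above follow one pattern, so they can also be generated from a small dataset-to-split table. The sketch below is a hypothetical helper, not part of the repository: `SPLITS`, `build_commands` and `run_all` are illustrative names, and the script assumes it is started from the directory that contains build_batches.py. By default it only prints the commands.

```python
import subprocess

# Dataset -> splits, mirroring the commands listed above.
SPLITS = [
    ("Gref", ["train", "val"]),
    ("unc", ["train", "val", "testA", "testB"]),
    ("unc+", ["train", "val", "testA", "testB"]),
    ("referit", ["trainval", "test"]),
]


def build_commands():
    """Return one build_batches.py argument list per (dataset, split) pair."""
    return [
        ["python", "build_batches.py", "-d", dataset, "-t", split]
        for dataset, splits in SPLITS
        for split in splits
    ]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check_call raises on a non-zero exit code, so the loop
            # stops at the first failing command.
            subprocess.check_call(cmd)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` would execute the commands in the listed order.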

Glove Embedding

Download Gref_emb.npy and referit_emb.npy and put them in data/. The GloVe embeddings can be downloaded here: Baidu Drive, password: 2m28.
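
Before training, it may help to confirm that both embedding files ended up in data/. The following is a minimal sketch using only the standard library; the function name `missing_embeddings` is hypothetical, and the check covers only the two file names mentioned above.

```python
import os

# File names taken from the text above.
EXPECTED_EMBEDDINGS = ["Gref_emb.npy", "referit_emb.npy"]


def missing_embeddings(data_dir="data"):
    """Return the expected embedding files that are not present in data_dir."""
    return [
        name for name in EXPECTED_EMBEDDINGS
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


if __name__ == "__main__":
    missing = missing_embeddings()
    if missing:
        print("Missing embedding files: " + ", ".join(missing))
    else:
        print("All embedding files found in data/")
```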

Training

Train on the UNC training set with:

python -u trainval_model.py -m train -d unc -t train -n CMPC_model -emb -f ckpts/unc/cmpc_model

Testing

Test on the UNC validation set with:

python -u trainval_model.py -m test -d unc -t val -n CMPC_model -i 700000 -c -emb -f ckpts/unc/cmpc_model
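
To evaluate the remaining UNC splits, the same command can presumably be repeated with a different `-t` value. The sketch below only builds and prints the command lines. It assumes that testA and testB accept the same flags as val and that their batches were built during data preparation; `eval_commands` is a hypothetical helper name, and every flag is copied verbatim from the command above.

```python
# Flags copied verbatim from the test command above; only -t varies.
BASE = ["python", "-u", "trainval_model.py", "-m", "test", "-d", "unc"]
TAIL = ["-n", "CMPC_model", "-i", "700000", "-c", "-emb",
        "-f", "ckpts/unc/cmpc_model"]


def eval_commands(splits=("val", "testA", "testB")):
    """Return one test command (as an argument list) per UNC split."""
    return [BASE + ["-t", split] + TAIL for split in splits]


if __name__ == "__main__":
    for cmd in eval_commands():
        print(" ".join(cmd))
```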

CMPC for video referring segmentation

We release a video version of the CMPC code for the A2D dataset under CMPC_video/.

Reference

[1] Chen, Ding-Jie, et al. "See-through-text grouping for referring image segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2019.

[2] Li, Ruiyu, et al. "Referring image segmentation via recurrent refinement networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

Citation

If our CMPC is useful to your research, please consider citing:

@inproceedings{huang2020referring,
  title={Referring Image Segmentation via Cross-Modal Progressive Comprehension},
  author={Huang, Shaofei and Hui, Tianrui and Liu, Si and Li, Guanbin and Wei, Yunchao and Han, Jizhong and Liu, Luoqi and Li, Bo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10488--10497},
  year={2020}
}
