Fast convergence of detr with spatially modulated co-attention

Overview

Fast convergence of detr with spatially modulated co-attention

Usage

There are no extra compiled components in SMCA DETR and package dependencies are minimal, so the code is very simple to use. We provide instructions how to install dependencies via conda. First, clone the repository locally:

git clone https://github.com/facebookresearch/detr.git

Then, install PyTorch 1.5+ and torchvision 0.6+:

conda install -c pytorch pytorch torchvision

Install pycocotools (for evaluation on COCO) and scipy (for training):

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

That's it, should be good to train and evaluate detection models.

(optional) to work with panoptic install panopticapi:

pip install git+https://github.com/cocodataset/panopticapi.git

Data preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images

Training

To train Single Scale SMCA on a single node with 8 gpus for 300 epochs run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --batch_size 2 --lr_drop 40 --num_queries 300 --epochs 50 --dynamic_scale type3 --output_dir smca_single_scale


A single epoch takes 30 minutes, so 50 epoch training takes around 25 hours on a single machine with 8 V100 cards.

Object Detection

Model Zoo

name dataset backbone schedule box AP
0 SMCA(single scale) MSCOCO R50 50 41.0
1 SMCA-Container(single scale) MSCOCO Container-S-Light 50 44.2
2 SMCA-Container(single scale) MSCOCO Container-M 50 47.3
3 SMCA(single scale) MSCOCO R50 108 42.7
4 SMCA(single scale) MSCOCO R50 250 43.5
5 SMCA(multi scale) MSCOCO R50 50 43.7
6 SMCA(New multi scale) MSCOCO R50 50 44.4
7 SMCA Visual Genome R50 50 coming soon

Panoptic Segmentation

Model Zoo

name dataset backbone schedule PQ SQ RQ
1 MASK-Former(single scale) MSCOCO R50 500 46.5 80.4 56.8
2 SMCA-MASK-Former(single scale) MSCOCO R50 50 46.0 80.4 56.0
## Original SMCA code submission during ICCV review period. https://github.com/abc403/SMCA-replication

Release Steps

  1. Single-scale SMCA
  2. Single-scale SMCA with Container-Small
  3. Single-scale SMCA with Container-Medium
  4. New Multi-scale SMCA (Newly added Multi_scale_SMCA.zip, 9th Sep)
  5. SMCA-DETR for Fast Convergence of Panoptic Segmentation

Citation

If you find this repository useful, please consider citing our work:

@article{gao2021fast,
  title={Fast convergence of detr with spatially modulated co-attention},
  author={Gao, Peng and Zheng, Minghang and Wang, Xiaogang and Dai, Jifeng and Li, Hongsheng},
  journal={arXiv preprint arXiv:2101.07448},
  year={2021}
}
@article{gao2021container,
  title={Container: Context Aggregation Network},
  author={Gao, Peng and Lu, Jiasen and Li, Hongsheng and Mottaghi, Roozbeh and Kembhavi, Aniruddha},
  journal={arXiv preprint arXiv:2106.01401},
  year={2021}
}
@article{zheng2020end,
  title={End-to-end object detection with adaptive clustering transformer},
  author={Zheng, Minghang and Gao, Peng and Wang, Xiaogang and Li, Hongsheng and Dong, Hao},
  journal={arXiv preprint arXiv:2011.09315},
  year={2020}
}

Contributor

Peng Gao, Qiu Han, Minghang Zeng

Acknowledege

The project are borrowed heavily from DETR. Partially motivated by Sparse RCNN.

Owner
peng gao
Young Scientist at Shanghai AI Lab
peng gao
Relaxed-machines - explorations in neuro-symbolic differentiable interpreters

Relaxed Machines Explorations in neuro-symbolic differentiable interpreters. Baby steps: inc_stop Libraries JAX Haiku Optax Resources Chapter 3 (∂4: A

Nada Amin 6 Feb 02, 2022
Retinal vessel segmentation based on GT-UNet

Retinal vessel segmentation based on GT-UNet Introduction This project is a retinal blood vessel segmentation code based on UNet-like Group Transforme

Kent0n 27 Dec 18, 2022
A collection of SOTA Image Classification Models in PyTorch

A collection of SOTA Image Classification Models in PyTorch

sithu3 85 Dec 30, 2022
SuRE Evaluation: A Supplementary Material

SuRE Evaluation: A Supplementary Material This repository contains supplementary material regarding the evaluations presented in the paper Visual Expl

NYU Visualization Lab 0 Dec 14, 2021
Fast and robust clustering of point clouds generated with a Velodyne sensor.

Depth Clustering This is a fast and robust algorithm to segment point clouds taken with Velodyne sensor into objects. It works with all available Velo

Photogrammetry & Robotics Bonn 957 Dec 21, 2022
Employs neural networks to classify images into four categories: ship, automobile, dog or frog

Neural Net Image Classifier Employs neural networks to classify images into four categories: ship, automobile, dog or frog Viterbi_1.py uses a classic

Riley Baker 1 Jan 18, 2022
It is a system used to detect bone fractures. using techniques deep learning and image processing

MohammedHussiengadalla-Intelligent-Classification-System-for-Bone-Fractures It is a system used to detect bone fractures. using techniques deep learni

Mohammed Hussien 7 Nov 11, 2022
Official re-implementation of the Calibrated Adversarial Refinement model described in the paper Calibrated Adversarial Refinement for Stochastic Semantic Segmentation

Official re-implementation of the Calibrated Adversarial Refinement model described in the paper Calibrated Adversarial Refinement for Stochastic Semantic Segmentation

Elias Kassapis 31 Nov 22, 2022
Official implementation of the paper ``Unifying Nonlocal Blocks for Neural Networks'' (ICCV'21)

Spectral Nonlocal Block Overview Official implementation of the paper: Unifying Nonlocal Blocks for Neural Networks (ICCV'21) Spectral View of Nonloca

91 Dec 14, 2022
Data and Code for paper Outlining and Filling: Hierarchical Query Graph Generation for Answering Complex Questions over Knowledge Graph is available for research purposes.

Data and Code for paper Outlining and Filling: Hierarchical Query Graph Generation for Answering Complex Questions over Knowledge Graph is available f

Yongrui Chen 5 Nov 10, 2022
Source code and dataset of the paper "Contrastive Adaptive Propagation Graph Neural Networks forEfficient Graph Learning"

CAPGNN Source code and dataset of the paper "Contrastive Adaptive Propagation Graph Neural Networks forEfficient Graph Learning" Paper URL: https://ar

1 Mar 12, 2022
Multispectral Object Detection with Yolov5

Multispectral-Object-Detection Intro Official Code for Cross-Modality Fusion Transformer for Multispectral Object Detection. Multispectral Object Dete

Richard Fang 121 Jan 01, 2023
OpenCVのGrabCut()を利用したセマンティックセグメンテーション向けアノテーションツール(Annotation tool using GrabCut() of OpenCV. It can be used to create datasets for semantic segmentation.)

[Japanese/English] GrabCut-Annotation-Tool GrabCut-Annotation-Tool.mp4 OpenCVのGrabCut()を利用したアノテーションツールです。 セマンティックセグメンテーション向けのデータセット作成にご使用いただけます。 ※Grab

KazuhitoTakahashi 30 Nov 18, 2022
Semi-supervised learning for object detection

Source code for STAC: A Simple Semi-Supervised Learning Framework for Object Detection STAC is a simple yet effective SSL framework for visual object

Google Research 348 Dec 25, 2022
This repository contains various models targetting multimodal representation learning, multimodal fusion for downstream tasks such as multimodal sentiment analysis.

Multimodal Deep Learning 🎆 🎆 🎆 Announcing the multimodal deep learning repository that contains implementation of various deep learning-based model

Deep Cognition and Language Research (DeCLaRe) Lab 398 Dec 30, 2022
Implementation of gaze tracking and demo

Predicting Customer Demand by Using Gaze Detecting and Object Tracking This project is the integration of gaze detecting and object tracking. Predict

2 Oct 20, 2022
Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

GPT2-Pytorch with Text-Generator Better Language Models and Their Implications Our model, called GPT-2 (a successor to GPT), was trained simply to pre

Tae-Hwan Jung 775 Jan 08, 2023
PyTorch implementation for View-Guided Point Cloud Completion

PyTorch implementation for View-Guided Point Cloud Completion

22 Jan 04, 2023
AdelaiDepth is an open source toolbox for monocular depth prediction.

AdelaiDepth is an open source toolbox for monocular depth prediction.

Adelaide Intelligent Machines (AIM) Group 743 Jan 01, 2023
code for our BMVC 2021 paper "HCV: Hierarchy-Consistency Verification for Incremental Implicitly-Refined Classification"

HCV_IIRC code for our BMVC 2021 paper HCV: Hierarchy-Consistency Verification for Incremental Implicitly-Refined Classification by Kai Wang, Xialei Li

kai wang 13 Oct 03, 2022