This is an official implementation for "ResT: An Efficient Transformer for Visual Recognition".

Last update: Dec 13, 2022

ResT

By Qing-Long Zhang and Yu-Bin Yang

[State Key Laboratory for Novel Software Technology at Nanjing University]

This repo is the official implementation of "ResT: An Efficient Transformer for Visual Recognition". It currently includes code and models for the following tasks:

Image Classification: Included in this repo. See get_started.md for a quick start.

Object Detection and Instance Segmentation: Based on detectron2, coming soon.

ResT was initially described on arXiv. It serves as a general-purpose backbone for computer vision and can process input images of arbitrary size. In addition, ResT compresses the memory of standard multi-head self-attention (MSA) and models the interaction between attention heads while preserving their diversity.
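The memory compression and head interaction described above can be illustrated with a small NumPy sketch. This is a hypothetical, non-authoritative illustration, not the repository's PyTorch code. It assumes an EMSA-style block: queries come from all tokens, while keys and values come from tokens spatially reduced by a depth-wise convolution with stride `s` followed by LayerNorm. A 1x1 convolution across the head axis mixes the attention logits of different heads, and each head's attention map is then instance-normalized after the softmax. The function name `emsa_sketch`, the `params` keys, and the use of a non-overlapping depth-wise convolution (kernel = stride = `s`) are assumptions made for illustration.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token over its channel dimension (no learned affine).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def emsa_sketch(x, H, W, params, heads=2, s=2, eps=1e-5):
    """Hypothetical EMSA-style attention on tokens x of shape (H*W, C).

    Returns the output tokens (H*W, C) and the attention map
    (heads, H*W, (H//s)*(W//s)). H and W must be divisible by s.
    """
    n, C = x.shape
    dk = C // heads
    q = x @ params["wq"]  # queries keep full resolution: (n, C)

    # Spatial reduction: depth-wise conv with kernel = stride = s
    # (an assumed simplification), then LayerNorm on the reduced tokens.
    grid = x.reshape(H // s, s, W // s, s, C)
    reduced = np.einsum("aibjc,ijc->abc", grid, params["dw"]).reshape(-1, C)
    reduced = layer_norm(reduced, eps)
    k = reduced @ params["wk"]  # (n', C) with n' = n / s**2
    v = reduced @ params["wv"]

    # Split channels into heads: (heads, tokens, dk).
    def split(t):
        return t.reshape(t.shape[0], heads, dk).transpose(1, 0, 2)

    q, k, v = split(q), split(k), split(v)
    attn = q @ k.transpose(0, 2, 1) / np.sqrt(dk)  # (heads, n, n')

    # 1x1 convolution across the head axis: every output head is a
    # weighted mix of all input heads, so heads can interact.
    attn = np.einsum("gh,hnm->gnm", params["mix"], attn)
    attn = softmax(attn, axis=-1)

    # Instance normalization of each head's attention map.
    mu = attn.mean(axis=(1, 2), keepdims=True)
    var = attn.var(axis=(1, 2), keepdims=True)
    attn = (attn - mu) / np.sqrt(var + eps)

    out = (attn @ v).transpose(1, 0, 2).reshape(n, C)  # merge heads
    return out @ params["wo"], attn


rng = np.random.default_rng(0)
H, W, C, heads, s = 8, 8, 16, 4, 2
params = {name: rng.normal(size=(C, C)) for name in ("wq", "wk", "wv", "wo")}
params["dw"] = rng.normal(size=(s, s, C))
params["mix"] = rng.normal(size=(heads, heads))

x = rng.normal(size=(H * W, C))
y, attn = emsa_sketch(x, H, W, params, heads=heads, s=s)
print(y.shape, attn.shape)  # (64, 16) (4, 64, 16)
```

With reduction stride `s`, each head's attention map has n × n/s² entries instead of n × n. In this toy example that is 64 × 16 rather than 64 × 64, a factor of s² = 4 smaller. Because `H` and `W` are arguments rather than fixed constants, the same weights apply to any input size divisible by `s`.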

Main Results on ImageNet with Pretrained Models

ImageNet-1K Pretrained Models

name	resolution	acc@1	acc@5	#params	FLOPs	FPS	1K model
ResT-Lite	224x224	77.2	93.7	10.5M	1.4G	1246	baidu
ResT-Small	224x224	79.6	94.9	13.7M	1.9G	1043	baidu
ResT-Base	224x224	81.6	95.7	30.3M	4.3G	673	baidu
ResT-Large	224x224	83.6	96.3	51.6M	7.9G	429	baidu

Note: the access code for the Baidu downloads is rest.

Citing ResT

@article{zhql2021ResT,
  title={ResT: An Efficient Transformer for Visual Recognition},
  author={Zhang, Qinglong and Yang, Yubin},
  journal={arXiv preprint arXiv:2105.13677v2},
  year={2021}
}

Owner

zhql