Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"

Overview

Focal Transformer

PWC PWC PWC PWC PWC PWC

This is the official implementation of our Focal Transformer -- "Focal Self-attention for Local-Global Interactions in Vision Transformers", by Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan and Jianfeng Gao.

Introduction

focal-transformer-teaser

Our Focal Transfomer introduced a new self-attention mechanism called focal self-attention for vision transformers. In this new mechanism, each token attends the closest surrounding tokens at fine granularity but the tokens far away at coarse granularity, and thus can capture both short- and long-range visual dependencies efficiently and effectively.

With our Focal Transformers, we achieved superior performance over the state-of-the-art vision Transformers on a range of public benchmarks. In particular, our Focal Transformer models with a moderate size of 51.1M and a larger size of 89.8M achieve 83.6 and 84.0 Top-1 accuracy, respectively, on ImageNet classification at 224x224 resolution. Using Focal Transformers as the backbones, we obtain consistent and substantial improvements over the current state-of-the-art methods for 6 different object detection methods trained with standard 1x and 3x schedules. Our largest Focal Transformer yields 58.7/58.9 box mAPs and 50.9/51.3 mask mAPs on COCO mini-val/test-dev, and 55.4 mIoU on ADE20K for semantic segmentation.

Benchmarking

Image Classification on ImageNet-1K

Model Pretrain Use Conv Resolution [email protected] [email protected] #params FLOPs Checkpoint Config
Focal-T IN-1K No 224 82.2 95.9 28.9M 4.9G download yaml
Focal-T IN-1K Yes 224 82.7 96.1 30.8M 4.9G download yaml
Focal-S IN-1K No 224 83.6 96.2 51.1M 9.4G download yaml
Focal-B IN-1K No 224 84.0 96.5 89.8M 16.4G download yaml

Object Detection and Instance Segmentation on COCO

Mask R-CNN

Backbone Pretrain Lr Schd #params FLOPs box mAP mask mAP
Focal-T ImageNet-1K 1x 49M 291G 44.8 41.0
Focal-T ImageNet-1K 3x 49M 291G 47.2 42.7
Focal-S ImageNet-1K 1x 71M 401G 47.4 42.8
Focal-S ImageNet-1K 3x 71M 401G 48.8 43.8
Focal-B ImageNet-1K 1x 110M 533G 47.8 43.2
Focal-B ImageNet-1K 3x 110M 533G 49.0 43.7

RetinaNet

Backbone Pretrain Lr Schd #params FLOPs box mAP
Focal-T ImageNet-1K 1x 39M 265G 43.7
Focal-T ImageNet-1K 3x 39M 265G 45.5
Focal-S ImageNet-1K 1x 62M 367G 45.6
Focal-S ImageNet-1K 3x 62M 367G 47.3
Focal-B ImageNet-1K 1x 101M 514G 46.3
Focal-B ImageNet-1K 3x 101M 514G 46.9

Other detection methods

Backbone Pretrain Method Lr Schd #params FLOPs box mAP
Focal-T ImageNet-1K Cascade Mask R-CNN 3x 87M 770G 51.5
Focal-T ImageNet-1K ATSS 3x 37M 239G 49.5
Focal-T ImageNet-1K RepPointsV2 3x 45M 491G 51.2
Focal-T ImageNet-1K Sparse R-CNN 3x 111M 196G 49.0

Semantic Segmentation on ADE20K

Backbone Pretrain Method Resolution Iters #params FLOPs mIoU mIoU (MS)
Focal-T ImageNet-1K UPerNet 512x512 160k 62M 998G 45.8 47.0
Focal-S ImageNet-1K UPerNet 512x512 160k 85M 1130G 48.0 50.0
Focal-B ImageNet-1K UPerNet 512x512 160k 126M 1354G 49.0 50.5
Focal-L ImageNet-22K UPerNet 640x640 160k 240M 3376G 54.0 55.4

Getting Started

Citation

If you find this repo useful to your project, please consider to cite it with following bib:

@misc{yang2021focal,
    title={Focal Self-attention for Local-Global Interactions in Vision Transformers}, 
    author={Jianwei Yang and Chunyuan Li and Pengchuan Zhang and Xiyang Dai and Bin Xiao and Lu Yuan and Jianfeng Gao},
    year={2021},
    eprint={2107.00641},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgement

Our codebase is built based on Swin-Transformer. We thank the authors for the nicely organized code!

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
Position detection system of mobile robot in the warehouse enviroment

Autonomous-Forklift-System About | GUI | Tests | Starting | License | Author | 🎯 About An application that run the autonomous forklift paletization a

Kamil GoÅ› 1 Nov 24, 2021
Multitask Learning Strengthens Adversarial Robustness

Multitask Learning Strengthens Adversarial Robustness

Columbia University 15 Jun 10, 2022
Fully Adaptive Bayesian Algorithm for Data Analysis (FABADA) is a new approach of noise reduction methods. In this repository is shown the package developed for this new method based on \citepaper.

Fully Adaptive Bayesian Algorithm for Data Analysis FABADA FABADA is a novel non-parametric noise reduction technique which arise from the point of vi

18 Oct 20, 2022
Machine Learning Framework for Operating Systems - Brings ML to Linux kernel

KML: A Machine Learning Framework for Operating Systems & Storage Systems Storage systems and their OS components are designed to accommodate a wide v

File systems and Storage Lab (FSL) 186 Nov 24, 2022
A PaddlePaddle version of Neural Renderer, refer to its PyTorch version

Neural 3D Mesh Renderer in PadddlePaddle A PaddlePaddle version of Neural Renderer, refer to its PyTorch version Install Run: pip install neural-rende

AgentMaker 13 Jul 12, 2022
This is the official repository for our paper: ''Pruning Self-attentions into Convolutional Layers in Single Path''.

Pruning Self-attentions into Convolutional Layers in Single Path This is the official repository for our paper: Pruning Self-attentions into Convoluti

Zhuang AI Group 77 Dec 26, 2022
An implementation of Fastformer: Additive Attention Can Be All You Need in TensorFlow

Fast Transformer This repo implements Fastformer: Additive Attention Can Be All You Need by Wu et al. in TensorFlow. Fast Transformer is a Transformer

Rishit Dagli 139 Dec 28, 2022
Unofficial PyTorch Implementation of UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

UnivNet UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation This is an unofficial PyTorch

MINDs Lab 170 Jan 04, 2023
Notification Triggers for Python

Notipyer Notification triggers for Python Send async email notifications via Python. Get updates/crashlogs from your scripts with ease. Installation p

Chirag Jain 17 May 16, 2022
Neural Articulated Radiance Field

Neural Articulated Radiance Field NARF Neural Articulated Radiance Field Atsuhiro Noguchi, Xiao Sun, Stephen Lin, Tatsuya Harada ICCV 2021 [Paper] [Co

Atsuhiro Noguchi 144 Jan 03, 2023
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

ONNX Runtime is a cross-platform inference and training machine-learning accelerator. ONNX Runtime inference can enable faster customer experiences an

Microsoft 8k Jan 04, 2023
Motion and Shape Capture from Sparse Markers

MoSh++ This repository contains the official chumpy implementation of mocap body solver used for AMASS: AMASS: Archive of Motion Capture as Surface Sh

Nima Ghorbani 135 Dec 23, 2022
PyTorch code for Composing Partial Differential Equations with Physics-Aware Neural Networks

FInite volume Neural Network (FINN) This repository contains the PyTorch code for models, training, and testing, and Python code for data generation t

Cognitive Modeling 20 Dec 18, 2022
AI Face Mesh: This is a simple face mesh detection program based on Artificial intelligence.

AI Face Mesh: This is a simple face mesh detection program based on Artificial Intelligence which made with Python. It's able to detect 468 different

Md. Rakibul Islam 1 Jan 13, 2022
This is the official implementation for "Do Transformers Really Perform Bad for Graph Representation?".

Graphormer By Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng*, Guolin Ke, Di He*, Yanming Shen and Tie-Yan Liu. This repo is the official impl

Microsoft 1.3k Dec 26, 2022
Exploration-Exploitation Dilemma Solving Methods

Exploration-Exploitation Dilemma Solving Methods Medium article for this repo - HERE In ths repo I implemented two techniques for tackling mentioned t

Aman Mishra 6 Jan 25, 2022
The source code of CVPR 2019 paper "Deep Exemplar-based Video Colorization".

Deep Exemplar-based Video Colorization (Pytorch Implementation) Paper | Pretrained Model | Youtube video 🔥 | Colab demo Deep Exemplar-based Video Col

Bo Zhang 253 Dec 27, 2022
Malware Analysis Neural Network project.

MalanaNeuralNetwork Description Malware Analysis Neural Network project. Table of Contents Getting Started Requirements Installation Clone Set-Up VENV

2 Nov 13, 2021
[3DV 2021] A Dataset-Dispersion Perspective on Reconstruction Versus Recognition in Single-View 3D Reconstruction Networks

dispersion-score Official implementation of 3DV 2021 Paper A Dataset-dispersion Perspective on Reconstruction versus Recognition in Single-view 3D Rec

Yefan 7 May 28, 2022
Music source separation is a task to separate audio recordings into individual sources

Music Source Separation Music source separation is a task to separate audio recordings into individual sources. This repository is an PyTorch implmeme

Bytedance Inc. 958 Jan 03, 2023