Dynamic Token Normalization Improves Vision Transformers

Last update: Oct 09, 2022

Related tags

Overview

Dynamic Token Normalization Improves Vision Transformers

This is the PyTorch implementation of the paper Dynamic Token Normalization Improves Vision Transfromers. Codea and Models will be available soon.

Dynamic Token Normalization

We design a novel normalization method, termed Dynamic Token Normalization (DTN), which inherits the advantages from LayerNorm and InstanceNorm. DTN can be seamlessly plugged into various transformer models, consistenly improving the performance.

Comparisons of top-1 accuracies on the validation set of ImageNet, by using ViT trained with LN and DTN.

Model	Top-1	Top-5
ViT-T*-LN	72.3	91.4
ViT-T*-DTN	73.2	91.7
ViT-S*-LN	80.6	95.2
ViT-S*-DTN	81.7	95.8
ViT-B*-LN	81.7	95.8
ViT-B*-DTN	82.5	96.1

Getting Started

Install PyTorch

Clone the repo:

git clone https://github.com/dtn-anonymous/DTN.git

Requirements

Install CUDA==10.1 with cudnn7 following the official installation instructions
Install PyTorch==1.7.1 and torchvision==0.8.2 with CUDA==10.1:

conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch

Install timm==0.3.2:

pip install timm==0.3.2

Data Preparation

Download the ImageNet dataset which should contain train and val directionary and the txt file for correspondings between images and labels.

Training a model from scratch

An example to train our DTN is given in DTN/scripts/train.sh. To train ViT-S* with our DTN,

cd DTN/scripts   
sh train.sh layer vit_norm_s_star configs/ViT/vit.yaml

Number of GPUs and configuration file to use can be modified in train.sh

Dynamic Token Normalization Improves Vision Transformers

Related tags

Overview

Dynamic Token Normalization Improves Vision Transformers

Dynamic Token Normalization

Getting Started

Requirements

Data Preparation

Training a model from scratch

Owner

Wenqi Shao

This repository contains the code and models necessary to replicate the results of paper: How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective

Vision Transformer and MLP-Mixer Architectures

A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization

Dynamic View Synthesis from Dynamic Monocular Video

Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian (CVPR 2022)

Official implementation of the paper ``Unifying Nonlocal Blocks for Neural Networks'' (ICCV'21)

Wordle-solver - Wordle answer generation program in python

Diabetes-Feature-Engineering - A machine learning model that can predict whether people have diabetes when their characteristics are specified

Python Single Object Tracking Evaluation

This is the code of "Multi-view Contrastive Graph Clustering" in NeurlPS 2021.

This toolkit provides codes to download and pre-process the SLUE datasets, train the baseline models, and evaluate SLUE tasks.

Vector Quantization, in Pytorch

SemiNAS: Semi-Supervised Neural Architecture Search

Open Source Differentiable Computer Vision Library for PyTorch

This library provides an abstraction to perform Model Versioning using Weight & Biases.

A framework for GPU based high-performance medical image processing and visualization

Sketch-Based 3D Exploration with Stacked Generative Adversarial Networks

Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities

Fully-automated scripts for collecting AI-related papers

The codes and related files to reproduce the results for Image Similarity Challenge Track 2.