PyTorch reimplementation of the Mixer (MLP-Mixer: An all-MLP Architecture for Vision)

Last update: Dec 08, 2022

Overview

MLP-Mixer

PyTorch reimplementation of Google's repository for the MLP-Mixer (not yet updated on the master branch), which was released with the paper MLP-Mixer: An all-MLP Architecture for Vision by Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, and Alexey Dosovitskiy.

In this paper, the authors show that a model built only from multi-layer perceptrons (MLPs), without convolutions or Transformer-style self-attention, reaches performance close to the state of the art on image classification benchmarks.

MLP-Mixer (Mixer for short) consists of per-patch linear embeddings, Mixer layers, and a classifier head. Each Mixer layer contains one token-mixing MLP and one channel-mixing MLP, each consisting of two fully-connected layers and a GELU nonlinearity. Other components include skip connections, dropout, and a linear classifier head.
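The structure described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the code in this repository: the class names (`MlpBlock`, `MixerBlock`, `MlpMixer`) and the small default sizes are assumptions chosen for readability. The layer normalization before each MLP and the global average pooling before the head follow the paper rather than the summary above, and the per-patch linear embedding is written as a strided convolution, which is mathematically equivalent.

```python
import torch
from torch import nn


class MlpBlock(nn.Module):
    """Two fully-connected layers with a GELU nonlinearity in between."""

    def __init__(self, dim, hidden_dim, dropout=0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)


class MixerBlock(nn.Module):
    """One Mixer layer: token mixing, then channel mixing, each with a skip connection."""

    def __init__(self, num_tokens, channels, tokens_hidden, channels_hidden, dropout=0.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.token_mlp = MlpBlock(num_tokens, tokens_hidden, dropout)
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mlp = MlpBlock(channels, channels_hidden, dropout)

    def forward(self, x):
        # x has shape (batch, tokens, channels).
        # Token mixing acts along the token axis, so transpose before and after.
        y = self.norm1(x).transpose(1, 2)
        x = x + self.token_mlp(y).transpose(1, 2)
        # Channel mixing acts along the last (channel) axis directly.
        return x + self.channel_mlp(self.norm2(x))


class MlpMixer(nn.Module):
    """Per-patch embedding, a stack of Mixer layers, and a linear classifier head."""

    def __init__(self, image_size=32, patch_size=8, channels=64, depth=2,
                 tokens_hidden=32, channels_hidden=128, num_classes=10):
        super().__init__()
        num_tokens = (image_size // patch_size) ** 2
        # A convolution with kernel == stride == patch size is a per-patch linear layer.
        self.embed = nn.Conv2d(3, channels, kernel_size=patch_size, stride=patch_size)
        self.blocks = nn.Sequential(*[
            MixerBlock(num_tokens, channels, tokens_hidden, channels_hidden)
            for _ in range(depth)
        ])
        self.norm = nn.LayerNorm(channels)
        self.head = nn.Linear(channels, num_classes)

    def forward(self, images):
        x = self.embed(images)            # (batch, channels, H/patch, W/patch)
        x = x.flatten(2).transpose(1, 2)  # (batch, tokens, channels)
        x = self.norm(self.blocks(x))
        return self.head(x.mean(dim=1))   # average over tokens, then classify
```

Calling `MlpMixer()(torch.randn(2, 3, 32, 32))` returns one row of class logits per image. The token-mixing MLP is shared across channels and the channel-mixing MLP is shared across tokens, which is why a single transpose is enough to switch between the two.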

Usage

1. Download a pre-trained model (Google's official checkpoint)

Available models: Mixer-B_16, Mixer-L_16
- ImageNet pre-trained models
  - Mixer-B_16, Mixer-L_16
- ImageNet-21k pre-trained models
  - Mixer-B_16, Mixer-L_16

# imagenet pre-train
wget https://storage.googleapis.com/mixer_models/imagenet1k/{MODEL_NAME}.npz

# imagenet-21k pre-train
wget https://storage.googleapis.com/mixer_models/imagenet21k/{MODEL_NAME}.npz
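For scripted setups, the same download can be done from Python. The sketch below only fills in the `{MODEL_NAME}` placeholder of the URLs above. The helper names `checkpoint_url` and `download_checkpoint`, the `upstream` keys, and the default `checkpoint` directory (chosen to match the `--pretrained_dir` path used in the fine-tuning step) are illustrative assumptions, not part of this repository.

```python
import os
from urllib.request import urlretrieve

BASE_URL = "https://storage.googleapis.com/mixer_models"
# Maps a short upstream name (an assumption of this sketch) to the bucket directory.
UPSTREAM_DIRS = {"imagenet": "imagenet1k", "imagenet21k": "imagenet21k"}
MODELS = ("Mixer-B_16", "Mixer-L_16")


def checkpoint_url(model_name, upstream="imagenet21k"):
    """Build the checkpoint URL for a model name and upstream dataset."""
    if model_name not in MODELS:
        raise ValueError(f"unknown model: {model_name!r}")
    if upstream not in UPSTREAM_DIRS:
        raise ValueError(f"unknown upstream dataset: {upstream!r}")
    return f"{BASE_URL}/{UPSTREAM_DIRS[upstream]}/{model_name}.npz"


def download_checkpoint(model_name, upstream="imagenet21k", dest_dir="checkpoint"):
    """Download the checkpoint into dest_dir and return the local file path."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, f"{model_name}.npz")
    urlretrieve(checkpoint_url(model_name, upstream), dest)
    return dest
```

For example, `download_checkpoint("Mixer-B_16")` would save the ImageNet-21k checkpoint as `checkpoint/Mixer-B_16.npz`.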

2. Fine-tuning

python3 train.py --name cifar10-100_500 --model_type Mixer-B_16 --pretrained_dir checkpoint/Mixer-B_16.npz

Reproducing Mixer results

upstream	model	dataset	acc(official)
ImageNet	Mixer-B/16	cifar10	96.72
ImageNet	Mixer-L/16	cifar10	96.59
ImageNet-21k	Mixer-B/16	cifar10	96.82
ImageNet-21k	Mixer-L/16	cifar10	96.34

Reference

Google's Vision Transformer and MLP-Mixer

Citations

@article{tolstikhin2021,
  title={MLP-Mixer: An all-MLP Architecture for Vision},
  author={Tolstikhin, Ilya and Houlsby, Neil and Kolesnikov, Alexander and Beyer, Lucas and Zhai, Xiaohua and Unterthiner, Thomas and Yung, Jessica and Keysers, Daniel and Uszkoreit, Jakob and Lucic, Mario and Dosovitskiy, Alexey},
  journal={arXiv preprint arXiv:2105.01601},
  year={2021}
}

Owner

Eunkwang Jeon
