Official code for the CVPR 2021 paper "How Well Do Self-Supervised Models Transfer?"

Overview

How Well Do Self-Supervised Models Transfer?

This repository hosts the code for the experiments in the CVPR 2021 paper How Well Do Self-Supervised Models Transfer?

Requirements

This codebase has been tested with the following package versions:

python=3.6.8
torch=1.2.0
torchvision=0.4.0
PIL=7.1.2
numpy=1.18.1
scipy=1.2.1
pandas=1.0.3
tqdm=4.31.1
sklearn=0.22.2

Pre-trained Models

In the paper we evaluate 14 pre-trained ResNet50 models, 13 self-supervised and 1 supervised. To download and prepare all models in the same format, run:

python download_and_prepare_models.py

This will prepare the models in the same format and save them in a directory named models.

Note 1: For SimCLR-v1 and SimCLR-v2, the TensorFlow checkpoints need to be downloaded manually (using the links in the table below) and converted into PyTorch format (using https://github.com/tonylins/simclr-converter and https://github.com/Separius/SimCLRv2-Pytorch, respectively).

Note 2: In order to convert BYOL, you may need to install some packages by running:

pip install jax jaxlib dill git+https://github.com/deepmind/dm-haiku

Below are links to the pre-trained weights used.

Model URL
InsDis https://www.dropbox.com/sh/87d24jqsl6ra7t2/AACcsSIt1_Njv7GsmsuzZ6Sta/InsDis.pth
MoCo-v1 https://dl.fbaipublicfiles.com/moco/moco_checkpoints/moco_v1_200ep/moco_v1_200ep_pretrain.pth.tar
PCL-v1 https://storage.googleapis.com/sfr-pcl-data-research/PCL_checkpoint/PCL_v1_epoch200.pth.tar
PIRL https://www.dropbox.com/sh/87d24jqsl6ra7t2/AADN4jKnvTI0U5oT6hTmQZz8a/PIRL.pth
PCL-v2 https://storage.googleapis.com/sfr-pcl-data-research/PCL_checkpoint/PCL_v2_epoch200.pth.tar
SimCLR-v1 https://storage.cloud.google.com/simclr-gcs/checkpoints/ResNet50_1x.zip
MoCo-v2 https://dl.fbaipublicfiles.com/moco/moco_checkpoints/moco_v2_800ep/moco_v2_800ep_pretrain.pth.tar
SimCLR-v2 https://console.cloud.google.com/storage/browser/simclr-checkpoints/simclrv2/pretrained/r50_1x_sk0
SeLa-v2 https://dl.fbaipublicfiles.com/deepcluster/selav2_400ep_pretrain.pth.tar
InfoMin https://www.dropbox.com/sh/87d24jqsl6ra7t2/AAAzMTynP3Qc8mIE4XWkgILUa/InfoMin_800.pth
BYOL https://storage.googleapis.com/deepmind-byol/checkpoints/pretrain_res50x1.pkl
DeepCluster-v2 https://dl.fbaipublicfiles.com/deepcluster/deepclusterv2_800ep_pretrain.pth.tar
SwAV https://dl.fbaipublicfiles.com/deepcluster/swav_800ep_pretrain.pth.tar
Supervised We use weights from torchvision.models.resnet50(pretrained=True)

Datasets

There are several classes defined in the datasets directory. The data is expected in a directory name data, located on the same level as this repository. Below is an outline of the expected file structure:

data/
    CIFAR10/
    DTD/
    ...
ssl-transfer/
    datasets/
    models/
    readme.md
    ...

Many-shot (Linear)

We provide the code for our linear evaluation in linear.py.

To evaluate DeepCluster-v2 on CIFAR10 given our pre-computed best regularisation hyperparameter, run:

python linear.py --dataset cifar10 --model deepcluster-v2 --C 0.316

The test accuracy should be close to 94.07%, the value reported in Table 1 of the paper.

To evaluate the Supervised baseline, run:

python linear.py --dataset cifar10 --model supervised --C 0.056

This model should achieve close to 91.47%.

To search for the best regularisation hyperparameter on the validation set, exclude the --C argument:

python linear.py --dataset cifar10 --model supervised

Finally, when using SimCLR-v1 or SimCLR-v2, always use the --no-norm argument:

python linear.py --dataset cifar10 --model simclr-v1 --no-norm

Many-shot (Finetune)

We provide code for finetuning in finetune.py.

To finetune DeepCluster-v2 on CIFAR10, run:

python finetune.py --dataset cifar10 --model deepcluster-v2

This model should achieve close to 97.06%, the value reported in Table 1 of the paper.

Few-shot (Kornblith & CD-FSL)

We provide the code for our few-shot evaluation in few_shot.py.

To evaluate DeepCluster-v2 on EuroSAT in a 5-way 5-shot setup, run:

python few_shot.py --dataset eurosat --model deepcluster-v2 --n-way 5 --n-support 5

The test accuracy should be close to 88.39% ± 0.49%, the value reported in Table 2 of the paper.

Or, to evaluate the Supervised baseline on ChestX in a 5-way 50-shot setup, run:

python few_shot.py --dataset chestx --model supervised --n-way 5 --n-support 50

This model should achieve close to 32.34% ± 0.45%.

Object Detection

We use the detectron2 framework to train our models on PASCAL VOC object detection.

Below is an outline of the expected file structure, including config files, converted models and the detectron2 framework:

detectron2/
    tools/
        train_net.py
        ...
    ...
ssl-transfer/
    detectron2-configs/
        finetune/
            byol.yaml
            ...
        frozen/
            byol.yaml
            ...
    models/
        detectron2/
            byol.pkl
            ...
        ...
    ...

To set it up, perform the following steps:

  1. Install detectron2 (requries PyTorch 1.5 or newer). We expect the installed framework to be located at the same level as this repository, see outline of expected file structure above.
  2. Convert the models into the format used by detectron2 by running python convert_to_detectron2.py. The converted models will be saved in a directory called detectron2 inside the models directory.

We include the config files for the frozen training in detectron2-configs/frozen and for full finetuning in detectron2-configs/finetune. In order to train models, navigate into detectron2/tools/. We can now train e.g. BYOL with a frozen backbone on 1 GPU by running:

./train_net.py --num-gpus 1 --config-file ../../ssl-transfer/detectron2-configs/frozen/byol.yaml OUTPUT_DIR ./output/byol-frozen

This model should achieve close to 82.01 AP50, the value reported in Table 3 of the paper.

Surface Normal Estimation

The code for running the surface normal estimation experiments is given in the surface-normal-estimation. We use the MIT CSAIL Semantic Segmentation Toolkit, but there is also a docker configuration file that can be used to build a container with all the dependencies installed. One can train a model with a command like:

./scripts/train_finetune_models.sh <pretrained-model-path> <checkpoint-directory>

and the resulting model can be evaluated with

./scripts/test_models.sh <checkpoint-directory>

Semantic Segmentation

We also use the same framework performing semantic segmentation. As per the surface normal estimation experiments, we include a docker configuration file to make getting dependencies easier. Before training a semantic segmentation model you will need to change the paths in the relevant YAML configuration file to point to where you have stored the pre-trained models and datasets. Once this is done the training script can be run with, e.g.,

python train.py --gpus 0,1 --cfg selfsupconfig/byol.yaml

where selfsupconfig/byol.yaml is the aforementioned configuration file. The resulting model can be evaluated with

python eval_multipro.py --gpus 0,1 --cfg selfsupconfig/byol.yaml

Citation

If you find our work useful for your research, please consider citing our paper:

@inproceedings{Ericsson2021HowTransfer,
    title = {{How Well Do Self-Supervised Models Transfer?}},
    year = {2021},
    booktitle = {CVPR},
    author = {Ericsson, Linus and Gouk, Henry and Hospedales, Timothy M.},
    url = {http://arxiv.org/abs/2011.13377},
    arxivId = {2011.13377}
}

If you have any questions, feel welcome to create an issue or contact Linus Ericsson ([email protected]).

Owner
Linus Ericsson
PhD student in the Data Science CDT at The University of Edinburgh
Linus Ericsson
Joint project of the duo Hacker Ninjas

Project Smoothie Společný projekt dua Hacker Ninjas. První pokus o hříčku po třech týdnech učení se programování. Jakub Kolář e:\

Jakub Kolář 2 Jan 07, 2022
MoCap-Solver: A Neural Solver for Optical Motion Capture Data

MoCap-Solver is a data-driven-based robust marker denoising method, which takes raw mocap markers as input and outputs corresponding clean markers and skeleton motions.

55 Dec 28, 2022
yolov5目标检测模型的知识蒸馏(基于响应的蒸馏)

代码地址: https://github.com/Sharpiless/yolov5-knowledge-distillation 教师模型: python train.py --weights weights/yolov5m.pt \ --cfg models/yolov5m.ya

52 Dec 04, 2022
deep learning for image processing including classification and object-detection etc.

深度学习在图像处理中的应用教程 前言 本教程是对本人研究生期间的研究内容进行整理总结,总结的同时也希望能够帮助更多的小伙伴。后期如果有学习到新的知识也会与大家一起分享。 本教程会以视频的方式进行分享,教学流程如下: 1)介绍网络的结构与创新点 2)使用Pytorch进行网络的搭建与训练 3)使用Te

WuZhe 13.6k Jan 04, 2023
PyTorch implementation of GLOM

GLOM PyTorch implementation of GLOM, Geoffrey Hinton's new idea that integrates concepts from neural fields, top-down-bottom-up processing, and attent

Yeonwoo Sung 20 Aug 17, 2022
This repository comes with the paper "On the Robustness of Counterfactual Explanations to Adverse Perturbations"

Robust Counterfactual Explanations This repository comes with the paper "On the Robustness of Counterfactual Explanations to Adverse Perturbations". I

Marco 5 Dec 20, 2022
The final project for "Applying AI to Wearable Device Data" course from "AI for Healthcare" - Udacity.

Motion Compensated Pulse Rate Estimation Overview This project has 2 main parts. Develop a Pulse Rate Algorithm on the given training data. Then Test

Omar Laham 2 Oct 25, 2022
PyTorch implementation of MICCAI 2018 paper "Liver Lesion Detection from Weakly-labeled Multi-phase CT Volumes with a Grouped Single Shot MultiBox Detector"

Grouped SSD (GSSD) for liver lesion detection from multi-phase CT Note: the MICCAI 2018 paper only covers the multi-phase lesion detection part of thi

Sang-gil Lee 36 Oct 12, 2022
PED: DETR for Crowd Pedestrian Detection

PED: DETR for Crowd Pedestrian Detection Code for PED: DETR For (Crowd) Pedestrian Detection Paper PED: DETR for Crowd Pedestrian Detection Installati

36 Sep 13, 2022
An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.

This repository contains the SystemVerilog RTL, C++, HLS (Intel FPGA OpenCL to wrap RTL code) and Python needed to reproduce the numerical results in

Facebook Research 373 Dec 31, 2022
A library for low-memory inferencing in PyTorch.

Pylomin Pylomin (PYtorch LOw-Memory INference) is a library for low-memory inferencing in PyTorch. Installation ... Usage For example, the following c

3 Oct 26, 2022
The first dataset of composite images with rationality score indicating whether the object placement in a composite image is reasonable.

Object-Placement-Assessment-Dataset-OPA Object-Placement-Assessment (OPA) is to verify whether a composite image is plausible in terms of the object p

BCMI 53 Nov 15, 2022
CAST: Character labeling in Animation using Self-supervision by Tracking

CAST: Character labeling in Animation using Self-supervision by Tracking (Published as a conference paper at EuroGraphics 2022) Note: The CAST paper c

15 Nov 18, 2022
Confidence Propagation Cluster aims to replace NMS-based methods as a better box fusion framework in 2D/3D Object detection

CP-Cluster Confidence Propagation Cluster aims to replace NMS-based methods as a better box fusion framework in 2D/3D Object detection, Instance Segme

Yichun Shen 41 Dec 08, 2022
Unofficial implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (https://arxiv.org/abs/2103.14030)

Swin-Transformer-Tensorflow A direct translation of the official PyTorch implementation of "Swin Transformer: Hierarchical Vision Transformer using Sh

52 Dec 29, 2022
MRI reconstruction (e.g., QSM) using deep learning methods

deepMRI: Deep learning methods for MRI Authors: Yang Gao, Hongfu Sun This repo is devloped based on Pytorch (1.8 or later) and matlab (R2019a or later

Hongfu Sun 17 Dec 18, 2022
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation

AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation A pytorch-version implementation codes of paper:

11 Dec 13, 2022
Reproduction of Vision Transformer in Tensorflow2. Train from scratch and Finetune.

Vision Transformer(ViT) in Tensorflow2 Tensorflow2 implementation of the Vision Transformer(ViT). This repository is for An image is worth 16x16 words

sungjun lee 42 Dec 27, 2022
This program uses trial auth token of Azure Cognitive Services to do speech synthesis for you.

🗣️ aspeak A simple text-to-speech client using azure TTS API(trial). 😆 TL;DR: This program uses trial auth token of Azure Cognitive Services to do s

Levi Zim 359 Jan 05, 2023
[ICML 2021, Long Talk] Delving into Deep Imbalanced Regression

Delving into Deep Imbalanced Regression This repository contains the implementation code for paper: Delving into Deep Imbalanced Regression Yuzhe Yang

Yuzhe Yang 568 Dec 30, 2022