SwinTrack: A Simple and Strong Baseline for Transformer Tracking

Overview

SwinTrack

This is the official repo for SwinTrack.

banner

A Simple and Strong Baseline

performance

Prerequisites

Environment

conda (recommended)

conda create -y -n SwinTrack
conda activate SwinTrack
conda install -y anaconda
conda install -y pytorch torchvision cudatoolkit -c pytorch
conda install -y -c fvcore -c iopath -c conda-forge fvcore
pip install wandb
pip install timm

pip

pip install -r requirements.txt

Dataset

Download

Unzip

The paths should be organized as following:

lasot
├── airplane
├── basketball
...
├── training_set.txt
└── testing_set.txt

lasot_extension
├── atv
├── badminton
...
└── wingsuit

got-10k
├── train
│   ├── GOT-10k_Train_000001
│   ...
├── val
│   ├── GOT-10k_Val_000001
│   ...
└── test
    ├── GOT-10k_Test_000001
    ...
    
trackingnet
├── TEST
├── TRAIN_0
...
└── TRAIN_11

coco2017
├── annotations
│   ├── instances_train2017.json
│   └── instances_val2017.json
└── images
    ├── train2017
    │   ├── 000000000009.jpg
    │   ├── 000000000025.jpg
    │   ...
    └── val2017
        ├── 000000000139.jpg
        ├── 000000000285.jpg
        ...

Prepare path.yaml

Copy path.template.yaml as path.yaml and fill in the paths.

LaSOT_PATH: '/path/to/lasot'
LaSOT_Extension_PATH: '/path/to/lasot_ext'
GOT10k_PATH: '/path/to/got10k'
TrackingNet_PATH: '/path/to/trackingnet'
COCO_2017_PATH: '/path/to/coco2017'

Prepare dataset metadata cache (optional)

Download the metadata cache from google drive, and unzip it in datasets/cache/

datasets
└── cache
    ├── SingleObjectTrackingDataset_MemoryMapped
    │   └── filtered
    │       ├── got-10k-got10k_vot_train_split-train-3c1ffeb0c530522f0345d088b2f72168.np
    │       ...
    └── DetectionDataset_MemoryMapped
        └── filtered
            └── coco2017-nocrowd-train-bcd5bf68d4b87619ab451fe293098401.np

Login to wandb

Register an account at wandb, then login with command:

wandb login

Training & Evaluation

Train and evaluate on a single GPU

# Tiny
python main.py SwinTrack Tiny --output_dir /path/to/output -W $num_dataloader_workers

# Base
python main.py SwinTrack Base --output_dir /path/to/output -W $num_dataloader_workers

# Base-384
python main.py SwinTrack Base-384 --output_dir /path/to/output -W $num_dataloader_workers

--output_dir is optional, -W defaults to 4.

note: our code performs evaluation automatically when training is done, output is saved in /path/to/output/test_metrics.

Train and evaluate on multiple GPUs using DDP

# Tiny
python main.py SwinTrack Tiny --distributed_nproc_per_node $num_gpus --distributed_do_spawn_workers --output_dir /path/to/output -W $num_dataloader_workers

Train and evaluate on multiple nodes with multiple GPUs using DDP

# Tiny
python main.py SwinTrack Tiny --master_address $master_address --distributed_node_rank $node_rank distributed_nnodes $num_nodes --distributed_nproc_per_node $num_gpus --distributed_do_spawn_workers --output_dir /path/to/output -W $num_dataloader_workers 

Train and evaluate with run.sh helper script

# Train and evaluate on all GPUs
./run.sh SwinTrack Tiny --output_dir /path/to/output -W $num_dataloader_workers
# Train and evaluate on multiple nodes
NODE_RANK=$NODE_INDEX NUM_NODES=$NUM_NODES MASTER_ADDRESS=$MASTER_ADDRESS DATE_WITH_TIME=$DATE_WITH_TIME ./run.sh SwinTrack Tiny --output_dir /path/to/output -W $num_dataloader_workers 

Ablation study

The ablation study can be done by applying a small patch to the main config file.

Take the ResNet 50 backbone as the example, the rest parameters are the same as the above.

# Train and evaluate with resnet50 backbone
python main.py SwinTrack Tiny --mixin_config resnet.yaml
# or with run.sh
./run.sh SwinTrack Tiny --mixin resnet.yaml

All available config patches are listed in config/SwinTrack/Tiny/mixin.

Train and evaluate with GOT-10k dataset

python main.py SwinTrack Tiny --mixin_config got10k.yaml

Submit $output_dir/test_metrics/got10k/submit/*.zip to the GOT-10k evaluation server to get the result of GOT-10k test split.

Evaluate Existing Model

Download the pretrained model from google drive, then type:

python main.py SwinTrack Tiny --weight_path /path/to/weigth_file.pth --mixin_config evaluation.yaml --output_dir /path/to/output

Our code can evaluate the model on multiple GPUs in parallel, so all parameters above are also available.

Tracking results

Touch here google drive

Citation

@misc{lin2021swintrack,
      title={SwinTrack: A Simple and Strong Baseline for Transformer Tracking}, 
      author={Liting Lin and Heng Fan and Yong Xu and Haibin Ling},
      year={2021},
      eprint={2112.00995},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Owner
LitingLin
LitingLin
clustimage is a python package for unsupervised clustering of images.

clustimage The aim of clustimage is to detect natural groups or clusters of images. Image recognition is a computer vision task for identifying and ve

Erdogan Taskesen 52 Jan 02, 2023
Doing the asl sign language classification on static images using graph neural networks.

SignLangGNN When GNNs 💜 MediaPipe. This is a starter project where I tried to implement some traditional image classification problem i.e. the ASL si

10 Nov 09, 2022
PyTorch implementation for our paper "Deep Facial Synthesis: A New Challenge"

FSGAN Here is the official PyTorch implementation for our paper "Deep Facial Synthesis: A New Challenge". This project achieve the translation between

Deng-Ping Fan 32 Oct 10, 2022
Fast RFC3339 compliant Python date-time library

udatetime: Fast RFC3339 compliant date-time library Handling date-times is a painful act because of the sheer endless amount of formats used by people

Simon Pirschel 235 Oct 25, 2022
PyTorch implementation of Off-policy Learning in Two-stage Recommender Systems

Off-Policy-2-Stage This repo provides a PyTorch implementation of the MovieLens experiments for the following paper: Off-policy Learning in Two-stage

Jiaqi Ma 25 Dec 12, 2022
Measure WWjj polarization fraction

WlWl Polarization Measure WWjj polarization fraction Paper: arXiv:2109.09924 Notice: This code can only be used for the inference process, if you want

4 Apr 10, 2022
Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

PyTorch Implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers 1 Using Colab Please notic

Hila Chefer 489 Jan 07, 2023
Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

NeX: Real-time View Synthesis with Neural Basis Expansion Project Page | Video | Paper | COLAB | Shiny Dataset We present NeX, a new approach to novel

538 Jan 09, 2023
A deep learning based semantic search platform that computes similarity scores between provided query and documents

semanticsearch This is a deep learning based semantic search platform that computes similarity scores between provided query and documents. Documents

1 Nov 30, 2021
A Collection of Papers and Codes for ICCV2021 Low Level Vision and Image Generation

A Collection of Papers and Codes for ICCV2021 Low Level Vision and Image Generation

196 Jan 05, 2023
EMNLP'2021: Simple Entity-centric Questions Challenge Dense Retrievers

EntityQuestions This repository contains the EntityQuestions dataset as well as code to evaluate retrieval results from the the paper Simple Entity-ce

Princeton Natural Language Processing 119 Sep 28, 2022
1st place solution to the Satellite Image Change Detection Challenge hosted by SenseTime

1st place solution to the Satellite Image Change Detection Challenge hosted by SenseTime

Lihe Yang 209 Jan 01, 2023
A PyTorch-based Semi-Supervised Learning (SSL) Codebase for Pixel-wise (Pixel) Vision Tasks

PixelSSL is a PyTorch-based semi-supervised learning (SSL) codebase for pixel-wise (Pixel) vision tasks. The purpose of this project is to promote the

Zhanghan Ke 255 Dec 11, 2022
[SIGGRAPH Asia 2021] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning.

DeepVecFont This is the homepage for "DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning". Yizhi Wang and Zhouhui Lian. WI

Yizhi Wang 17 Dec 22, 2022
Material del curso IIC2233 Programación Avanzada 📚

Contenidos Los contenidos se organizan según la semana del semestre en que nos encontremos, y según la semana que se destina para su estudio. Los cont

IIC2233 @ UC 72 Dec 23, 2022
An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.

This repository contains the SystemVerilog RTL, C++, HLS (Intel FPGA OpenCL to wrap RTL code) and Python needed to reproduce the numerical results in

Facebook Research 373 Dec 31, 2022
HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation Official PyTorch Implementation

: We present a novel, real-time, semantic segmentation network in which the encoder both encodes and generates the parameters (weights) of the decoder. Furthermore, to allow maximal adaptivity, the w

Yuval Nirkin 182 Dec 14, 2022
Bringing Computer Vision and Flutter together , to build an awesome app !!

Bringing Computer Vision and Flutter together , to build an awesome app !! Explore the Directories Flutter · Machine Learning Table of Contents About

Padmanabha Banerjee 14 Apr 07, 2022
A simple code to convert image format and channel as well as resizing and renaming multiple images.

Rename-Resize-and-convert-multiple-images A simple code to convert image format and channel as well as resizing and renaming multiple images. This cod

Happy N. Monday 3 Feb 15, 2022
Tackling Obstacle Tower Challenge using PPO & A2C combined with ICM.

Obstacle Tower Challenge using Deep Reinforcement Learning Unity Obstacle Tower is a challenging realistic 3D, third person perspective and procedural

Zhuoyu Feng 5 Feb 10, 2022