Code for the CVPR 2021 paper

The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth

Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel J. Brostow and Michael Firman

CVPR 2021

[Link to paper]

We introduce ManyDepth, an adaptive approach to dense depth estimation that can make use of sequence information at test time, when it is available.

  • Self-supervised: We train from monocular video only. No depths or poses are needed at training or test time.
  • Good depths from single frames; even better depths from short sequences.
  • Efficient: Only one forward pass at test time. No test-time optimization needed.
  • State-of-the-art self-supervised monocular-trained depth estimation on KITTI and Cityscapes.

Overview

Cost volumes are commonly used for estimating depths from multiple input views:

Cost volume used for aggregating sequences of frames
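
For intuition, a plane-sweep cost volume warps features from a source frame into the target view at each of a set of hypothesised depths, and scores how well they match. Below is a minimal PyTorch sketch of this generic construction; the function name, the L1 matching cost and the calling convention are illustrative assumptions, not this repository's exact code:

import torch
import torch.nn.functional as F

def build_cost_volume(target_feats, source_feats, K, inv_K, T, depth_bins):
    """Generic plane-sweep sketch: K and inv_K are 3x3 intrinsics, T is the
    4x4 relative pose from target to source, depth_bins is a list of
    candidate depths."""
    B, C, H, W = target_feats.shape
    # pixel grid in homogeneous coordinates, shape [3, H*W]
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).reshape(3, -1).float()
    costs = []
    for d in depth_bins:
        # back-project pixels to 3D at depth d, move them into the source
        # camera, then re-project to source pixel coordinates
        cam_points = d * (inv_K @ pix)
        cam_points = torch.cat([cam_points, torch.ones(1, H * W)], dim=0)
        src_pix = K @ (T @ cam_points)[:3]
        src_pix = src_pix[:2] / src_pix[2:].clamp(min=1e-6)
        # normalise to [-1, 1] for grid_sample
        grid = src_pix.t().reshape(1, H, W, 2).clone()
        grid[..., 0] = grid[..., 0] / (W - 1) * 2 - 1
        grid[..., 1] = grid[..., 1] / (H - 1) * 2 - 1
        warped = F.grid_sample(source_feats, grid.expand(B, H, W, 2),
                               align_corners=True)
        # L1 feature distance as the matching cost at this depth
        costs.append((warped - target_feats).abs().mean(1))
    return torch.stack(costs, dim=1)  # [B, num_depths, H, W]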

However, cost volumes do not easily work with self-supervised training.

Baseline: Depth from cost volume input without our contributions

In our paper, we:

  • Introduce an adaptive cost volume to deal with unknown scene scales (see the sketch after this list)
  • Fix problems with moving objects
  • Introduce augmentations to deal with static cameras and start-of-sequence frames
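
Since self-supervised models only recover depth up to an unknown scale, the cost volume's depth range cannot be fixed in advance. Here is a minimal sketch of the adaptive idea; the class name, momentum value and initial range are illustrative assumptions rather than this repository's implementation:

import torch

class AdaptiveDepthBins:
    """Track a running estimate of the scene's depth range from the
    network's own predictions, and space candidate depths across it."""

    def __init__(self, num_bins=96, momentum=0.99):
        self.num_bins = num_bins
        self.momentum = momentum
        self.min_depth, self.max_depth = 0.1, 10.0  # arbitrary initial guesses

    def update(self, predicted_depth):
        # exponential moving average of the observed depth range
        m = self.momentum
        self.min_depth = m * self.min_depth + (1 - m) * predicted_depth.min().item()
        self.max_depth = m * self.max_depth + (1 - m) * predicted_depth.max().item()

    def bins(self):
        # linearly spaced candidate depths spanning the current estimate
        return torch.linspace(self.min_depth, self.max_depth, self.num_bins)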

These contributions enable cost volumes to work with self-supervised training:

ManyDepth: Depth from cost volume input with our contributions

With our contributions, short test-time sequences give better predictions than methods which predict depth from just a single frame.

ManyDepth vs Monodepth2 depths and error maps

✏️ 📄 Citation

If you find our work useful or interesting, please cite our paper:

@inproceedings{watson2021temporal,
    author = {Jamie Watson and
              Oisin Mac Aodha and
              Victor Prisacariu and
              Gabriel Brostow and
              Michael Firman},
    title = {{The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth}},
    booktitle = {Computer Vision and Pattern Recognition (CVPR)},
    year = {2021}
}

📈 Results

Our ManyDepth method outperforms all previous methods in all subsections across most metrics, whether or not the baselines use multiple frames at test time. See our paper for full details.

KITTI results table

👀 Reproducing Paper Results

To recreate the results from our paper, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.train \
    --data_path <your_KITTI_path> \
    --log_dir <your_save_path>  \
    --model_name <your_model_name>

Depending on the size of your GPU, you may need to set --batch_size lower than 12. Additionally, you can train a high-resolution model by adding --height 320 --width 1024.

For instructions on downloading the KITTI dataset, see Monodepth2.

To train a Cityscapes model, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.train \
    --data_path <your_preprocessed_cityscapes_path> \
    --log_dir <your_save_path>  \
    --model_name <your_model_name> \
    --dataset cityscapes_preprocessed \
    --split cityscapes_preprocessed \
    --freeze_teacher_epoch 5 \
    --height 192 --width 512

This assumes you have already preprocessed the Cityscapes dataset using SfMLearner's prepare_train_data.py script. We used the following command:

python prepare_train_data.py \
    --img_height 512 \
    --img_width 1024 \
    --dataset_dir <path_to_downloaded_cityscapes_data> \
    --dataset_name cityscapes \
    --dump_root <your_preprocessed_cityscapes_path> \
    --seq_length 3 \
    --num_threads 8

Note that although we pass the --img_height 512 flag, the prepare_train_data.py script saves images of size 1024x384, since it also crops off the bottom portion of the image. You could probably save disk space without a loss of accuracy by preprocessing with --img_height 256 --img_width 512 (to create 512x192 images), but this isn't what we did for our experiments.

💾 Pretrained weights and evaluation

You can download weights for some pretrained models here:

To evaluate a model on KITTI, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.evaluate_depth \
    --data_path <your_KITTI_path> \
    --load_weights_folder <your_model_path> \
    --eval_mono

Make sure you have first run export_gt_depth.py to extract ground truth files.
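
For example, following the flags of the Monodepth2 version of this script (this exact invocation and the split name are assumptions; adjust them to your setup):

python export_gt_depth.py \
    --data_path <your_KITTI_path> \
    --split eigen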

And to evaluate a model on Cityscapes, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.evaluate_depth \
    --data_path <your_cityscapes_path> \
    --load_weights_folder <your_model_path> \
    --eval_mono \
    --eval_split cityscapes

During evaluation, we crop and evaluate on the middle 50% of the images.
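
As an illustrative sketch only, one reading of this crop on an H x W depth map is shown below; the repository's exact crop bounds may differ:

import numpy as np

def central_crop(depth):
    # keep the central half of each dimension (an assumed interpretation
    # of "middle 50%"; not this repository's exact evaluation code)
    h, w = depth.shape
    return depth[h // 4: 3 * h // 4, w // 4: 3 * w // 4]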

We provide ground truth depth files HERE, which were converted from pixel disparities using the intrinsics and the known baseline. Download and unzip them into splits/cityscapes.

🖼 Running on your own images

We provide some sample code in test_simple.py, which demonstrates multi-frame inference. This predicts depth for a sequence of two images cropped from a dashcam video. Prediction also requires an estimate of the intrinsics matrix, in JSON format. For the provided test images, we have estimated the intrinsics to be equivalent to those of the KITTI dataset. Note that the intrinsics provided in the JSON file are expected to be in normalised coordinates.
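
For illustration, normalised intrinsics divide the focal lengths and principal point by the image width and height. A hypothetical 3x3 matrix in this convention, using the KITTI-style values from Monodepth2, might look like the following (the exact JSON schema expected by test_simple.py may differ):

[[0.58, 0.0,  0.5],
 [0.0,  1.92, 0.5],
 [0.0,  0.0,  1.0]]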

Download and unzip model weights from one of the links above, and then run the following command:

python -m manydepth.test_simple \
    --target_image_path assets/test_sequence_target.jpg \
    --source_image_path assets/test_sequence_source.jpg \
    --intrinsics_json_path assets/test_sequence_intrinsics.json \
    --model_path path/to/weights

A predicted depth map rendering will be saved to assets/test_sequence_target_disp.jpeg.

👩‍⚖️ License

Copyright © Niantic, Inc. 2021. Patent Pending. All rights reserved. Please see the license file for terms.
