Official implementation of "CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding" (CVPR, 2022)

Last update: Dec 23, 2022

Overview

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding (CVPR'22)

Paper Link | Project Page

Abstract

Manual annotation of large-scale point cloud datasets for varying tasks such as 3D object classification, segmentation and detection is often laborious owing to the irregular structure of point clouds. Self-supervised learning, which operates without any human labeling, is a promising approach to address this issue. We observe in the real world that humans are capable of mapping the visual concepts learnt from 2D images to understand the 3D world. Encouraged by this insight, we propose CrossPoint, a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations. It enables a 3D-2D correspondence of objects by maximizing agreement between point clouds and the corresponding rendered 2D image in the invariant space, while encouraging invariance to transformations in the point cloud modality. Our joint training objective combines the feature correspondences within and across modalities, and thus ensembles a rich learning signal from both the 3D point cloud and 2D image modalities in a self-supervised fashion. Experimental results show that our approach outperforms the previous unsupervised learning methods on a diverse range of downstream tasks including 3D object classification and segmentation. Further, the ablation studies validate the potency of our approach for better point cloud understanding.
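The joint objective described in the abstract (agreement within the point cloud modality plus agreement between point clouds and rendered images) can be illustrated with a minimal sketch. It assumes an NT-Xent style contrastive loss, a temperature of 0.1, and that the cross-modal term compares the mean of the two augmented point cloud embeddings with the image embedding. The function names are hypothetical and plain NumPy is used only to keep the example self-contained; see the training code in this repository for the actual implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def nt_xent(a, b, temperature=0.1):
    """NT-Xent loss for two batches of paired embeddings, shape (N, D) each.

    Row i of `a` and row i of `b` form a positive pair; every other row
    in the combined batch of 2N embeddings acts as a negative.
    """
    n = a.shape[0]
    z = np.concatenate([l2_normalize(a), l2_normalize(b)], axis=0)  # (2N, D)
    sim = (z @ z.T) / temperature                                    # (2N, 2N)
    np.fill_diagonal(sim, -np.inf)  # an embedding is never its own negative

    # Index of the positive partner for each of the 2N rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())


def joint_contrastive_loss(pc_view1, pc_view2, img, temperature=0.1):
    """Sum of an intra-modal and a cross-modal contrastive term (assumed form)."""
    intra = nt_xent(pc_view1, pc_view2, temperature)
    pc_mean = 0.5 * (pc_view1 + pc_view2)  # assumed point cloud prototype
    cross = nt_xent(pc_mean, img, temperature)
    return intra + cross
```

When the three inputs hold embeddings of the same objects in matching row order, the loss is small; shuffling the rows of one input breaks the correspondence and raises it.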

Citation

If you find our work, this repository, or the pretrained models useful, please consider giving the repository a star ⭐ and citing our paper.

@inproceedings{afham2022crosspoint,
    title={CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding},
    author={Mohamed Afham and Isuru Dissanayake and Dinithi Dissanayake and Amaya Dharmasiri and Kanchana Thilakarathna and Ranga Rodrigo},
    booktitle={IEEE/CVF International Conference on Computer Vision and Pattern Recognition},
    month={June},
    year={2022}
}

Dependencies

Refer to requirements.txt for the required packages.

Pretrained Models

CrossPoint pretrained models with DGCNN feature extractor are available here.

Download data

Datasets are available here. Run the command below to download all the datasets (ShapeNetRender, ModelNet40, ScanObjectNN, ShapeNetPart) to reproduce the results.

cd data
source download_data.sh

Train CrossPoint

Refer to scripts/script.sh for the commands to train CrossPoint.

Downstream Tasks

1. 3D Object Classification

Run the eval_ssl.ipynb notebook to perform linear SVM object classification on both the ModelNet40 and ScanObjectNN datasets.
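For readers unfamiliar with the linear SVM protocol, the idea is to freeze the pretrained encoder, extract one feature vector per shape, and fit a linear classifier on those features. The following is a minimal sketch of that final step, assuming the features have already been extracted into NumPy arrays and that scikit-learn is installed. The function name and the default value of C are illustrative assumptions, not taken from the notebook.

```python
import numpy as np
from sklearn.svm import SVC


def linear_svm_accuracy(train_feats, train_labels, test_feats, test_labels, c=1.0):
    """Fit a linear SVM on frozen features and return test accuracy.

    train_feats / test_feats: arrays of shape (num_samples, feature_dim),
    assumed to come from a frozen, pretrained encoder.
    c: SVM regularization strength (an assumed default; tune as needed).
    """
    clf = SVC(C=c, kernel="linear")
    clf.fit(np.asarray(train_feats), np.asarray(train_labels))
    return float(clf.score(np.asarray(test_feats), np.asarray(test_labels)))
```

Because the encoder stays frozen, the resulting accuracy reflects the quality of the learned representation and not the capacity of the classifier.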

2. Few-Shot Object Classification

Refer to scripts/fsl_script.sh to perform few-shot object classification.
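Few-shot evaluation is commonly organized as N-way K-shot episodes: N classes are drawn at random, K labeled samples per class form the support set, and further samples of the same classes form the query set. The sketch below shows one way to sample such an episode from a list of labels. The function name, its parameters, and the episode layout are illustrative assumptions and may differ from what scripts/fsl_script.sh does.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng=None):
    """Sample index lists (support, query) for one N-way K-shot episode.

    labels: sequence of class labels, one per dataset item.
    n_way: number of classes per episode.
    k_shot: support samples per class.
    n_query: query samples per class.
    rng: optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()

    # Group dataset indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Only classes with enough samples can take part in an episode.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")

    support, query = [], []
    for c in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[c], k_shot + n_query)
        support.extend(picked[:k_shot])   # first K items: support set
        query.extend(picked[k_shot:])     # remaining items: query set
    return support, query
```

A classifier (for example a linear SVM on frozen features) is then fitted on the support indices and scored on the query indices, and the accuracy is averaged over several independently sampled episodes.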

3. 3D Object Part Segmentation

Refer to scripts/script.sh for the fine-tuning experiment for part segmentation on the ShapeNetPart dataset.

Acknowledgements

Our code borrows heavily from the DGCNN repository. We thank the authors of DGCNN for releasing their code. If you use our model, please consider citing them as well.
