Implementation of the "Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos" paper.

Last update: Dec 29, 2022

Related tags

Overview

Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

Introduction

Point cloud videos exhibit irregularities and lack of order along the spatial dimension where points emerge inconsistently across different frames. To capture the dynamics in point cloud videos, point tracking is usually employed. However, as points may flow in and out across frames, computing accurate point trajectories is extremely difficult. Moreover, tracking usually relies on point colors and thus may fail to handle colorless point clouds. In this paper, to avoid point tracking, we propose a novel Point 4D Transformer (P4Transformer) network to model raw point cloud videos. Specifically, P4Transformer consists of (i) a point 4D convolution to embed the spatio-temporal local structures presented in a point cloud video and (ii) a transformer to capture the appearance and motion information across the entire video by performing self-attention on the embedded local features. In this fashion, related or similar local areas are merged with attention weight rather than by explicit tracking.

Installation

The code is tested with Red Hat Enterprise Linux Workstation release 7.7 (Maipo), g++ (GCC) 8.3.1, PyTorch (both v1.4.0 and v1.8.1 are supported), CUDA 10.2 and cuDNN v7.6.

Compile the CUDA layers for PointNet++, which we used for furthest point sampling (FPS) and radius neighbouring search:

mv modules-pytorch-1.4.0/modules-pytorch-1.8.1 modules
cd modules
python setup.py install

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{fan21p4transformer,
  author    = {Hehe Fan and
               Yi Yang and
               Mohan Kankanhalli},
  title     = {Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos},
  booktitle = {{IEEE/CVF} Conference on Computer Vision and Pattern Recognition, {CVPR}},
  year      = {2021}
}

Related Repos

PointNet++ PyTorch implementation: https://github.com/facebookresearch/votenet/tree/master/pointnet2
MeteorNet: https://github.com/xingyul/meteornet
3DV: https://github.com/3huo/3DV-Action
PSTNet: https://github.com/hehefan/Point-Spatio-Temporal-Convolution
Transformer: https://github.com/lucidrains/vit-pytorch
PointRNN (TensorFlow implementation): https://github.com/hehefan/PointRNN
PointRNN (PyTorch implementation): https://github.com/hehefan/PointRNN-PyTorch

Implementation of the "Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos" paper.

Related tags

Overview

Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

Introduction

Installation

Citation

Related Repos

Owner

Hehe Fan

[AAAI 2021] EMLight: Lighting Estimation via Spherical Distribution Approximation and [ICCV 2021] Sparse Needlets for Lighting Estimation with Spherical Transport Loss

Pytorch implementation of FlowNet by Dosovitskiy et al.

Implementation of ConvMixer-Patches Are All You Need? in TensorFlow and Keras

Finite-temperature variational Monte Carlo calculation of uniform electron gas using neural canonical transformation.

EASY - Ensemble Augmented-Shot Y-shaped Learning: State-Of-The-Art Few-Shot Classification with Simple Ingredients.

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

《Improving Unsupervised Image Clustering With Robust Learning》(2020)

Hyperparameter tuning for humans

Towards Debiasing NLU Models from Unknown Biases

A Python framework for developing parallelized Computational Fluid Dynamics software to solve the hyperbolic 2D Euler equations on distributed, multi-block structured grids.

This is the official PyTorch implementation of our paper: "Artistic Style Transfer with Internal-external Learning and Contrastive Learning".

Kalidokit is a blendshape and kinematics solver for Mediapipe/Tensorflow.js face, eyes, pose, and hand tracking models

[ICCV 2021 Oral] Mining Latent Classes for Few-shot Segmentation

This repository contains several image-to-image translation models, whcih were tested for RGB to NIR image generation. The models are Pix2Pix, Pix2PixHD, CycleGAN and PointWise.

Detecting and Tracking Small and Dense Moving Objects in Satellite Videos: A Benchmark

Hand tracking demo for DIY Smart Glasses with a remote computer doing the work

Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

SCU OlympicsRunning Baseline

A proof of concept ai-powered Recaptcha v2 solver