RIFE - Real-Time Intermediate Flow Estimation for Video Frame Interpolation

Last update: Dec 09, 2022

Overview

Demo videos: YouTube | BiliBili

[Figure: 16X interpolation results from two input images]

Introduction

This project is the official MegEngine implementation of RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation. For the PyTorch implementation, please refer to this repo. Currently, our model runs at 30+ FPS for 2X 720p interpolation on a 2080Ti GPU. It supports arbitrary-timestep interpolation between a pair of images.
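Arbitrary-timestep interpolation means an intermediate frame can be requested at any time t strictly between the two inputs (t = 0 for the first image, t = 1 for the second). As a small illustration, the sketch below lists the timesteps that an N-times interpolation would need. The helper name `intermediate_timesteps` is hypothetical and is not part of this repository.

```python
from fractions import Fraction
from typing import List


def intermediate_timesteps(factor: int) -> List[Fraction]:
    """Return the timesteps strictly between 0 and 1 for `factor`X interpolation.

    Hypothetical helper for illustration only; it is not part of the RIFE code.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    # `factor`X interpolation splits the interval [0, 1] into `factor` equal
    # parts, so the new frames sit at 1/factor, 2/factor, ..., (factor-1)/factor.
    return [Fraction(i, factor) for i in range(1, factor)]


if __name__ == "__main__":
    # 4X interpolation needs three new frames between the two inputs.
    print([str(t) for t in intermediate_timesteps(4)])  # ['1/4', '1/2', '3/4']
```

For 2X interpolation the list contains only t = 1/2, the single frame halfway between the two inputs.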

CLI Usage

Installation

git clone git@github.com:MegEngine/arXiv2020-RIFE
cd arXiv2020-RIFE
pip3 install -r requirements.txt

Download the pretrained HD models from here.
Unzip and move the pretrained parameters to train_log/*
This model is not the one reported in our paper; for the paper model, please refer to the Evaluation section.

Run

Image Interpolation

python3 inference_img.py --img img0.png img1.png --exp=4

This produces 2^4 = 16X interpolation results. After that, you can convert the PNGs to an MP4:

ffmpeg -r 10 -f image2 -i output/img%d.png -s 448x256 -c:v libx264 -pix_fmt yuv420p output/slomo.mp4 -q:v 0 -q:a 0
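When the conversion has to be repeated for many clips, the same command can be assembled from a script. The following is a minimal sketch, assuming `ffmpeg` is on the PATH; `build_mp4_command` is a hypothetical helper, not part of this repository, and the trailing `-q:v 0 -q:a 0` options from the command above are left out for brevity.

```python
import shlex
from typing import List


def build_mp4_command(pattern: str, output: str,
                      fps: int = 10, size: str = "448x256") -> List[str]:
    """Build the ffmpeg argument list for turning numbered PNGs into an MP4.

    Hypothetical helper for illustration; defaults mirror the command above.
    """
    return [
        "ffmpeg",
        "-r", str(fps),         # frame rate applied to the image sequence
        "-f", "image2",         # read the input as a numbered image sequence
        "-i", pattern,          # input pattern, e.g. output/img%d.png
        "-s", size,             # output resolution as WIDTHxHEIGHT
        "-c:v", "libx264",      # encode video with H.264
        "-pix_fmt", "yuv420p",  # pixel format accepted by most players
        output,
    ]


if __name__ == "__main__":
    cmd = build_mp4_command("output/img%d.png", "output/slomo.mp4")
    print(shlex.join(cmd))
    # To actually run the conversion:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.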

You can also use the PNGs to generate a GIF:

ffmpeg -r 10 -f image2 -i output/img%d.png -s 448x256 -vf "split[s0][s1];[s0]palettegen=stats_mode=single[p];[s1][p]paletteuse=new=1" output/slomo.gif
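The relation between `--exp` and the interpolation factor noted above can be illustrated with a toy model. The sketch assumes that each pass inserts one new frame between every pair of neighbouring frames, which is an assumption about how `inference_img.py` proceeds rather than something stated here. Frames are stood in for by their timestamps, and `midpoint` is a hypothetical placeholder for the model call.

```python
from typing import Callable, List


def midpoint(a: float, b: float) -> float:
    """Placeholder for the model: the timestamp halfway between a and b."""
    return (a + b) / 2.0


def interpolate_recursively(first: float, last: float, exp: int,
                            mid: Callable[[float, float], float] = midpoint
                            ) -> List[float]:
    """Insert a middle frame between every neighbouring pair, `exp` times."""
    frames = [first, last]
    for _ in range(exp):
        doubled = []
        for left, right in zip(frames, frames[1:]):
            doubled.append(left)
            doubled.append(mid(left, right))  # new frame between left and right
        doubled.append(frames[-1])            # keep the final input frame
        frames = doubled
    return frames


if __name__ == "__main__":
    frames = interpolate_recursively(0.0, 1.0, exp=4)
    print(len(frames) - 1)  # 16 intervals, i.e. 2**4
    print(len(frames))      # 17 frames, including both inputs
```

Under this assumption every pass doubles the number of intervals, so `--exp=4` yields 2^4 = 16 intervals between the two input images.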

Evaluation

Download the RIFE model or the RIFE_m model reported in our paper.

MiddleBury: Download the MiddleBury OTHER dataset to ./other-data and ./other-gt-interp

HD: Download the HD dataset to ./HD_dataset. We also provide a Google Drive download link.

We provide code for evaluating on the datasets above. Run the following commands:

python3 benchmark/HD_multi_4X.py
python3 benchmark/HD.py
python3 benchmark/MiddleBury_Other.py
python3 benchmark/yuv_frame_io.py
python3 testtime.py

Training and Reproduction

Download the Vimeo90K dataset.

We use 16 CPUs, 4 GPUs and 20G of memory for training:

python3 train.py --arbitrary=False

Citation

@article{huang2020rife,
  title={RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation},
  author={Huang, Zhewei and Zhang, Tianyuan and Heng, Wen and Shi, Boxin and Zhou, Shuchang},
  journal={arXiv preprint arXiv:2011.06294},
  year={2020}
}

Reference

Optical Flow: ARFlow, pytorch-liteflownet, RAFT, pytorch-PWCNet

Video Interpolation: DVF, TOflow, SepConv, DAIN, CAIN, MEMC-Net, SoftSplat, BMBC, EDSC

Owner

Megvii Tianyuan (旷视天元) MegEngine
