[ICLR 2021, Spotlight] Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

Overview

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks, ICLR 2021 (Spotlight)

Demo | Paper

[NEW!] Time to play with our interactive web demo!

Numerous task-specific variants of conditional generative adversarial networks have been developed for image completion. Yet, a serious limitation remains that all existing algorithms tend to fail when handling large-scale missing regions. To overcome this challenge, we propose a generic new approach that bridges the gap between image-conditional and recent modulated unconditional generative architectures via co-modulation of both conditional and stochastic style representations. Also, due to the lack of good quantitative metrics for image completion, we propose the new Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS), which robustly measures the perceptual fidelity of inpainted images compared to real images via linear separability in a feature space. Experiments demonstrate superior performance in terms of both quality and diversity over state-of-the-art methods in free-form image completion and easy generalization to image-to-image translation.

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks
Shengyu Zhao, Jonathan Cui, Yilun Sheng, Yue Dong, Xiao Liang, Eric I Chang, Yan Xu
Tsinghua University and Microsoft Research
arXiv | OpenReview

Overview

This repo is implemented upon and has the same dependencies as the official StyleGAN2 repo. We also provide a Dockerfile for Docker users. This repo currently supports:

  • Large scale image completion experiments on FFHQ and Places2
  • Image-to-image translation experiments on edges to photos and COCO-Stuff
  • Evaluation code of Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS)

Datasets

  • FFHQ dataset (in TFRecords format) can be downloaded following the StyleGAN2 repo.
  • Places2 dataset can be downloaded in this website (Places365-Challenge 2016 high-resolution images, training set and validation set). The raw images should be converted into TFRecords using dataset_tools/create_places2.py.

Training

The following script is for training on FFHQ. It will splits 10k images for validation. We recommend using 8 NVIDIA Tesla V100 GPUs for training. Training at 512x512 resolution takes about 1 week.

python run_training.py --data-dir=DATA_DIR --dataset=DATASET --metrics=ids10k --num-gpus=8

The following script is for training on Places2, which has a validation set of 36500 images:

python run_training.py --data-dir=DATA_DIR --dataset=DATASET --metrics=ids36k5 --total-kimg 50000 --num-gpus=8

Evaluation

The following script is for evaluation:

python run_metrics.py --data-dir=DATA_DIR --dataset=DATASET --network=CHECKPOINT_FILE(S) --metrics=METRIC(S) --num-gpus=1

Commonly used metrics are ids10k and ids36k5 (for FFHQ and Places2 respectively), which will compute P-IDS and U-IDS together with FID. By default, masks are generated randomly for evaluation, or you may append the metric name with -h0 ([0.0, 0.2]) to -h4 ([0.8, 1.0]) to specify the range of masked ratio.

Our pre-trained models are available on Google Drive. Below lists our provided pre-trained models:

Model name & URL Description
co-mod-gan-ffhq-9-025000.pkl Large scale image completion on FFHQ (512x512)
co-mod-gan-ffhq-10-025000.pkl Large scale image completion on FFHQ (1024x1024)
co-mod-gan-places2-050000.pkl Large scale image completion on Places2 (512x512)
co-mod-gan-coco-stuff-025000.pkl Image-to-image translation on COCO-Stuff (labels to photos) (512x512)
co-mod-gan-edges2shoes-025000.pkl Image-to-image translation on edges2shoes (256x256)
co-mod-gan-edges2handbags-025000.pkl Image-to-image translation on edges2handbags (256x256)

Use the following script to run the interactive demo locally:

python run_demo.py -d DATA_DIR/DATASET -c CHECKPOINT_FILE(S)

Citation

If you find this code helpful, please cite our paper:

@inproceedings{zhao2021comodgan,
  title={Large Scale Image Completion via Co-Modulated Generative Adversarial Networks},
  author={Zhao, Shengyu and Cui, Jonathan and Sheng, Yilun and Dong, Yue and Liang, Xiao and Chang, Eric I and Xu, Yan},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021}
}
Owner
Shengyu Zhao
Undergraduate at IIIS, Tsinghua University. Working with MIT and Microsoft Research.
Shengyu Zhao
Meta Self-learning for Multi-Source Domain Adaptation: A Benchmark

Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark Project | Arxiv | YouTube | | Abstract In recent years, deep learning-based methods

CVSM Group - email: <a href=[email protected]"> 188 Dec 12, 2022
This is the face keypoint train code of project face-detection-project

face-key-point-pytorch 1. Data structure The structure of landmarks_jpg is like below: |--landmarks_jpg |----AFW |------AFW_134212_1_0.jpg |------AFW_

I‘m X 3 Nov 27, 2022
A lightweight library designed to accelerate the process of training PyTorch models by providing a minimal

A lightweight library designed to accelerate the process of training PyTorch models by providing a minimal, but extensible training loop which is flexible enough to handle the majority of use cases,

Chris Hughes 110 Dec 23, 2022
Adversarial Attacks on Probabilistic Autoregressive Forecasting Models.

Attack-Probabilistic-Models This is the source code for Adversarial Attacks on Probabilistic Autoregressive Forecasting Models. This repository contai

SRI Lab, ETH Zurich 25 Sep 14, 2022
PyTorch implementation of MuseMorphose, a Transformer-based model for music style transfer.

MuseMorphose This repository contains the official implementation of the following paper: Shih-Lun Wu, Yi-Hsuan Yang MuseMorphose: Full-Song and Fine-

Yating Music, Taiwan AI Labs 142 Jan 08, 2023
Scenic: A Jax Library for Computer Vision and Beyond

Scenic Scenic is a codebase with a focus on research around attention-based models for computer vision. Scenic has been successfully used to develop c

Google Research 1.6k Dec 27, 2022
Learning Correspondence from the Cycle-consistency of Time (CVPR 2019)

TimeCycle Code for Learning Correspondence from the Cycle-consistency of Time (CVPR 2019, Oral). The code is developed based on the PyTorch framework,

Xiaolong Wang 706 Nov 29, 2022
Social Network Ads Prediction

Social network advertising, also social media targeting, is a group of terms that are used to describe forms of online advertising that focus on social networking services.

Khazar 2 Jan 28, 2022
AugLiChem - The augmentation library for chemical systems.

AugLiChem Welcome to AugLiChem! The augmentation library for chemical systems. This package supports augmentation for both crystaline and molecular sy

BaratiLab 17 Jan 08, 2023
Python framework for Stochastic Differential Equations modeling

SDElearn: a Python package for SDE modeling This package implements functionalities for working with Stochastic Differential Equations models (SDEs fo

4 May 10, 2022
3rd Place Solution of the Traffic4Cast Core Challenge @ NeurIPS 2021

3rd Place Solution of Traffic4Cast 2021 Core Challenge This is the code for our solution to the NeurIPS 2021 Traffic4Cast Core Challenge. Paper Our so

7 Jul 25, 2022
Film review classification

Film review classification Решение задачи классификации отзывов на фильмы на положительные и отрицательные с помощью рекуррентных нейронных сетей 1. З

Nikita Dukin 3 Jan 21, 2022
Unofficial JAX implementations of Deep Learning models

JAX Models Table of Contents About The Project Getting Started Prerequisites Installation Usage Contributing License Contact About The Project The JAX

107 Jan 05, 2023
simple demo codes for Learning to Teach with Dynamic Loss Functions

Learning to Teach with Dynamic Loss Functions This repo contains the simple demo for the NeurIPS-18 paper: Learning to Teach with Dynamic Loss Functio

Lijun Wu 15 Dec 30, 2021
根据midi文件演奏“风物之诗琴”的脚本 "Windsong Lyre" auto play

Genshin-lyre-auto-play 简体中文 | English 简介 根据midi文件演奏“风物之诗琴”的脚本。由Python驱动,在此承诺, ⚠️ 项目内绝不含任何能够引起安全问题的代码。 前排提示:所有键盘在动但是原神没反应的都是因为没有管理员权限,双击run.bat或者以管理员模式

御坂17032号 386 Jan 01, 2023
ICLR2021 (Under Review)

Self-Supervised Time Series Representation Learning by Inter-Intra Relational Reasoning This repository contains the official PyTorch implementation o

Haoyi Fan 58 Dec 30, 2022
DeconvNet : Learning Deconvolution Network for Semantic Segmentation

DeconvNet: Learning Deconvolution Network for Semantic Segmentation Created by Hyeonwoo Noh, Seunghoon Hong and Bohyung Han at POSTECH Acknowledgement

Hyeonwoo Noh 325 Oct 20, 2022
Weakly supervised medical named entity classification

Trove Trove is a research framework for building weakly supervised (bio)medical named entity recognition (NER) and other entity attribute classifiers

60 Nov 18, 2022
The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

Website | ArXiv | Get Start | Video PIRenderer The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic

Ren Yurui 261 Jan 09, 2023
Code and training data for our ECCV 2016 paper on Unsupervised Learning

Shuffle and Learn (Shuffle Tuple) Created by Ishan Misra Based on the ECCV 2016 Paper - "Shuffle and Learn: Unsupervised Learning using Temporal Order

Ishan Misra 44 Dec 08, 2021