🔥 Real-time Super Resolution enhancement (4x) with content loss and relativistic adversarial optimization 🔥

Overview

Super Resolution for Real Time Image Enhancement

Final Results from Validation data

Introduction

It is no suprising that adversarial training is indeed possible for super resolution tasks. The problem with pretty much all the state of the art models is that they are just not usable for native deployment, because of 100s of MBs of model size.

Also, most of the known super resolution frameworks only works on single compression method, such as bicubic, nearest, bilinear etc. Which helps the model a lot to beat the previous state of the art scores but they perform poorly on real life low resolution images because they might not belong to the same compression method for which the model was trained for.

So for me the main goal with this project wasn't just to create yet another super resolution model, but to develop a lightest model as possible which works well on any random compression methods.

With this goal in mind, I tried adopting the modern super resolution frameworks such as relativistic adversarial training, and content loss optimization (I've mainly followed the ESRGAN, with few changes in the objective function), and finally was able to create a model of size 5MB!!!

API Usage

from inference import enhance_image

enhance_image(
    lr_image, # 
   
    , # or lr_path = 
    
     ,
    
   
    sr_path, # ,
    visualize, # 
   
    size, # 
   
    ,
   
    )

CLI Usage

usage: inference.py [-h] [--lr-path LR_PATH] [--sr-path SR_PATH]

Super Resolution for Real Time Image Enhancement

optional arguments:
  -h, --help         show this help message and exit
  --lr-path LR_PATH  Path to the low resolution image.
  --sr-path SR_PATH  Output path where the enhanced image would be saved.

Model architectures

The main building block of the generator is the Residual in Residual Dense Block (RRDB), which consists of classic DenseNet module but coupled with a residual connection

Now in the original paper the authors mentioned to remove the batch normalization layer in order to remove the checkboard artifacts, but due to the extreme small size of my model, I found utilizing the batch normalization layer quite effective for both speeding up the training and better quality results.

Another change I made in the original architecture is replacing the nearest upsampling proceedure with the pixel shuffle, which helped a lot to produce highly detailed outputs given the size of the model.

The discriminator is made up of blocks of classifc convolution layer followed by batch normalization followed by leaky relu non linearity.

Relativistic Discriminator

A relativistic discriminator tries to predict the probability that a real image is relatively more realistic than a fake one.

So the discriminator and the generator are optimized to minizize these corresponding losses:

Discriminator's adversarial loss:

Generator's adversarial loss:

Perceptual Loss (Final objective for the Generator)

Original perceptual loss introduced in SRGAN paper combines the adversarial loss and the content loss obtained from the features of final convolution layers of the VGG Net.

Effectiveness of perceptual loss if found increased by constraining on features before activation rather than after activation as practiced in SRGAN.

To make the Perceptual loss more effective, I additionally added the preactivation features disparity from both shallow and deep layers, making the generator produce better results.

In addition to content loss and relativistic adversarial optimization, a simple pixel loss is also added to the generator's final objective as per the paper.

Now based on my experiments I found it really hard for the generator to produce highly detailed outputs when its also minimizing the pixel loss (I'm imputing this observation to the fact that my model is very small).

This is a bit surprising because optimizing an additional objective function which has same optima should help speeding up the training. My interpretation is since super resolution is not a one to one matching, as multiple results are there for a single low resolution patch (more on patch size below), so forcing the generator to converge to a single output would cause the generator to not produce detailed but instead the average of all those possible outputs.

So I tried reducing the pixel loss weight coefficient down to 1e-2 to 1e-4 as described in the paper, and then compared the results with the generator trained without any pixel loss, and found that pixel loss has no significant visual improvements. So given my constrained training environment (Google Colab), I decided not to utilize the pixel loss as one of the part of generator's loss.

So here's the generator's final loss:

Patch size affect

Ideally larger the patch size better the adversarial training hence better the results, since an enlarged receptive field helps both the models to capture more semantic information. Therefore the paper uses 96x96 to 192x192 as the patch size resolution, but since I was constrained to utilize Google Colab, my patch size was only 32x32 😶 , and that too with batch size of 8.

Multiple compression methods

The idea is to make the generator independent of the compression that is applied to the training dataset, so that its much more robust in real life samples.

For this I randomly applied the nearest, bilinear and bicubic compressions on all the data points in the dataset every time a batch is processed.

Validation Results after ~500 epochs

Loss type

Value
Content Loss (L1) [5th, 10th, 20th preactivation features from VGGNet] ~38.582
Style Loss (L1) [320th preactivation features from EfficientNetB4] ~1.1752
Adversarial Loss ~1.550

Visual Comparisons

Below are some of the common outputs that most of the super resolution papers compare with (not used in the training data).

Author - Rishik Mourya

Owner
Rishik Mourya
3rd year Undergrad | ML Engineer @Nevronas.ai | ML Engineer @Skylarklabs.ai | And part-time Web Developer.
Rishik Mourya
Extract MNIST handwritten digits dataset binary file into bmp images

MNIST-dataset-extractor Extract MNIST handwritten digits dataset binary file into bmp images More info at http://yann.lecun.com/exdb/mnist/ Dependenci

Omar Mostafa 6 May 24, 2021
MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

187 Dec 26, 2022
Systematic generalisation with group invariant predictions

Requirements are Python 3, TensorFlow v1.14, Numpy, Scipy, Scikit-Learn, Matplotlib, Pillow, Scikit-Image, h5py, tqdm. Experiments were run on V100 GPUs (16 and 32GB).

Faruk Ahmed 30 Dec 01, 2022
Unofficial Implement PU-Transformer

PU-Transformer-pytorch Pytorch unofficial implementation of PU-Transformer (PU-Transformer: Point Cloud Upsampling Transformer) https://arxiv.org/abs/

Lee Hyung Jun 7 Sep 21, 2022
Code for the paper: Sketch Your Own GAN

Sketch Your Own GAN Project | Paper | Youtube | Slides Our method takes in one or a few hand-drawn sketches and customizes an off-the-shelf GAN to mat

677 Dec 28, 2022
Codebase for ECCV18 "The Sound of Pixels"

Sound-of-Pixels Codebase for ECCV18 "The Sound of Pixels". *This repository is under construction, but the core parts are already there. Environment T

Hang Zhao 318 Dec 20, 2022
Official PyTorch implementation of the paper "Deep Constrained Least Squares for Blind Image Super-Resolution", CVPR 2022.

Deep Constrained Least Squares for Blind Image Super-Resolution [Paper] This is the official implementation of 'Deep Constrained Least Squares for Bli

MEGVII Research 141 Dec 30, 2022
An efficient toolkit for Face Stylization based on the paper "AgileGAN: Stylizing Portraits by Inversion-Consistent Transfer Learning"

MMGEN-FaceStylor English | 简体中文 Introduction This repo is an efficient toolkit for Face Stylization based on the paper "AgileGAN: Stylizing Portraits

OpenMMLab 182 Dec 27, 2022
Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR, 2019)

Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR 2019) To make better use of given limited labels, we propo

126 Sep 13, 2022
Face recognize and crop them

Face Recognize Cropping Module Source 아이디어 Face Alignment with OpenCV and Python Requirement 필요 라이브러리 imutil dlib python-opence (cv2) Usage 사용 방법 open

Cho Moon Gi 1 Feb 15, 2022
Implémentation en pyhton de l'article Depixelizing pixel art de Johannes Kopf et Dani Lischinski

Implémentation en pyhton de l'article Depixelizing pixel art de Johannes Kopf et Dani Lischinski

TableauBits 3 May 29, 2022
A Domain-Agnostic Benchmark for Self-Supervised Learning

DABS: A Domain Agnostic Benchmark for Self-Supervised Learning This repository contains the code for DABS, a benchmark for domain-agnostic self-superv

Alex Tamkin 81 Dec 09, 2022
This is a collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations,

labml.ai Deep Learning Paper Implementations This is a collection of simple PyTorch implementations of neural networks and related algorithms. These i

labml.ai 16.4k Jan 09, 2023
MassiveSumm: a very large-scale, very multilingual, news summarisation dataset

MassiveSumm: a very large-scale, very multilingual, news summarisation dataset This repository contains links to data and code to fetch and reproduce

Daniel Varab 19 Dec 16, 2022
Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers (arXiv2021)

Polyp-PVT by Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, & Ling Shao. This repo is the official implementation of "Polyp-PVT: Polyp Se

Deng-Ping Fan 102 Jan 05, 2023
A GUI for Face Recognition, based upon Docker, Tkinter, GPU and a camera device.

Face Recognition GUI This repository is a GUI version of Face Recognition by Adam Geitgey, where e.g. Docker and Tkinter are utilized. All the materia

Kasper Henriksen 6 Dec 05, 2022
[SIGGRAPH Asia 2021] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning.

DeepVecFont This is the homepage for "DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning". Yizhi Wang and Zhouhui Lian. WI

Yizhi Wang 17 Dec 22, 2022
Single Image Random Dot Stereogram for Tensorflow

TensorFlow-SIRDS Single Image Random Dot Stereogram for Tensorflow SIRDS is a means to present 3D data in a 2D image. It allows for scientific data di

Greg Peatfield 5 Aug 10, 2022
[CVPR 2022 Oral] EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

EPro-PnP EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation In CVPR 2022 (Oral). [paper] Hanshen

同济大学智能汽车研究所综合感知研究组 ( Comprehensive Perception Research Group under Institute of Intelligent Vehicles, School of Automotive Studies, Tongji University) 842 Jan 04, 2023
Scripts and a shader to get you started on setting up an exported Koikatsu character in Blender.

KK Blender Shader Pack A plugin and a shader to get you started with setting up an exported Koikatsu character in Blender. The plugin is a Blender add

166 Jan 01, 2023