PyTorch implementation of Value Iteration Networks (VIN): Clean, Simple and Modular. Visualization in Visdom.

Last update: Dec 07, 2022

Overview

VIN: Value Iteration Networks

This is a PyTorch implementation of Value Iteration Networks (VIN) that aims to reproduce the results of the paper. (A TensorFlow version is also available.)

Key idea

A fully differentiable neural network with a 'planning' sub-module.
Value Iteration = Conv Layer + Channel-wise Max Pooling
Generalizes better than reactive policies to new, unseen tasks.
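
For reference, the second bullet can be read against the standard value-iteration backup. The notation below (reward R, transition probabilities P, discount γ, iterates V_n and Q_n) is the usual MDP notation and is an assumption of this note, not something defined in this README:

```latex
Q_n(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_n(s'),
\qquad
V_{n+1}(s) = \max_{a} Q_n(s, a)
```

On a grid, the sum over next states only involves neighboring cells, so it can be expressed as a small convolution with one output channel per action. The max over actions then becomes a max over channels.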

Learned Reward Image and Its Value Images for each VI Iteration

[Figure: for each grid world size (8x8, 16x16, 28x28), a visualization of the grid world, the learned reward image, and the value images.]

Dependencies

This repository requires the following packages:

Python >= 3.6
NumPy >= 1.12.1
PyTorch >= 0.1.10
SciPy >= 0.19.0
visdom >= 0.1

Datasets

Each data sample consists of the (x, y) coordinates of the current state in the grid world, followed by an obstacle image and a goal image.

Dataset size	8x8	16x16	28x28
Train set	77760	776440	4510695
Test set	12960	129440	751905

Running Experiment: Training

Grid world 8x8

python run.py --datafile data/gridworld_8x8.npz --imsize 8 --lr 0.005 --epochs 30 --k 10 --batch_size 128

Grid world 16x16

python run.py --datafile data/gridworld_16x16.npz --imsize 16 --lr 0.008 --epochs 30 --k 20 --batch_size 128

Grid world 28x28

python run.py --datafile data/gridworld_28x28.npz --imsize 28 --lr 0.003 --epochs 30 --k 36 --batch_size 128

Flags:

datafile: The path to the data files.
imsize: The size of the input images. One of: [8, 16, 28]
lr: Learning rate for the RMSProp optimizer. Recommended: [0.01, 0.005, 0.002, 0.001]
epochs: Number of epochs to train. Default: 30
k: Number of value iterations. Recommended: [10 for 8x8, 20 for 16x16, 36 for 28x28]
ch_i: Number of channels in the input layer. Default: 2, i.e. the obstacle image and the goal image.
ch_h: Number of channels in the first convolutional layer. Default: 150, as described in the paper.
ch_q: Number of channels in the q layer (~actions) in the VI module. Default: 10, as described in the paper.
batch_size: Batch size. Default: 128
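
As a rough illustration of how these flags fit together, here is a minimal argparse sketch. It is not the repository's actual run.py: the function name `build_parser` is hypothetical, and the defaults for `lr` and `k` are assumptions chosen from the recommended values above, since this README does not state them.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (not the real run.py)."""
    p = argparse.ArgumentParser(description="Train a VIN on a grid-world dataset")
    p.add_argument("--datafile", type=str, required=True,
                   help="path to the .npz data file")
    p.add_argument("--imsize", type=int, choices=[8, 16, 28], required=True,
                   help="size of the input images")
    # Assumed default: the README only lists recommended learning rates.
    p.add_argument("--lr", type=float, default=0.005,
                   help="learning rate for RMSProp")
    p.add_argument("--epochs", type=int, default=30,
                   help="number of training epochs")
    # Assumed default: the README recommends 10 / 20 / 36 depending on grid size.
    p.add_argument("--k", type=int, default=10,
                   help="number of value iterations")
    p.add_argument("--ch_i", type=int, default=2,
                   help="input channels (obstacle image, goal image)")
    p.add_argument("--ch_h", type=int, default=150,
                   help="channels in the first convolutional layer")
    p.add_argument("--ch_q", type=int, default=10,
                   help="channels in the q layer of the VI module")
    p.add_argument("--batch_size", type=int, default=128,
                   help="batch size")
    return p


# Parse the flags of the 16x16 command shown above (minus epochs and batch size).
args = build_parser().parse_args(
    "--datafile data/gridworld_16x16.npz --imsize 16 --lr 0.008 --k 20".split()
)
print(args.imsize, args.k, args.ch_h, args.batch_size)
```

Flags that are omitted on the command line fall back to the defaults, so the shorter command above still trains for 30 epochs with a batch size of 128 under this sketch.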

Visualization with Visdom

We visualize the learned reward image and its corresponding value images for each VI iteration using Visdom.

First, start the server:

python -m visdom.server

Then open Visdom in your browser at http://localhost:8097

Then run the following to visualize the learned reward and value images:

python vis.py --datafile learned_rewards_values_28x28.npz

NOTE: If you would like to produce a GIF animation of the value images on your own, the following ImageMagick command might be useful.

convert -delay 20 -loop 0 *.png value_function.gif

Benchmarks

GPU: TITAN X

Performance: Test Accuracy

NOTE: This is the accuracy on the test set. It differs from the table in the paper, which reports the success rate of rollouts of the learned policy in the environment.

Test Accuracy	8x8	16x16	28x28
PyTorch	99.16%	92.44%	88.20%
TensorFlow	99.03%	90.2%	82%

Speed with GPU

Speed per epoch	8x8	16x16	28x28
PyTorch	3s	15s	100s
TensorFlow	4s	25s	165s

Frequently Asked Questions

Q: How is the reward image obtained from the observation?
- A: The observation image has 2 channels. The first channel is the obstacle image (0: free, 1: obstacle). The second channel is the goal image (0: free, 10: goal). For example, in the 8x8 grid world, an input tensor with batch size 128 has shape [128, 2, 8, 8]. It is fed into a convolutional layer with a [3, 3] filter and 150 feature maps, followed by another convolutional layer with a [3, 3] filter and 1 feature map. The output tensor has shape [128, 1, 8, 8]. This is the reward image.
Q: What exactly is the transition model, and how does the VI module obtain the value image from the reward image?
- A: Assume a batch size of 128 in the 8x8 grid world. Once we obtain the reward image with shape [128, 1, 8, 8], we apply a convolutional layer to produce the q layers of the VI module. The [3, 3] filters represent the transition probabilities. There is a set of 10 filters, each generating one feature map in the q layers. Each feature map corresponds to an "action". Note that this is more than the number of actually available actions, which is only 8. We then apply channel-wise max pooling to obtain the value image with shape [128, 1, 8, 8]. Finally, we stack this value image with the reward image as input to the next VI iteration.
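
The two answers above can be sketched in PyTorch roughly as follows. This is a minimal illustration under stated assumptions, not the repository's model code: the class name `VINSketch` and the layer names are hypothetical, no nonlinearity is placed between the two reward convolutions, and the value image is assumed to start at zero so that the first iteration depends on the reward image only.

```python
import torch
import torch.nn as nn


class VINSketch(nn.Module):
    """Illustrative reward + VI module; names and details are assumptions."""

    def __init__(self, ch_i=2, ch_h=150, ch_q=10):
        super().__init__()
        # Observation (obstacle + goal channels) -> 150 feature maps -> reward image.
        # padding=1 keeps the spatial size unchanged for 3x3 filters.
        self.h = nn.Conv2d(ch_i, ch_h, kernel_size=3, padding=1)
        self.r = nn.Conv2d(ch_h, 1, kernel_size=3, padding=1, bias=False)
        # 3x3 "transition" filters, one output channel per abstract action.
        # They read the stacked [reward, value] image, hence 2 input channels.
        self.q = nn.Conv2d(2, ch_q, kernel_size=3, padding=1, bias=False)

    def forward(self, obs, k):
        r = self.r(self.h(obs))          # reward image, [B, 1, H, W]
        v = torch.zeros_like(r)          # assumed initial value image
        for _ in range(k):
            # Stack reward and value images, then convolve into q layers.
            q = self.q(torch.cat([r, v], dim=1))        # [B, ch_q, H, W]
            # Channel-wise max pooling over the action channels.
            v, _ = torch.max(q, dim=1, keepdim=True)    # [B, 1, H, W]
        return r, v


model = VINSketch()
obs = torch.zeros(128, 2, 8, 8)          # batch of 8x8 observations
reward, value = model(obs, k=10)
print(reward.shape, value.shape)
```

With a batch of 128 observations from the 8x8 grid world, both the reward image and the final value image have shape [128, 1, 8, 8], matching the shapes given in the answers. How the value image is then turned into an action prediction is not covered by this sketch.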

Owner

Xingdong Zuo
