Official code for: A Probabilistic Hard Attention Model For Sequentially Observed Scenes

Last update: Nov 19, 2022

Overview

"A Probabilistic Hard Attention Model For Sequentially Observed Scenes"

Authors: Samrudhdhi Rangrej, James Clark

Accepted to: BMVC'21

A recurrent attention model sequentially observes glimpses from an image and predicts a class label. At time t, the model actively observes a glimpse g_t and its coordinates l_t. Given g_t and l_t, the feed-forward module F extracts features f_t, and the recurrent module R updates the hidden state to h_t. Using the updated hidden state h_t, the linear classifier C predicts the class distribution p(y|h_t).

At time t+1, the model assesses various candidate locations l before attending to the optimal one. It predicts p(y|g,l,h_t) ahead of time and selects the candidate l that maximizes KL[p(y|g,l,h_t)||p(y|h_t)].

To approximate p(y|g,l,h_t) without attending to the glimpse g, the model synthesizes the features of g using a Partial VAE. The normalizing flow-based encoder S predicts the approximate posterior q(z|h_t). The decoder D uses a sample z~q(z|h_t) to synthesize a feature map f^~ containing the features of all glimpses. The model uses f^~(l) as the features of a glimpse at location l and evaluates p(y|g,l,h_t)=p(y|f^~(l),h_t).

Dashed arrows show the path used to compute the lookahead class distribution p(y|f^~(l),h_t).
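The location-selection rule described above can be illustrated with a small, self-contained sketch. It is not the repository's implementation: the function names `kl_divergence` and `select_next_location` are hypothetical, the probabilities are made-up toy values, and plain Python lists stand in for the batched tensors the real model would use. It only shows the argmax over KL[p(y|f^~(l),h_t)||p(y|h_t)] for a handful of candidate locations.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL[p || q] for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def select_next_location(current, lookahead):
    """Pick the candidate location with the largest KL score.

    current:   p(y|h_t), a list of class probabilities
    lookahead: dict mapping a candidate location l to the lookahead
               distribution p(y|f~(l), h_t)
    """
    scores = {loc: kl_divergence(p, current) for loc, p in lookahead.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy example: 3 classes, 3 candidate locations (all values are illustrative).
p_now = [0.4, 0.3, 0.3]
p_ahead = {
    (0, 0): [0.4, 0.3, 0.3],     # identical to the current belief, score 0
    (0, 1): [0.5, 0.25, 0.25],   # mild change
    (1, 0): [0.9, 0.05, 0.05],   # large change, should be selected
}
best, scores = select_next_location(p_now, p_ahead)
print("next location:", best)
```

In this toy setting the location whose lookahead distribution departs most from the current belief wins, which matches the selection rule stated in the overview.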

Requirements:

torch==1.8.1, torchvision==0.9.1, tensorboard==2.5.0, fire==0.4.0

Datasets:

SVHN (Let PyTorch download this dataset)
CIFAR-10 (Let PyTorch download this dataset)
CIFAR-100 (Let PyTorch download this dataset)
CINIC-10 (download from: https://datashare.is.ed.ac.uk/bitstream/handle/10283/3192/CINIC-10.tar.gz, visit https://github.com/BayesWatch/cinic-10)
TinyImageNet (download from: http://cs231n.stanford.edu/tiny-imagenet-200.zip)

Training a model

Use main.py to train and evaluate the model.

Arguments

dataset: one of 'svhn', 'cifar10', 'cifar100', 'cinic10', 'tinyimagenet'
datapath: path to the downloaded datasets
lr: learning rate
training_phase: one of 'first', 'second', 'third'
ccebal: coefficient for cross entropy loss
batch: batch-size for training
batchv: batch-size for evaluation
T: maximum time-step
logfolder: path to log directory
epochs: number of training epochs
pretrain_checkpoint: checkpoint for pretrained model from previous training phase

Example commands to train the model on the SVHN dataset are as follows.

Training Stage one

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='first' \
    --ccebal=1 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_first' \
    --epochs=1000 \
    --pretrain_checkpoint=None

Training Stage two

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='second' \
    --ccebal=0 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_second' \
    --epochs=100 \
    --pretrain_checkpoint='./svhn_log_first/weights_f_1000.pth'

Training Stage three

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='third' \
    --ccebal=16 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_third' \
    --epochs=100 \
    --pretrain_checkpoint='./svhn_log_second/weights_f_100.pth'
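The three example commands chain together: each later stage loads a checkpoint from the log folder of the previous stage. The sketch below rebuilds those three commands programmatically. The helper name `build_schedule` is hypothetical, and the checkpoint pattern `weights_f_<epochs>.pth` is an assumption inferred from the example commands above, not a documented guarantee. The `ccebal` and `epochs` values are the SVHN ones shown above; other datasets may need different settings.

```python
import shlex


def build_schedule(dataset="svhn", datapath="./data/", lr=0.001, batch=64, T=7):
    """Return the argv lists for the three training stages, in order."""
    # (training_phase, ccebal, epochs) copied from the SVHN example commands.
    stages = [("first", 1, 1000), ("second", 0, 100), ("third", 16, 100)]
    commands = []
    checkpoint = None  # stage one starts without a pretrained checkpoint
    for phase, ccebal, epochs in stages:
        logfolder = f"./{dataset}_log_{phase}"
        commands.append([
            "python3", "main.py",
            f"--dataset={dataset}",
            f"--datapath={datapath}",
            f"--lr={lr}",
            f"--training_phase={phase}",
            f"--ccebal={ccebal}",
            f"--batch={batch}",
            f"--batchv={batch}",
            f"--T={T}",
            f"--logfolder={logfolder}",
            f"--epochs={epochs}",
            f"--pretrain_checkpoint={checkpoint}",
        ])
        # Assumed naming: the final checkpoint of a stage is weights_f_<epochs>.pth.
        checkpoint = f"{logfolder}/weights_f_{epochs}.pth"
    return commands


for argv in build_schedule():
    print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argv lists can be passed one at a time to `subprocess.run(argv, check=True)` so that a failed stage stops the chain.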

Visualization of attention policy for a CIFAR-10 image

The top row shows the entire image and the expected information gain (EIG) maps for t=1 to 6. The bottom row shows the glimpses attended by our model. The model observes the first glimpse at a random location. Each glimpse is 8x8 pixels. Glimpses are taken with a stride of 4, so they overlap and form a 7x7 grid of candidate locations on a 32x32 image. The EIG maps are therefore of size 7x7 and are upsampled for display. We display the entire image for reference only; our model never observes the whole image.
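The 7x7 grid follows from the glimpse size and stride. As a quick check, the sketch below enumerates the top-left corners of all glimpses, assuming a 32x32 input (the CIFAR-10 image size). The function name `glimpse_grid` is hypothetical and is not taken from the repository.

```python
def glimpse_grid(image_size=32, glimpse_size=8, stride=4):
    """Return the grid side length and the top-left (row, col) of each glimpse."""
    n = (image_size - glimpse_size) // stride + 1
    corners = [(r * stride, c * stride) for r in range(n) for c in range(n)]
    return n, corners


n, corners = glimpse_grid()
print(n, len(corners), corners[0], corners[-1])
# prints: 7 49 (0, 0) (24, 24)
```

With these values, (32 - 8) / 4 + 1 = 7 positions per axis, giving 49 candidate locations, one per cell of the EIG map.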

Acknowledgement

Major parts of the neural spline flows implementation are borrowed from Karpathy's pytorch-normalizing-flows.
