Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

Last update: Dec 31, 2022

Overview

PyTorch RL Minimal Implementations

This repository implements several reinforcement learning algorithms with the following characteristics:

Few dependencies: Only PyTorch and Gym need to be installed, for building neural networks and for testing the algorithms' performance, respectively.
Independent implementations: Each RL algorithm is implemented in its own file, which makes it easier to understand how it works and to adapt it to other tasks.
Flexible configuration: Parameters and tools such as reward normalization, advantage normalization, TensorBoard, and tqdm are easy to configure.
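
As an illustration of one of the configurable tricks mentioned above, advantage normalization rescales a batch of advantages to zero mean and unit standard deviation before the policy update. The following is a minimal pure-Python sketch, not the repository's code; the function name `normalize` and the `eps` default are assumptions made for this example.

```python
import math


def normalize(values, eps=1e-8):
    """Rescale a batch of values to zero mean and (approximately) unit std.

    Hypothetical helper for illustration; `eps` avoids division by zero
    when every value in the batch is identical.
    """
    n = len(values)
    mean = sum(values) / n
    # Population variance (divides by n, not n - 1).
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    return [(v - mean) / (std + eps) for v in values]


advantages = [1.0, 2.0, 3.0, 4.0]
normalized = normalize(advantages)
```

With tensors, the same idea is commonly written as `(adv - adv.mean()) / (adv.std() + eps)`.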

RL Algorithms List

Name	Type	Estimator	Paper	File
Q-Learning	Value-based / Off policy	TD	Watkins et al. Q-Learning. Machine Learning, 1992.	q_learning.py
REINFORCE	Policy-based / On policy	MC	Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In NeurIPS, 2000.	reinforce.py
DQN	Value-based / Off policy	TD	Mnih et al. Human-level control through deep reinforcement learning. Nature, 2015.	in progress
A2C	Actor-Critic / On policy	n-step TD	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016.	a2c.py
A3C	Actor-Critic / On policy	n-step TD	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016.	a3c.py
ACER	Actor-Critic / On policy	GAE	Wang et al. Sample Efficient Actor-Critic with Experience Replay. In ICLR, 2017.	in progress
ACKTR	Actor-Critic / On policy	GAE	Wu et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In NeurIPS, 2017.	in progress
PPO	Actor-Critic / On policy	GAE	Schulman et al. Proximal Policy Optimization Algorithms. arXiv, 2017.	ppo.py
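
The Estimator column distinguishes TD, MC, n-step TD, and GAE. One way to see how they relate is through Generalized Advantage Estimation, which interpolates between the one-step TD error and the Monte Carlo return via a parameter `lam`. The sketch below is a plain-Python illustration under simplifying assumptions (a single trajectory segment with no terminal state inside it); the function name and signature are hypothetical and are not taken from `ppo.py`.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory segment.

    rewards[t] and values[t] belong to step t; last_value is the value
    estimate of the state reached after the final step (bootstrap).
    Assumes no episode ends inside the segment.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Walk backwards so each step can reuse the accumulated future terms.
    for t in reversed(range(len(rewards))):
        # One-step TD error at step t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5]
td_like = gae_advantages(rewards, values, last_value=0.0, gamma=0.9, lam=0.0)
mc_like = gae_advantages(rewards, values, last_value=0.0, gamma=0.9, lam=1.0)
```

With `lam=0.0` each advantage reduces to the one-step TD error, and with `lam=1.0` it equals the discounted return (including the bootstrap value) minus the value estimate, which is the Monte Carlo style estimate.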

Quick Start

Requirements

pytorch
gym

tensorboard  # for the summary writer
tqdm         # for the progress bar
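
A quick way to check whether these packages are available is to look them up by import name. The snippet below is a small sketch using only the standard library; it assumes the usual import names (`torch` for PyTorch, `gym`, `tensorboard`, `tqdm`) and treats the last two as optional, following the comments in the list above.

```python
import importlib.util

# Import names are assumptions based on the packages listed above.
REQUIRED = ["torch", "gym"]
OPTIONAL = ["tensorboard", "tqdm"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED))
    print("missing optional:", missing_modules(OPTIONAL))
```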

Abstract Agent

Components / Parameters

Component	Description
policy	neural network model
gamma	discount factor for the cumulative reward
lr	learning rate, e.g. `lr_actor`, `lr_critic`
lr_decay	decay rate used to schedule the learning rate
lr_scheduler	learning rate scheduler
coef_critic_loss	coefficient of the critic loss
coef_entropy_loss	coefficient of the entropy loss
writer	summary writer for recording training information
buffer	replay buffer for storing historical trajectories
use_cuda	whether to use the GPU (CUDA)
clip_grad	whether to clip gradients
max_grad_norm	maximum gradient norm allowed when clipping
norm_advantage	whether to normalize advantages
open_tb	whether to enable the summary writer (TensorBoard)
open_tqdm	whether to enable the progress bar (tqdm)
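
To make the role of these parameters more concrete, the sketch below groups a subset of them in a dataclass and shows one common way the two loss coefficients can enter an actor-critic objective. Everything here is an assumption for illustration: the class name `AgentConfig`, the default values, the treatment of the flags as booleans, and the `total_loss` helper are hypothetical, and the repository may combine the terms differently.

```python
from dataclasses import dataclass


@dataclass
class AgentConfig:
    """Hypothetical container for the parameters listed above.

    Default values are illustrative, not the repository's defaults.
    """
    gamma: float = 0.99
    lr_actor: float = 3e-4
    lr_critic: float = 1e-3
    coef_critic_loss: float = 0.5
    coef_entropy_loss: float = 0.01
    use_cuda: bool = False
    clip_grad: bool = True
    max_grad_norm: float = 0.5
    norm_advantage: bool = True
    open_tb: bool = False
    open_tqdm: bool = False


def total_loss(actor_loss, critic_loss, entropy, cfg):
    """Combine the loss terms (assumed convention).

    The entropy term is subtracted so that minimizing the total loss
    rewards higher policy entropy, which encourages exploration.
    """
    return (actor_loss
            + cfg.coef_critic_loss * critic_loss
            - cfg.coef_entropy_loss * entropy)


cfg = AgentConfig(coef_entropy_loss=0.0)
loss = total_loss(actor_loss=1.0, critic_loss=2.0, entropy=3.0, cfg=AgentConfig())
```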

Methods

Method	Description
preprocess_obs()	preprocess an observation before it is fed into the neural network
select_action()	use the actor network to select an action based on the policy distribution
estimate_obs()	use the critic network to estimate the value of an observation
update()	update the parameters by computing losses and gradients
train()	set the neural network to training mode
eval()	set the neural network to evaluation mode
save()	save the model parameters
load()	load the model parameters
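
The method table can be read as an interface that each algorithm file fills in. The following is a rough sketch of that idea using Python's `abc` module. The class names `AbstractAgent` and `RandomAgent`, the constructor arguments, and the return types are assumptions for this example and do not reproduce the repository's classes; `estimate_obs()`, `save()`, and `load()` are left out for brevity.

```python
import random
from abc import ABC, abstractmethod


class AbstractAgent(ABC):
    """Hypothetical interface mirroring part of the method table above."""

    def __init__(self):
        self.training = True

    def preprocess_obs(self, obs):
        # Convert the raw observation to a list of floats.
        return [float(x) for x in obs]

    @abstractmethod
    def select_action(self, obs):
        """Return an action for the given observation."""

    @abstractmethod
    def update(self, batch):
        """Update parameters from a batch and return logged metrics."""

    def train(self):
        self.training = True

    def eval(self):
        self.training = False


class RandomAgent(AbstractAgent):
    """Toy subclass: uniform random policy, nothing to learn."""

    def __init__(self, n_actions, seed=0):
        super().__init__()
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def select_action(self, obs):
        self.preprocess_obs(obs)  # result unused by a random policy
        return self.rng.randrange(self.n_actions)

    def update(self, batch):
        return {"loss": 0.0}


agent = RandomAgent(n_actions=3)
action = agent.select_action([0.1, 2])
agent.eval()
```

Keeping the shared behavior in a base class while each algorithm overrides `select_action()` and `update()` fits the one-file-per-algorithm layout described in the overview.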

Update & To-do & Limitations

Update History

2021-12-09 ADD TRICK: norm_critic_loss in PPO
2021-12-09 ADD PARAM: coef_critic_loss, coef_entropy_loss, log_step
2021-12-07 ADD ALGO: A3C
2021-12-05 ADD ALGO: PPO
2021-11-28 ADD ALGO: A2C
2021-11-20 ADD ALGO: Q-Learning, REINFORCE


Owner

Gemini Light
