Addressing Function Approximation Error in Actor-Critic Methods

PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3). If you use our code or data please cite the paper.
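Below is a minimal, framework-free sketch of the critic target that sets TD3 apart from DDPG. It combines clipped double Q-learning, which takes the minimum over two target critics, with target policy smoothing, which adds clipped noise to the target action. It uses plain Python callables instead of PyTorch modules. The function names, the callable interfaces, and the default values (discount 0.99, noise 0.2, noise clip 0.5, both scaled by `max_action`) are illustrative assumptions and are not copied from the repository code.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def clip(x: float, low: float, high: float) -> float:
    """Clamp a scalar to [low, high]."""
    return max(low, min(high, x))


def td3_target(
    reward: float,
    done: bool,
    next_state: Vector,
    actor_target: Callable[[Vector], Vector],             # hypothetical target actor
    critic1_target: Callable[[Vector, Vector], float],    # hypothetical target critic 1
    critic2_target: Callable[[Vector, Vector], float],    # hypothetical target critic 2
    max_action: float = 1.0,
    discount: float = 0.99,      # assumed default
    policy_noise: float = 0.2,   # assumed default, scaled by max_action
    noise_clip: float = 0.5,     # assumed default, scaled by max_action
    rng: Optional[random.Random] = None,
) -> float:
    """Bootstrapped critic target y for one transition (s, a, r, s', done)."""
    rng = rng if rng is not None else random.Random()
    # Target policy smoothing: perturb the target actor's action with clipped
    # Gaussian noise, then keep the result inside the valid action range.
    next_action = []
    for a in actor_target(next_state):
        noise = clip(rng.gauss(0.0, policy_noise * max_action),
                     -noise_clip * max_action, noise_clip * max_action)
        next_action.append(clip(a + noise, -max_action, max_action))
    # Clipped double Q-learning: use the smaller of the two target critics'
    # estimates to counter overestimation bias.
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))
    # Terminal transitions do not bootstrap from the next state.
    return reward + (0.0 if done else discount * q_next)
```

As an example, `td3_target(1.0, False, [0.0], lambda s: [0.3], lambda s, a: 5.0, lambda s, a: 3.0)` bootstraps from the smaller estimate (3.0), which gives `1.0 + 0.99 * 3.0`. In a real implementation the same computation runs on batched tensors with no gradient flowing through the target.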

The method is tested on MuJoCo continuous control tasks in OpenAI gym. Networks are trained using PyTorch 1.2 and Python 3.7.

Usage

The paper results can be reproduced by running:

./run_experiments.sh

Experiments on single environments can be run by calling:

python main.py --env HalfCheetah-v2

Hyper-parameters can be modified by passing different arguments to main.py. We include an implementation of DDPG (DDPG.py), which is not used in the paper, for easy comparison of hyper-parameters with TD3. This is not the implementation of "Our DDPG" as used in the paper (see OurDDPG.py).
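Two TD3 hyper-parameters are easiest to understand in code: the soft target-update rate `tau` and the actor update delay. The sketch below is a hypothetical illustration that operates on flat lists of weights. `TD3Config`, its field names, and its defaults are assumptions rather than the actual arguments of main.py, so check that file for the real flags and values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TD3Config:
    """Hypothetical hyper-parameter bundle; field names and defaults are assumptions."""
    discount: float = 0.99
    tau: float = 0.005        # target-network mixing rate
    policy_noise: float = 0.2
    noise_clip: float = 0.5
    policy_freq: int = 2      # critic updates per actor update


def soft_update(target: List[float], source: List[float], tau: float) -> None:
    """Polyak averaging, in place: target <- tau * source + (1 - tau) * target."""
    for i, (t, s) in enumerate(zip(target, source)):
        target[i] = tau * s + (1.0 - tau) * t


def delayed_update_steps(num_iterations: int, cfg: TD3Config) -> List[int]:
    """Iterations (counted from 1) at which the actor and target networks update.

    The critics are updated on every iteration; the actor and the target
    networks only on every `policy_freq`-th one (delayed policy updates).
    """
    return [it for it in range(1, num_iterations + 1) if it % cfg.policy_freq == 0]
```

With `policy_freq=2`, `delayed_update_steps(6, TD3Config())` returns `[2, 4, 6]`. Passing `TD3Config(policy_freq=1)` updates the actor on every iteration, as plain DDPG does.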

The algorithms TD3 is compared against (PPO, TRPO, ACKTR, DDPG) can be found in the OpenAI baselines repository.

Results

The code is no longer exactly representative of the code used in the paper: minor adjustments have been made to hyper-parameters, etc., to improve performance. The learning curves are still the original results found in the paper.

The learning curves from the paper are found under /learning_curves. Each learning curve is formatted as a NumPy array of 201 evaluations (shape (201,)), where each evaluation is the average total reward from running the policy for 10 episodes with no exploration. The first evaluation is of the randomly initialized policy network (unused in the paper). Evaluations are performed every 5000 time steps, over a total of 1 million time steps.
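The array layout described above can be checked with a short sketch that pairs each evaluation with the time step at which it was taken (index i corresponds to i × 5000 steps, so index 200 is step 1,000,000). The sketch also mirrors the 10-episode evaluation average. It is plain Python and rests on assumptions: the curves are presumably loadable with `numpy.load` from the files under /learning_curves (whose names are not listed here), and `run_episode` is a hypothetical callback standing in for an environment rollout.

```python
from typing import Callable, List, Sequence, Tuple

EVAL_FREQ = 5_000   # time steps between evaluations
NUM_EVALS = 201     # initial random policy + 200 evaluations up to 1M steps


def curve_with_timesteps(
    curve: Sequence[float], eval_freq: int = EVAL_FREQ, skip_initial: bool = False
) -> List[Tuple[int, float]]:
    """Pair each evaluation in a (201,) learning curve with its time step."""
    if len(curve) != NUM_EVALS:
        raise ValueError(f"expected {NUM_EVALS} evaluations, got {len(curve)}")
    pairs = [(i * eval_freq, float(r)) for i, r in enumerate(curve)]
    # Index 0 is the randomly initialized policy, which the paper does not use.
    return pairs[1:] if skip_initial else pairs


def evaluate(run_episode: Callable[[], float], episodes: int = 10) -> float:
    """Average total reward over `episodes` runs without exploration noise.

    `run_episode` is a hypothetical callback that rolls out the deterministic
    policy for one episode and returns its undiscounted total reward.
    """
    return sum(run_episode() for _ in range(episodes)) / episodes
```

Because the function only relies on `len`, iteration, and `float`, it accepts a NumPy array as well as a Python list.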

Numerical results can be found in the paper or derived from the learning curves. A video of the learned agent can be found here.

Bibtex

@inproceedings{fujimoto2018addressing,
  title={Addressing Function Approximation Error in Actor-Critic Methods},
  author={Fujimoto, Scott and Hoof, Herke and Meger, David},
  booktitle={International Conference on Machine Learning},
  pages={1582--1591},
  year={2018}
}
