Tackling Obstacle Tower Challenge using PPO & A2C combined with ICM.

Last update: Feb 10, 2022

Overview

Obstacle Tower Challenge using Deep Reinforcement Learning

Unity Obstacle Tower is a challenging, realistic 3D environment with a third-person perspective and procedurally generated levels. We use it as a benchmark to test the performance of our deep RL models.

Proximal Policy Optimization (PPO) and Advantage Actor Critic (A2C) are implemented, each combined with the Intrinsic Curiosity Module (ICM). We train our agent with both PPO and A2C and record several metrics, such as loss and reward, to compare the two methods.
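As a rough illustration of how these three pieces differ, the sketch below writes out the per-sample A2C policy loss, the PPO clipped surrogate loss, and an ICM-style curiosity bonus in plain Python. It is not taken from this repository: the function names, the clip range of 0.2, and the scaling factor eta are assumptions chosen for the example.

```python
import math


def a2c_policy_loss(log_prob, advantage):
    """A2C policy loss for one sample: -log pi(a|s) * advantage."""
    return -log_prob * advantage


def ppo_clipped_loss(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """PPO clipped surrogate loss for one sample (the negated objective)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # The minimum is a pessimistic estimate, so large policy steps are not rewarded.
    return -min(ratio * advantage, clipped_ratio * advantage)


def icm_intrinsic_reward(predicted_features, actual_features, eta=0.01):
    """Curiosity bonus: eta/2 times the squared forward-model prediction error."""
    squared_error = sum(
        (p - a) ** 2 for p, a in zip(predicted_features, actual_features)
    )
    return 0.5 * eta * squared_error


def total_reward(extrinsic, intrinsic):
    """Training reward when the curiosity bonus is added to the game reward."""
    return extrinsic + intrinsic
```

With identical old and new log-probabilities the ratio is 1, so the PPO loss reduces to the negated advantage. The clipping only takes effect once the new policy has moved away from the policy that collected the data.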

More details regarding this project can be viewed in our paper.

Dependencies

Requirements: see requirements.txt. Python 3.5+ and pip are also required.

Install dependencies with pip install -r requirements.txt.

Download the environment from the Obstacle Tower GitHub page, unzip it, and place it in the root of the project.

Training an agent

Run

Run python3 -m agent.learn. This starts training with the default parameters and uses A2C as the learning method.
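For readers new to A2C, the quantity at the center of each update is the advantage: the discounted return minus the critic's value estimate. The sketch below shows one common way to compute it over a short rollout. The function names, the gamma default, and the rollout layout are assumptions made for illustration, not the code behind agent.learn.

```python
def discounted_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Compute discounted returns by walking backwards through a rollout."""
    returns = []
    running = bootstrap_value  # critic's estimate for the state after the rollout
    for reward, done in zip(reversed(rewards), reversed(dones)):
        # A finished episode contributes nothing beyond its final reward.
        running = reward + gamma * running * (0.0 if done else 1.0)
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage = discounted return minus the critic's value estimate."""
    return [ret - value for ret, value in zip(returns, values)]
```

For example, with rewards [1.0, 0.0, 1.0], an episode ending on the last step, and gamma=0.5, the returns are [1.25, 0.5, 1.0]. The bootstrap value is ignored because the episode has ended.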

Run python3 -m agent.learn -h to check all options and their description.

Visualization

Training can be visualized with TensorboardX.
Once the initial collection of observations finishes and training starts, a log file recording the training data is created in the runs/ directory.

Run tensorboard --logdir runs/. This starts a server on localhost:6006 by default.

For example, you can run tensorboard --logdir runs/a2c to see the visualization results of a training process using the given a2c model.
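If you want to log additional metrics of your own, a minimal sketch along these lines may help. It uses tensorboardX's SummaryWriter and add_scalar. The helper names, the runs/example directory, and the metric tags are hypothetical and not part of this project.

```python
def make_writer(log_dir="runs/example"):
    """Create a TensorboardX writer (requires the tensorboardX package)."""
    # Imported lazily so log_metrics below can be used and tested without it.
    from tensorboardX import SummaryWriter

    return SummaryWriter(log_dir)


def log_metrics(writer, step, metrics):
    """Write each named scalar (e.g. loss, reward) at the given training step."""
    for tag, value in sorted(metrics.items()):
        writer.add_scalar(tag, value, step)
```

With tensorboardX installed, calling log_metrics(writer, step, {"reward": r, "loss": l}) inside a training loop, followed by writer.close() at the end, produces a log that tensorboard --logdir runs/ can display.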

Here is a screenshot of the visualization example:

Running the agent

Run python3 -m runner --model_name followed by the name of a model file. This starts inference mode with the selected model.

Two pretrained sample models are provided in the models/ directory. To check all available options, run python3 -m runner -h.
The agent runs until the in-game time runs out.

For example, you can run python3 -m runner --model_name model_a2c_750.bin to use one of the given sample models.
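Conceptually, inference is a loop that queries the policy and steps the environment until the episode ends. The sketch below assumes a classic Gym-style reset/step interface and uses a hypothetical CountdownEnv as a stand-in for the real environment, so it runs without the game binary. The runner's actual code may differ.

```python
class CountdownEnv:
    """Hypothetical stand-in env: the episode ends when the time budget hits zero."""

    def __init__(self, time_budget=5):
        self.time_budget = time_budget
        self.time_left = time_budget

    def reset(self):
        self.time_left = self.time_budget
        return self.time_left  # the observation is simply the remaining time

    def step(self, action):
        self.time_left -= 1
        reward = 1.0 if action == 1 else 0.0
        done = self.time_left <= 0
        return self.time_left, reward, done, {}


def run_episode(env, policy):
    """Act with `policy` until the environment reports that the episode is over."""
    observation = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps
```

Here policy is any callable that maps an observation to an action. In the real runner that role would be played by the loaded model.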

Here is a screenshot of a single run:

Owner

Zhuoyu Feng
