Deep reinforcement learning library built on top of Neural Network Libraries

Last update: Dec 14, 2022

Overview

NNablaRL is a deep reinforcement learning library built on top of Neural Network Libraries that is intended to be used for research, development and production.

Installation

Installing NNablaRL is easy!

$ pip install nnabla-rl

NNablaRL only supports Python version >= 3.6 and NNabla version >= 1.17.

Enabling GPU acceleration (Optional)

NNablaRL algorithms run on CPU by default. To run the algorithm on GPU, first install nnabla-ext-cuda as follows. (Replace [cuda-version] depending on the CUDA version installed on your machine.)

$ pip install nnabla-ext-cuda[cuda-version]

# Example installation. Supposing CUDA 11.0 is installed on your machine.
$ pip install nnabla-ext-cuda110

After installing nnabla-ext-cuda, specify the GPU to run the algorithm on through the algorithm's configuration.

import nnabla_rl.algorithms as A

config = A.DQNConfig(gpu_id=0) # Use gpu 0. If negative, will run on CPU.
dqn = A.DQN(env, config=config)
...
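
If the same script has to run on machines with and without the CUDA extension, one option is to choose the gpu id at runtime. The following is only a sketch and not part of NNablaRL: pick_gpu_id is a hypothetical helper, and it assumes the extension is importable as nnabla_ext.cuda. It only checks that the module can be located, not that a working GPU is present.

```python
import importlib.util


def pick_gpu_id(preferred: int = 0, module: str = "nnabla_ext.cuda") -> int:
    """Return `preferred` if `module` can be located, otherwise -1 (CPU).

    Hypothetical helper: `module` defaults to the assumed import name of the
    nnabla CUDA extension.
    """
    try:
        spec = importlib.util.find_spec(module)
    except ModuleNotFoundError:
        # Raised when the parent package (e.g. nnabla_ext) is not installed.
        spec = None
    return preferred if spec is not None else -1


gpu_id = pick_gpu_id(0)
print(f"gpu_id={gpu_id}")  # pass this value to e.g. A.DQNConfig(gpu_id=gpu_id)
```

A negative value falls back to the CPU, which matches the behavior described in the configuration comment above.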

Features

Friendly API

NNablaRL has friendly Python APIs that let you start training with only 3 lines of Python code.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4") # 1
dqn = A.DQN(env)  # 2
dqn.train(env)  # 3

For more details about NNablaRL, see the documentation and examples.

Many built-in algorithms

Most well-known and state-of-the-art (SOTA) deep reinforcement learning algorithms, such as DQN, SAC, BCQ and GAIL, are implemented in NNablaRL. The implemented algorithms are carefully tested and evaluated, so you can easily start training your agent using these verified implementations.

For the list of implemented algorithms see here.

You can also find the reproduction and evaluation results of each algorithm here.
Note that you may not get exactly the same results when running the reproduction code on your computer. The results may vary slightly depending on your machine, the installed nnabla/nnabla-rl package versions, etc.

Seamless switching of online and offline training

In reinforcement learning, there are two main procedures for training an agent: online and offline. Online training alternates between data collection and network updates. Offline training, by contrast, updates the network using only existing data. With NNablaRL, you can switch between these two procedures seamlessly. For example, as shown below, you can train a robot's controller online in a simulated environment and then fine-tune it offline with a dataset collected from the real robot.

import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator() # This is just an example. Assuming that simulator exists
dqn = A.DQN(simulator)
# train online for 1M iterations
dqn.train_online(simulator, total_iterations=1000000)

real_data = get_real_robot_data() # This is also an example. Assuming that you have real robot data
# fine tune the agent offline for 10k iterations using real data
dqn.train_offline(real_data, total_iterations=10000)
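
To make the distinction between the two procedures concrete, here is a minimal self-contained sketch that does not use NNablaRL at all. ToyChainEnv, TabularQ, train_online and train_offline are hypothetical names written for this illustration, and a tabular Q-learning agent stands in for a deep RL algorithm. The online loop alternates data collection and updates, while the offline loop updates only from transitions that already exist.

```python
import random


class ToyChainEnv:
    """Hypothetical 5-state chain. Action 1 moves right, action 0 moves left.
    Reaching the right-most state ends the episode with reward 1."""

    n_states = 5
    n_actions = 2

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(self.n_states - 1, max(0, self.state + move))
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done


class TabularQ:
    """Minimal tabular Q-learning agent (stand-in for a deep RL algorithm)."""

    def __init__(self, n_states, n_actions, lr=0.5, gamma=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.lr = lr
        self.gamma = gamma

    def update(self, s, a, r, s_next, done):
        target = r if done else r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.lr * (target - self.q[s][a])

    def greedy(self, s):
        return max(range(len(self.q[s])), key=lambda a: self.q[s][a])


def train_online(agent, env, total_iterations, rng):
    """Online: collect one transition, update, repeat. Returns the collected data."""
    data = []
    s = env.reset()
    for _ in range(total_iterations):
        a = rng.randrange(env.n_actions)  # purely random exploration
        s_next, r, done = env.step(a)
        data.append((s, a, r, s_next, done))
        agent.update(s, a, r, s_next, done)
        s = env.reset() if done else s_next
    return data


def train_offline(agent, dataset, total_iterations, rng):
    """Offline: update from existing transitions only; the environment is never used."""
    for _ in range(total_iterations):
        agent.update(*rng.choice(dataset))


rng = random.Random(0)
env = ToyChainEnv()
online_agent = TabularQ(env.n_states, env.n_actions)
dataset = train_online(online_agent, env, total_iterations=3000, rng=rng)

offline_agent = TabularQ(env.n_states, env.n_actions)
train_offline(offline_agent, dataset, total_iterations=5000, rng=rng)
print([online_agent.greedy(s) for s in range(4)])
print([offline_agent.greedy(s) for s in range(4)])
```

In this sketch the offline agent learns from the transitions logged during the online run; in the robot example above, the dataset would instead come from the real robot.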

Getting started

Try the interactive demos below to get started.
You can run them directly on Colab from the links in the table below.

Title	Notebook	Target RL task
Simple reinforcement learning training to get started		Pendulum
Learn how to use training algorithms		Pendulum
Learn how to use customized network model for training		Mountain car
Learn how to use different network solver for training		Pendulum
Learn how to use different replay buffer for training		Pendulum
Learn how to use your own environment for training		Customized environment
Atari game training example		Atari games

Documentation

Full documentation is here.

Contribution guide

Any kind of contribution to NNablaRL is welcome! See the contribution guide for details.

License

NNablaRL is provided under the Apache License, Version 2.0.

Comments

Update cem function interface

Updated the interface of the cross-entropy method (CEM) functions. The argument pop_size has been renamed to sample_size. In addition, the objective function given to the CEM function is now called with a variable x of shape (batch_size, sample_size, x_dim), which differs from the previous interface. For details, please see the function docs.
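
The shape convention can be illustrated with a small NumPy sketch. This is not NNablaRL's CEM function: cem_maximize, its parameters other than sample_size, and the quadratic objective are all assumptions made for illustration. It only shows an objective that receives x with shape (batch_size, sample_size, x_dim) and returns one score per sample.

```python
import numpy as np


def objective(x):
    """Score candidates. x: (batch_size, sample_size, x_dim) -> (batch_size, sample_size)."""
    return -np.sum((x - 1.0) ** 2, axis=-1)


def cem_maximize(objective_function, init_mean, init_var,
                 sample_size=64, num_elites=8, num_iterations=20, seed=0):
    """Toy cross-entropy method over a batch of independent problems."""
    rng = np.random.default_rng(seed)
    mean, var = init_mean.copy(), init_var.copy()  # each (batch_size, x_dim)
    batch_size, x_dim = mean.shape
    for _ in range(num_iterations):
        noise = rng.standard_normal((batch_size, sample_size, x_dim))
        x = mean[:, None, :] + np.sqrt(var)[:, None, :] * noise
        scores = objective_function(x)  # (batch_size, sample_size)
        # Keep the highest-scoring samples of each batch entry as elites.
        elite_idx = np.argsort(scores, axis=1)[:, -num_elites:]
        elites = np.take_along_axis(x, elite_idx[:, :, None], axis=1)
        mean = elites.mean(axis=1)
        var = elites.var(axis=1) + 1e-6  # small floor keeps sampling well defined
    return mean


best = cem_maximize(objective, np.zeros((2, 3)), np.full((2, 3), 4.0))
print(best.shape)
```

Each batch entry is optimized independently, so the returned mean has shape (batch_size, x_dim).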

opened by sbsekiguchi 1

Add implementation for RNN support and DRQN algorithm

Adds RNN model support and the DRQN algorithm.

The following trainers will support RNN models:

- Q value-based trainers
- Deterministic gradient and Soft policy trainers

Other trainers may support RNN models in the future, but this is not implemented in the initial release.

See this paper for the details of the DRQN algorithm.

opened by ishihara-y 1

Implement SACD

This PR implements the SAC-D algorithm. https://arxiv.org/abs/2206.13901

These changes have been made:

- New environments with factored reward functions have been added:
  - FactoredLunarLanderContinuousV2NNablaRL-v1
  - FactoredAntV4NNablaRL-v1
  - FactoredHopperV4NNablaRL-v1
  - FactoredHalfCheetahV4NNablaRL-v1
  - FactoredWalker2dV4NNablaRL-v1
  - FactoredHumanoidV4NNablaRL-v1
- The SACD algorithm has been added
- SoftQDTrainer has been added
- _InfluenceMetricsEvaluator has been added
- A reproduction script has been added (not benchmarked yet)

visualizing influence metrics

import gym

import numpy as np
import matplotlib.pyplot as plt

import nnabla_rl.algorithms as A
import nnabla_rl.hooks as H
import nnabla_rl.writers as W
from nnabla_rl.utils.evaluator import EpisodicEvaluator

env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
eval_env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")

evaluation_hook = H.EvaluationHook(
    eval_env,
    EpisodicEvaluator(run_per_evaluation=10),
    timing=5000,
    writer=W.FileWriter(outdir="logdir", file_prefix='evaluation_result'),
)
iteration_num_hook = H.IterationNumHook(timing=100)

config = A.SACDConfig(gpu_id=0, reward_dimension=9)
sacd = A.SACD(env, config=config)
sacd.set_hooks([iteration_num_hook, evaluation_hook])
sacd.train_online(env, total_iterations=100000)

influence_history = []

state = env.reset()
while True:
    action = sacd.compute_eval_action(state)
    influence = sacd.compute_influence_metrics(state, action)
    influence_history.append(influence)
    state, _, done, _ = env.step(action)
    if done:
        break

influence_history = np.array(influence_history)
labels = ["position", "velocity", "angle", "left_leg", "right_leg",
          "main_engine", "side_engine", "failure", "success"]
for i, label in enumerate(labels):
    plt.plot(influence_history[:, i], label=label)
plt.xlabel("step")
plt.ylabel("influence metrics")
plt.legend()
plt.show()

opened by ishihara-y 0

Add gmm and Update gaussian

Added GMM and Gaussian to the numpy models. In addition, updated the Gaussian distribution's API.

The API changes as follows. Previously:

batch_size = 10
output_dim = 10
input_shape = (batch_size, output_dim)
mean = np.zeros(shape=input_shape)
sigma = np.ones(shape=input_shape) * 5.
ln_var = np.log(sigma) * 2.
distribution = D.Gaussian(mean, ln_var)
# return nn.Variable
assert isinstance(distribution.sample(), nn.Variable)

Updated:

batch_size = 10
output_dim = 10
input_shape = (batch_size, output_dim)
mean = np.zeros(shape=input_shape)
sigma = np.ones(shape=input_shape) * 5.
ln_var = np.log(sigma) * 2.
# Pass nn.Variable arguments if you want all class methods to return nn.Variable.
distribution = D.Gaussian(nn.Variable.from_numpy_array(mean), nn.Variable.from_numpy_array(ln_var))
assert isinstance(distribution.sample(), nn.Variable)

# If you pass np.ndarray, all class methods return np.ndarray.
# Currently, only inputs without a batch dimension are supported (i.e. mean.shape = (dims,), ln_var.shape = (dims, dims)).
distribution = D.Gaussian(mean[0], np.diag(ln_var[0]))  # without batch
assert isinstance(distribution.sample(), np.ndarray)
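
The examples above parameterize the Gaussian by the log of the variance, ln_var = 2 * log(sigma). The following plain NumPy sketch shows how a standard deviation is recovered from ln_var and used for sampling. It is an illustration of the parameterization only, not the implementation of D.Gaussian, and sample_diagonal_gaussian is a hypothetical helper that assumes a diagonal covariance.

```python
import numpy as np


def sample_diagonal_gaussian(mean, ln_var, rng):
    """Draw one sample per element from N(mean, exp(ln_var)) with independent dimensions."""
    sigma = np.exp(0.5 * ln_var)  # ln_var = 2 * log(sigma)  =>  sigma = exp(ln_var / 2)
    return mean + sigma * rng.standard_normal(mean.shape)


mean = np.zeros((10, 10))
sigma = np.ones((10, 10)) * 5.0
ln_var = np.log(sigma) * 2.0

rng = np.random.default_rng(0)
sample = sample_diagonal_gaussian(mean, ln_var, rng)
recovered_sigma = np.exp(0.5 * ln_var)
print(sample.shape, recovered_sigma[0, 0])
```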

opened by sbsekiguchi 0

Support nnabla-browser

[x] add MonitorWriter
[x] save computational graph as nntxt

example

import gym

import nnabla_rl.algorithms as A
import nnabla_rl.hooks as H
import nnabla_rl.writers as W
from nnabla_rl.utils.evaluator import EpisodicEvaluator

# save training computational graph
training_graph_hook = H.TrainingGraphHook(outdir="test")

# evaluation hook with nnabla's Monitor
eval_env = gym.make("Pendulum-v0")
evaluator = EpisodicEvaluator(run_per_evaluation=10)
evaluation_hook = H.EvaluationHook(
    eval_env,
    evaluator,
    timing=10,
    writer=W.MonitorWriter(outdir="test", file_prefix='evaluation_result'),
)

env = gym.make("Pendulum-v0")
sac = A.SAC(env)
sac.set_hooks([training_graph_hook, evaluation_hook])

sac.train_online(env, total_iterations=100)

opened by ishihara-y 0

Add iLQR and LQR

Implementation of Linear Quadratic Regulator (LQR) and iterative LQR algorithms.

Co-authored-by: Yu Ishihara
Co-authored-by: Shunichi Sekiguchi

opened by ishihara-y 0

Check np_random instance and use correct randint alternative

It is unclear when this change was made, but in some environments gym.unwrapped.np_random returns a Generator instead of a RandomState.

# In case of RandomState, this line works:
gym.unwrapped.np_random.randint(...)
# In case of Generator, randint does not exist and we must use integers as an alternative:
gym.unwrapped.np_random.integers(...)

This PR fixes the issue by choosing the correct function for sampling integers.
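
The dispatch described above could look roughly like the following sketch. sample_int is a hypothetical helper written for illustration and is not the code merged in this PR. It relies only on NumPy's two random APIs: Generator.integers and RandomState.randint, both of which sample from the half-open range [low, high) by default.

```python
import numpy as np


def sample_int(np_random, low, high):
    """Sample an integer in [low, high) from either NumPy random API."""
    if isinstance(np_random, np.random.Generator):
        return int(np_random.integers(low, high))
    # Legacy RandomState: randint uses the same half-open range.
    return int(np_random.randint(low, high))


print(sample_int(np.random.RandomState(0), 0, 10))
print(sample_int(np.random.default_rng(0), 0, 10))
```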
opened by ishihara-y 0

Add icra2018 qtopt

Add QtOpt algorithm proposed by Deirdre Quillen et al. in the paper Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods.

opened by sbsekiguchi 0

Releases(v0.12.0)

v0.12.0(Oct 7, 2022)
special notes

This version does NOT support openai gym v0.26.0 or greater.

Support for openai gym v0.26.0 and greater is planned for the next release of nnablaRL. From that release on, nnablaRL will no longer officially support openai gym versions earlier than v0.26.0.
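
A script that depends on this release can guard against an incompatible gym installation before training starts. The check below is a sketch under stated assumptions: supports_gym_version is a hypothetical helper, not part of nnablaRL, and it only compares the major and minor components of a version string such as the one exposed by gym.__version__.

```python
def supports_gym_version(version: str) -> bool:
    """Return True for gym versions earlier than 0.26.0 (hypothetical helper)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (0, 26)


for candidate in ("0.25.2", "0.26.0"):
    print(candidate, supports_gym_version(candidate))
```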

Only Python 3.7 or greater is supported

Python 3.6 is no longer supported as of this release

release-note-bugfix

Fix algos. Properly apply grad clip and weight decay

Correct variable to use during rnn training

Check np_random instance and use correct randint alternative

Fix pendulum-env render

Fix ScreenRenderEnv to support gym 0.25.0

release-note-algorithm

Run PPO on single process when actor num is 1

Add qrsac algorithm

Add REDQ algorithm

Update to support discrete tuple

Add icra2018 qtopt

Add goal_env module

Add PPO tuple state support

Add iLQR and LQR

Add mppi

Add ddp

release-note-distributions

Add gmm and Update gaussian

release-note-utility

Support nnabla-browser

release-note-docs

Fix module path of sac

Improve README with graph visualization feature with nnabla-browser

release-note-build

Extend github build timelimit to 5 minutes

Install the latest nnablaRL by:

pip install nnabla-rl
v0.11.0(Mar 17, 2022)
release-note-bugfix

Fix readme of reproduction

Fix cem test

Fix README samples and add prerequisites for Atari reproduction codes

Fix tutorial-model

Fix add workaround to avoid gym error

release-note-algorithm

Add ATRPO

Add implementation for RNN support and DRQN algorithm, Support RNN models on DQN and DQN inherited algorithms, Follow DRQN author's implementation and update results

Expand RNN support to dist rl algorithms

Add rnn support to actor critic algorithms

Support n-step q learning in ddpg, td3, her, sac and ICML2018SAC

Stop back propagating to target v function

Add MME-SAC algorithm and Sparse/Delayed mujoco environment and Add Disentangled version of MME-SAC

release-note-functions

Add stop gradient function

Add random shooting

Update cem function interface

release-note-distributions

Add Bernoulli distribution

Enable sampling from multidimensional logits

Add one hot softmax

release-note-utility

Support batched states for evaluation

Add convenient episode result env

Add profile function

release-note-docs

Update version in algorithm catalog

Add readthedocs yaml and Fixed yaml file

Add HER and IQN to algorithm catalog

Install the latest nnablaRL by:

pip install nnabla-rl
v0.10.0(Oct 20, 2021)
release-note-bugfix

Fix interactive-demos used in colab and Fix interactive-demos used in colab about gpu id

release-note-algorithm

Add HER

Add Rainbow

Fix algorithm reproduction directory path

Add rank-based prioritized replay

Add Double Dqn

Move algorithms reproduction dir to reproductions/algorithms

Enable injecting explorer to algorithm

Support multi-step Q learning

Add Categorical Double Dqn

Add c51 all atari game results

Support Tuple State and Update compute_v_target_and_advantage to support tuple state

release-note-parametric_functions

Add spatial_softmax function and Add spatial softmax docs

Add noisy net

release-note-functions

Add batch_flatten function

Add triangular_matrix function

release-note-utility

Fix load_snapshot

release-note-docs

Fix docs typo

Fix typo in readme

Display correct version

Fix numpy array typing to np.ndarray

Add function docs

Fix docstring of algorithms

Update NNablaRL to nnablaRL

Fix typo seemless -> seamless

Fix build badge URL

Install the latest nnablaRL by:

pip install nnabla-rl
v0.9.0(Jun 14, 2021)
We are happy to announce the release of nnablaRL, a deep reinforcement learning (RL) library built on top of nnabla. Reinforcement learning is a cutting-edge machine learning technology that has achieved superhuman performance in fields such as gaming and robotics. We hope that this new library, nnablaRL, helps both RL experts and non-experts use reinforcement learning algorithms easily within the nnabla ecosystem.

nnablaRL has the following features.

Friendly API

nnablaRL has friendly Python APIs that let you start training with only 3 lines of Python code.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4")  # 1
dqn = A.DQN(env)  # 2
dqn.train(env)  # 3

You can also easily customize the algorithm's hyperparameters. For example, you can change the batch size of the training data as follows.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4")
config = A.DQNConfig(batch_size=100)
dqn = A.DQN(env, config=config)
dqn.train(env)

In addition to algorithm hyperparameters, you can also flexibly change training components such as neural network models and model solvers. For details, see the sample code and API documents.

Many built-in algorithms

Most well-known and state-of-the-art (SoTA) deep reinforcement learning algorithms, such as DQN, SAC, BCQ and GAIL, are already implemented in nnablaRL. The implemented algorithms are carefully tested and evaluated, so you can easily start training your agent using these verified implementations. Please check the sample code and documentation for the detailed usage of each algorithm. You can find the list of implemented algorithms here.

Seamless switching of online and offline training

In reinforcement learning, there are two main procedures for training an agent: online and offline. Online training alternates between data collection and network updates. Offline training, by contrast, updates the network using only existing data. With nnablaRL, you can switch between these two procedures seamlessly. For example, as shown below, you can train a robot's controller online in a simulated environment and then fine-tune it offline with a dataset collected from the real robot.

import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator()  # This is just an example. Assuming that simulator exists
dqn = A.DQN(simulator)
dqn.train_online(simulator)

real_data = get_real_data()  # This is also an example. Assuming that you have real robot data
dqn.train_offline(real_data)

Getting started

You can find both notebook-style interactive demos and raw Python scripts as sample code to get started. If you are unfamiliar with reinforcement learning, we recommend trying the notebooks as a starting point. You can launch them and start training immediately through Google Colaboratory! Check the list of notebooks here.

Development of nnablaRL has just started. We will continue adding new reinforcement learning algorithms and SoTA techniques to nnablaRL. Feedback, feature requests and contributions are welcome! Check the contribution guide for details.

Owner

Sony

Sony Group Corporation
