pre-release 0.10.0
# Coach

[License](https://github.com/NervanaSystems/coach/blob/master/LICENSE)
[Docs](https://nervanasystems.github.io/coach/)
[DOI](https://doi.org/10.5281/zenodo.1134898)

## Overview

<p align="center"><img src="img/coach_logo.png" alt="Coach Logo" width="200"/></p>

Coach is a Python reinforcement learning research framework containing implementations of many state-of-the-art algorithms.

* [Usage](#usage)
  + [Running Coach](#running-coach)
  + [Running Coach Dashboard (Visualization)](#running-coach-dashboard-visualization)
  + [Parallelizing an Algorithm](#parallelizing-an-algorithm)
* [Supported Environments](#supported-environments)
* [Supported Algorithms](#supported-algorithms)
* [Citation](#citation)

## Documentation

Framework documentation, algorithm descriptions, and instructions on how to contribute a new agent or environment can be found [here](https://nervanasystems.github.io/coach/).

## Installation

Note: Coach has only been tested on Ubuntu 16.04 LTS with Python 3.5.
For notes on installing on Ubuntu 17.10 with Python 3.6.3, please refer to the following issue: https://github.com/NervanaSystems/coach/issues/54

In order to install Coach, a few prerequisites are required. The following will set up all the basics needed to get going with running Coach on top of [OpenAI Gym](https://github.com/openai/gym) environments:

```bash
# General
sudo -E apt-get install python3-pip cmake zlib1g-dev python3-tk python-opencv -y

# Boost libraries
sudo -E apt-get install libboost-all-dev -y

# Scipy requirements
sudo -E apt-get install libblas-dev liblapack-dev libatlas-base-dev gfortran -y

# PyGame
sudo -E apt-get install libsdl-dev libsdl-image1.2-dev libsdl-mixer1.2-dev libsdl-ttf2.0-dev \
  libsmpeg-dev libportmidi-dev libavformat-dev libswscale-dev -y

# Dashboard
sudo -E apt-get install dpkg-dev build-essential python3.5-dev libjpeg-dev libtiff-dev libsdl1.2-dev libnotify-dev \
  freeglut3 freeglut3-dev libsm-dev libgtk2.0-dev libgtk-3-dev libwebkitgtk-dev libwebkitgtk-3.0-dev \
  libgstreamer-plugins-base1.0-dev -y

# Gym
sudo -E apt-get install libav-tools libsdl2-dev swig cmake -y
```

We recommend installing Coach in a virtualenv:

```bash
sudo -E pip3 install virtualenv
virtualenv -p python3 coach_env
. coach_env/bin/activate
```

To leave the virtual environment at any time, run `deactivate`.

Finally, install Coach using pip:

```bash
pip3 install rl_coach
```

Alternatively, for a development environment, install Coach from the cloned repository:

```bash
cd coach
pip3 install -e .
```

If a GPU is present, Coach's pip package will install tensorflow-gpu by default. If a GPU is not present, an [Intel-Optimized TensorFlow](https://software.intel.com/en-us/articles/intel-optimized-tensorflow-wheel-now-available) will be installed.
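
To verify which TensorFlow build ended up in the virtualenv, a quick check along these lines can help (a minimal sketch, assuming the TensorFlow 1.x API that was current for this release):

```python
# Sanity check of the installed TensorFlow backend (illustrative only).
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
# True if the GPU build was installed and a CUDA device is visible.
print("GPU available:", tf.test.is_gpu_available())
```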
In addition to OpenAI Gym, several other environments were tested and are supported. Please follow the instructions in the Supported Environments section below in order to install more environments.

### TensorFlow GPU Support

In order to run Coach with GPU support, a GPU-enabled TensorFlow build must be installed. If needed, the installed TensorFlow version can be overridden explicitly:

```bash
pip3 install tensorflow-gpu
```

## Usage

### Running Coach

Coach supports both TensorFlow and neon deep learning frameworks. Switching between the TensorFlow and neon backends is possible using the `-f` flag:

* TensorFlow (default): `-f tensorflow`
* neon: `-f neon`

To allow reproducing results in Coach, we defined a mechanism called a _preset_. There are several available presets under the `presets` directory. To list all the available presets, use the `-l` flag.

To run a preset, use:

```bash
python3 coach.py -r -p <preset_name>
```
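
For intuition, a preset simply pins together everything needed to reproduce a run: the agent type, the environment and level, and the training hyperparameters. The sketch below is purely illustrative (the names and structure are hypothetical, not Coach's actual preset API):

```python
# Hypothetical illustration of what a preset bundles -- not Coach's real API.
from dataclasses import dataclass

@dataclass
class Preset:
    agent: str          # e.g. "PolicyGradients"
    environment: str    # e.g. "CartPole-v0"
    num_workers: int    # number of parallel workers (the -n flag)
    # ... plus hyperparameters such as learning rate and discount factor

CartPole_PG = Preset(agent="PolicyGradients", environment="CartPole-v0", num_workers=1)
```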

For example:

* CartPole environment using Policy Gradients (PG):

```bash
python3 coach.py -r -p CartPole_PG
```

* Basic level of Doom using the Dueling network and Double DQN (DDQN) algorithm:

```bash
python3 coach.py -r -p Doom_Basic_Dueling_DDQN
```

Some presets apply to a group of environment levels, such as the entire Atari or MuJoCo suites. To use these presets, the requested level should be selected using the `-lvl` flag.

For example:

* Pong using the Neural Episodic Control (NEC) algorithm:

```bash
python3 coach.py -r -p Atari_NEC -lvl pong
```

Several types of agents can benefit from being run in a distributed fashion with multiple workers in parallel. Each worker interacts with its own copy of the environment, but updates a shared network, which improves data collection speed and the stability of the learning process. To specify the number of workers to run, use the `-n` flag.

For example:

* Breakout using Asynchronous Advantage Actor-Critic (A3C) with 8 workers:

```bash
python3 coach.py -r -p Atari_A3C -lvl breakout -n 8
```

New presets for different levels or environments can easily be created by following the same pattern as the existing presets.

More usage examples can be found [here](https://nervanasystems.github.io/coach/usage/index.html).

### Running Coach Dashboard (Visualization)

Training an agent to solve an environment can be tricky at times.

Coach's dashboard helps visualize the training process. To run it:

```bash
python3 dashboard.py
```

<img src="img/dashboard.png" alt="Coach Design" style="width: 800px;"/>

### Parallelizing an Algorithm

Since the introduction of [A3C](https://arxiv.org/abs/1602.01783) in 2016, many algorithms have been shown to benefit from running multiple instances in parallel, on many CPU cores. So far, these algorithms include [A3C](https://arxiv.org/abs/1602.01783), [DDPG](https://arxiv.org/pdf/1704.03073.pdf), [PPO](https://arxiv.org/pdf/1707.06347.pdf), and [NAF](https://arxiv.org/pdf/1610.00633.pdf), and this is most probably only the beginning.

Parallelizing an algorithm using Coach is straightforward.

The following method of the `NetworkWrapper` parallelizes an algorithm seamlessly:

```python
network.train_and_sync_networks(current_states, targets)
```

Once a parallelized run is started, the `train_and_sync_networks` API will apply gradients from each local worker's network to the main global network, allowing parallel training to take place.

Then, it merely requires running Coach with the `-n` flag set to the number of workers to run. For instance, the following command will set 16 workers to work together to train a MuJoCo Hopper:

```bash
python3 coach.py -p Hopper_A3C -n 16
```
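
Conceptually, each worker trains on its own copy of the network, periodically folds its gradients into the shared global network, and then refreshes its local copy from it. The snippet below is a framework-agnostic sketch of that idea in plain NumPy with illustrative names; it is not Coach's `NetworkWrapper` implementation:

```python
import numpy as np

class SharedNetwork:
    """Toy stand-in for the global network: one weight vector plus plain SGD."""
    def __init__(self, dim, lr=0.01):
        self.weights = np.zeros(dim)
        self.lr = lr

    def apply_gradients(self, grads):
        self.weights -= self.lr * grads

def train_and_sync(global_net, local_weights, states, targets):
    # 1. Compute gradients on the *local* copy (here: MSE gradient of a linear model).
    preds = states @ local_weights
    grads = states.T @ (preds - targets) / len(targets)
    # 2. Apply the local gradients to the shared global network.
    global_net.apply_gradients(grads)
    # 3. Sync: overwrite the local copy with the updated global weights.
    return global_net.weights.copy()

# Example: a single "worker" loop repeatedly updating the shared network.
global_net = SharedNetwork(dim=3)
local = global_net.weights.copy()
for _ in range(100):
    states = np.random.randn(8, 3)
    targets = states @ np.array([1.0, -2.0, 0.5])
    local = train_and_sync(global_net, local, states, targets)
```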

<img src="img/dashboard.gif" alt="Coach Design" style="width: 800px;"/>

## Supported Environments

* *OpenAI Gym:*

Installed automatically as part of Coach's installation. The Gym version used by Coach is 0.10.5.

* *ViZDoom:*

Follow the instructions described in the ViZDoom repository -

https://github.com/mwydmuch/ViZDoom

The ViZDoom version currently used by Coach is 1.1.4.

Additionally, Coach assumes that the environment variable VIZDOOM_ROOT points to the ViZDoom installation directory.

* *Roboschool:*
* *CARLA:*

Download release 0.8.4 from the CARLA repository -

https://github.com/carla-simulator/carla/releases

A simple CARLA settings file (`CarlaSettings.ini`) is supplied with Coach, and is located in the `environments` directory.

* *Starcraft:*

Follow the instructions described in the PySC2 repository -

https://github.com/deepmind/pysc2

The PySC2 version used by Coach is 2.0.1.

* *DeepMind Control Suite:*

Follow the instructions described in the DeepMind Control Suite repository -

https://github.com/deepmind/dm_control

The dm_control version used by Coach is 0.0.0.

## Supported Algorithms
### Value Optimization Agents
* [Deep Q Network (DQN)](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf) ([code](rl_coach/agents/dqn_agent.py))
* [Double Deep Q Network (DDQN)](https://arxiv.org/pdf/1509.06461.pdf) ([code](rl_coach/agents/ddqn_agent.py))
* [Dueling Q Network](https://arxiv.org/abs/1511.06581)
* [Mixed Monte Carlo (MMC)](https://arxiv.org/abs/1703.01310) ([code](rl_coach/agents/mmc_agent.py))
* [Persistent Advantage Learning (PAL)](https://arxiv.org/abs/1512.04860) ([code](rl_coach/agents/pal_agent.py))
* [Categorical Deep Q Network (C51)](https://arxiv.org/abs/1707.06887) ([code](rl_coach/agents/categorical_dqn_agent.py))
* [Quantile Regression Deep Q Network (QR-DQN)](https://arxiv.org/pdf/1710.10044v1.pdf) ([code](rl_coach/agents/qr_dqn_agent.py))
* [N-Step Q Learning](https://arxiv.org/abs/1602.01783) | **Distributed** ([code](rl_coach/agents/n_step_q_agent.py))
* [Neural Episodic Control (NEC)](https://arxiv.org/abs/1703.01988) ([code](rl_coach/agents/nec_agent.py))
* [Normalized Advantage Functions (NAF)](https://arxiv.org/abs/1603.00748) | **Distributed** ([code](rl_coach/agents/naf_agent.py))

### Policy Optimization Agents
* [Policy Gradients (PG)](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf) | **Distributed** ([code](rl_coach/agents/policy_gradients_agent.py))
* [Asynchronous Advantage Actor-Critic (A3C)](https://arxiv.org/abs/1602.01783) | **Distributed** ([code](rl_coach/agents/actor_critic_agent.py))
* [Deep Deterministic Policy Gradients (DDPG)](https://arxiv.org/abs/1509.02971) | **Distributed** ([code](rl_coach/agents/ddpg_agent.py))
* [Proximal Policy Optimization (PPO)](https://arxiv.org/pdf/1707.06347.pdf) ([code](rl_coach/agents/ppo_agent.py))
* [Clipped Proximal Policy Optimization (CPPO)](https://arxiv.org/pdf/1707.06347.pdf) | **Distributed** ([code](rl_coach/agents/clipped_ppo_agent.py))
* [Generalized Advantage Estimation (GAE)](https://arxiv.org/abs/1506.02438) ([code](rl_coach/agents/actor_critic_agent.py#L86))

### General Agents
* [Direct Future Prediction (DFP)](https://arxiv.org/abs/1611.01779) | **Distributed** ([code](rl_coach/agents/dfp_agent.py))

### Imitation Learning Agents
* Behavioral Cloning (BC) ([code](rl_coach/agents/bc_agent.py))

### Hierarchical Reinforcement Learning Agents
* [Hierarchical Actor Critic (HAC)](https://arxiv.org/abs/1712.00948) ([code](rl_coach/agents/ddpg_hac_agent.py))

### Memory Types
* [Hindsight Experience Replay (HER)](https://arxiv.org/abs/1707.01495) ([code](rl_coach/memories/episodic/episodic_hindsight_experience_replay.py))
* [Prioritized Experience Replay (PER)](https://arxiv.org/abs/1511.05952) ([code](rl_coach/memories/non_episodic/prioritized_experience_replay.py))
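
As a rough illustration of how prioritized replay biases sampling, the sketch below draws transitions with probability proportional to priority^alpha and corrects the bias with importance-sampling weights. It is a framework-agnostic NumPy toy, not Coach's implementation:

```python
import numpy as np

class PrioritizedReplaySketch:
    """Toy proportional prioritized replay buffer (illustrative only)."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.transitions, self.priorities = [], []

    def add(self, transition, priority=1.0):
        self.transitions.append(transition)
        self.priorities.append(priority)
        if len(self.transitions) > self.capacity:  # drop the oldest entry
            self.transitions.pop(0)
            self.priorities.pop(0)

    def sample(self, batch_size, beta=0.4):
        p = np.array(self.priorities) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.transitions), size=batch_size, p=p)
        # Importance-sampling weights correct for the non-uniform sampling.
        weights = (len(self.transitions) * p[idx]) ** (-beta)
        weights /= weights.max()
        return [self.transitions[i] for i in idx], idx, weights
```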

### Exploration Techniques
* E-Greedy ([code](rl_coach/exploration_policies/e_greedy.py))
* Boltzmann ([code](rl_coach/exploration_policies/boltzmann.py))
* Ornstein–Uhlenbeck process ([code](rl_coach/exploration_policies/ou_process.py))
* Normal Noise ([code](rl_coach/exploration_policies/additive_noise.py))
* Truncated Normal Noise ([code](rl_coach/exploration_policies/truncated_normal.py))
* [Bootstrapped Deep Q Network](https://arxiv.org/abs/1602.04621) ([code](rl_coach/agents/bootstrapped_dqn_agent.py))
* [UCB Exploration via Q-Ensembles (UCB)](https://arxiv.org/abs/1706.01502) ([code](rl_coach/exploration_policies/ucb.py))
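
For reference, e-greedy (the first entry above) is the simplest of these policies: with probability epsilon take a random action, otherwise act greedily with respect to the current Q-values. A minimal, framework-agnostic sketch:

```python
import numpy as np

def e_greedy_action(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))              # exploit

# Typical usage: epsilon is decayed from 1.0 towards a small value over training.
rng = np.random.default_rng(0)
action = e_greedy_action(np.array([0.1, 0.5, -0.2]), epsilon=0.1, rng=rng)
```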

## Citation