mirror of https://github.com/gryf/coach.git (synced 2025-12-17 11:10:20 +01:00)
@@ -731,18 +731,19 @@ workflows:
       - functional_tests:
           requires:
             - build_base
-      - functional_test_doom:
-          requires:
-            - build_doom_env
-            - functional_tests
-      - functional_test_mujoco:
-          requires:
-            - build_mujoco_env
-            - functional_test_doom
+      # - functional_test_doom:
+      #     requires:
+      #       - build_doom_env
+      #       - functional_tests
+      # - functional_test_mujoco:
+      #     requires:
+      #       - build_mujoco_env
+      #       - functional_test_doom
       - golden_test_gym:
           requires:
             - build_gym_env
-            - functional_test_mujoco
+            # - functional_test_mujoco
+            - functional_tests
       - golden_test_doom:
           requires:
             - build_doom_env
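This hunk disables the Doom and MuJoCo functional tests by commenting the jobs out rather than deleting them, and reroutes `golden_test_gym` to depend on `functional_tests` instead. A minimal sketch for sanity-checking such an edit before pushing it (PyYAML and the conventional `.circleci/config.yml` path are assumptions; neither appears in this diff):

```python
# Sanity-check the edited CircleCI workflow file: it must still parse,
# and the jobs that were commented out must no longer be in the graph.
import yaml

with open(".circleci/config.yml") as f:
    config = yaml.safe_load(f)

jobs = set()
for workflow in config.get("workflows", {}).values():
    # the workflows section also holds `version: 2`, which is not a dict
    if isinstance(workflow, dict):
        for job in workflow.get("jobs", []):
            # entries are either a bare job name or a {name: {...}} mapping
            jobs.add(job if isinstance(job, str) else next(iter(job)))

assert "functional_test_doom" not in jobs
assert "functional_test_mujoco" not in jobs
print(sorted(jobs))
```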
@@ -54,7 +54,7 @@ Coach is released as two pypi packages:

 Each pypi package release has a GitHub release and tag with the same version number. The numbers are of the X.Y.Z format, where

-X - zero in the near future, may change when Coach is feature complete
+X - currently one, will be incremented on major API changes

 Y - major releases with new features
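The new wording pins down the semantics of X: it is bumped on major API changes rather than held at zero. A small illustration of the scheme, using the `packaging` library (an assumption; any semantic-version parser would do):

```python
# The X.Y.Z scheme described above, checked with packaging.version.
from packaging.version import Version

old, new = Version("0.12.1"), Version("1.0.0")
assert new > old                         # 1.0.0 supersedes the 0.x line
assert new.major == 1                    # X: bumped on major API changes
assert (new.minor, new.micro) == (0, 0)  # Y/Z reset on a major bump
```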
README.md
@@ -29,20 +29,23 @@ coach -p CartPole_DQN -r

 * [Release 0.9.0](https://ai.intel.com/reinforcement-learning-coach-carla-qr-dqn/)
 * [Release 0.10.0](https://ai.intel.com/introducing-reinforcement-learning-coach-0-10-0/)
 * [Release 0.11.0](https://ai.intel.com/rl-coach-data-science-at-scale)
-* Release 0.12.0 (current release)
+* [Release 0.12.0](https://github.com/NervanaSystems/coach/releases/tag/v0.12.0)
+* Release 1.0.0 (current release)

-Contacting the Coach development team is also possible through the email [coach@intel.com](coach@intel.com)
+Contacting the Coach development team is also possible over [email](mailto:coach@intel.com)

 ## Table of Contents

 - [Coach](#coach)
-  * [Overview](#overview)
   * [Benchmarks](#benchmarks)
-  * [Documentation](#documentation)
   * [Installation](#installation)
-  * [Usage](#usage)
-    + [Running Coach](#running-coach)
-    + [Running Coach Dashboard (Visualization)](#running-coach-dashboard-visualization)
+  * [Getting Started](#getting-started)
+    * [Tutorials and Documentation](#tutorials-and-documentation)
+    * [Basic Usage](#basic-usage)
+    * [Running Coach](#running-coach)
+    * [Running Coach Dashboard (Visualization)](#running-coach-dashboard-visualization)
+  * [Distributed Multi-Node Coach](#distributed-multi-node-coach)
+  * [Batch Reinforcement Learning](#batch-reinforcement-learning)
   * [Supported Environments](#supported-environments)
   * [Supported Algorithms](#supported-algorithms)
   * [Citation](#citation)
@@ -52,13 +55,6 @@ Contacting the Coach development team is also possible through the email [coach@

 One of the main challenges when building a research project, or a solution based on a published algorithm, is getting a concrete and reliable baseline that reproduces the algorithm's results, as reported by its authors. To address this problem, we are releasing a set of [benchmarks](benchmarks) that shows Coach reliably reproduces many state of the art algorithm results.

-## Documentation
-
-Framework documentation, algorithm description and instructions on how to contribute a new agent/environment can be found [here](https://nervanasystems.github.io/coach/).
-
-Jupyter notebooks demonstrating how to run Coach from command line or as a library, implement an algorithm, or integrate an environment can be found [here](https://github.com/NervanaSystems/coach/tree/master/tutorials).
-
-
 ## Installation

 Note: Coach has only been tested on Ubuntu 16.04 LTS, and with Python 3.5.
@@ -113,9 +109,16 @@ If a GPU is present, Coach's pip package will install tensorflow-gpu, by default

 In addition to OpenAI Gym, several other environments were tested and are supported. Please follow the instructions in the Supported Environments section below in order to install more environments.

-## Usage
+## Getting Started

-### Running Coach
+### Tutorials and Documentation
+
+[Jupyter notebooks demonstrating how to run Coach from command line or as a library, implement an algorithm, or integrate an environment](https://github.com/NervanaSystems/coach/tree/master/tutorials).
+
+[Framework documentation, algorithm description and instructions on how to contribute a new agent/environment](https://nervanasystems.github.io/coach/).
+
+### Basic Usage
+
+#### Running Coach

 To allow reproducing results in Coach, we defined a mechanism called _preset_.
 There are several available presets under the `presets` directory.
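Since this hunk introduces the preset mechanism, a sketch of what a minimal preset module looks like may help. It follows the presets shipped with Coach, but the exact parameter classes and module paths are assumptions as far as this diff is concerned; the `coach -p CartPole_DQN -r` invocation in the first README hunk is how such a module is selected by name.

```python
# A minimal preset: a module that exposes a `graph_manager` object,
# which the coach CLI (or library code) picks up and runs.
from rl_coach.agents.dqn_agent import DQNAgentParameters
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import SimpleSchedule

graph_manager = BasicRLGraphManager(
    agent_params=DQNAgentParameters(),
    env_params=GymVectorEnvironment(level='CartPole-v0'),
    schedule_params=SimpleSchedule())

if __name__ == '__main__':
    # when used as a library rather than via the CLI, training is started
    # explicitly
    graph_manager.improve()
```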
@@ -167,17 +170,7 @@ It is easy to create new presets for different levels or environments by followi

 More usage examples can be found [here](https://github.com/NervanaSystems/coach/blob/master/tutorials/0.%20Quick%20Start%20Guide.ipynb).

-### Distributed Multi-Node Coach
-
-As of release 0.11.0, Coach supports horizontal scaling for training RL agents on multiple nodes. In release 0.11.0 this was tested on the ClippedPPO and DQN agents.
-For usage instructions please refer to the documentation [here](https://nervanasystems.github.io/coach/dist_usage.html).
-
-### Batch Reinforcement Learning
-
-Training and evaluating an agent from a dataset of experience, where no simulator is available, is supported in Coach.
-There are [example](https://github.com/NervanaSystems/coach/blob/master/rl_coach/presets/CartPole_DDQN_BatchRL.py) [presets](https://github.com/NervanaSystems/coach/blob/master/rl_coach/presets/Acrobot_DDQN_BCQ_BatchRL.py) and a [tutorial](https://github.com/NervanaSystems/coach/blob/master/tutorials/4.%20Batch%20Reinforcement%20Learning.ipynb).
-
-### Running Coach Dashboard (Visualization)
+#### Running Coach Dashboard (Visualization)

 Training an agent to solve an environment can be tricky, at times.

 In order to debug the training process, Coach outputs several signals, per trained algorithm, in order to track algorithmic performance.
@@ -195,6 +188,17 @@ dashboard

 <img src="img/dashboard.gif" alt="Coach Design" style="width: 800px;"/>

+
+### Distributed Multi-Node Coach
+
+As of release 0.11.0, Coach supports horizontal scaling for training RL agents on multiple nodes. In release 0.11.0 this was tested on the ClippedPPO and DQN agents.
+For usage instructions please refer to the documentation [here](https://nervanasystems.github.io/coach/dist_usage.html).
+
+### Batch Reinforcement Learning
+
+Training and evaluating an agent from a dataset of experience, where no simulator is available, is supported in Coach.
+There are [example](https://github.com/NervanaSystems/coach/blob/master/rl_coach/presets/CartPole_DDQN_BatchRL.py) [presets](https://github.com/NervanaSystems/coach/blob/master/rl_coach/presets/Acrobot_DDQN_BCQ_BatchRL.py) and a [tutorial](https://github.com/NervanaSystems/coach/blob/master/tutorials/4.%20Batch%20Reinforcement%20Learning.ipynb).
+
 ## Supported Environments

 * *OpenAI Gym:*
@@ -285,6 +289,7 @@ dashboard
 * [Generalized Advantage Estimation (GAE)](https://arxiv.org/abs/1506.02438) ([code](rl_coach/agents/actor_critic_agent.py#L86))
 * [Sample Efficient Actor-Critic with Experience Replay (ACER)](https://arxiv.org/abs/1611.01224) | **Multi Worker Single Node** ([code](rl_coach/agents/acer_agent.py))
 * [Soft Actor-Critic (SAC)](https://arxiv.org/abs/1801.01290) ([code](rl_coach/agents/soft_actor_critic_agent.py))
+* [Twin Delayed Deep Deterministic Policy Gradient](https://arxiv.org/pdf/1802.09477.pdf) ([code](rl_coach/agents/td3_agent.py))

 ### General Agents
 * [Direct Future Prediction (DFP)](https://arxiv.org/abs/1611.01779) | **Multi Worker Single Node** ([code](rl_coach/agents/dfp_agent.py))
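The only change in this hunk is the new TD3 entry. A hypothetical sketch of wiring it into the preset skeleton shown earlier; `TD3AgentParameters` is an assumed class name, since the diff only shows the file path `rl_coach/agents/td3_agent.py`:

```python
# Hypothetical TD3 preset; the agent parameter class name is assumed.
from rl_coach.agents.td3_agent import TD3AgentParameters
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import SimpleSchedule

graph_manager = BasicRLGraphManager(
    agent_params=TD3AgentParameters(),
    # TD3 targets continuous action spaces, so a continuous-control level
    env_params=GymVectorEnvironment(level='Pendulum-v0'),
    schedule_params=SimpleSchedule())
```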
@@ -5,3 +5,5 @@ markers =
     integration_test: long test that checks that the complete framework is running correctly
 filterwarnings =
     ignore::DeprecationWarning
+norecursedirs =
+    *mxnet*
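This hunk adds `norecursedirs` so pytest skips mxnet directories during collection. For context, the `integration_test` marker declared above is applied in test modules like this (the test body is illustrative):

```python
import pytest

@pytest.mark.integration_test
def test_full_training_loop():
    # long end-to-end check; deselect with `pytest -m "not integration_test"`
    assert True
```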
setup.py
@@ -85,7 +85,7 @@ extras['all'] = all_deps

 setup(
     name='rl-coach' if not slim_package else 'rl-coach-slim',
-    version='0.12.1',
+    version='1.0.0',
     description='Reinforcement Learning Coach enables easy experimentation with state of the art Reinforcement Learning algorithms.',
     url='https://github.com/NervanaSystems/coach',
     author='Intel AI Lab',
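With the version bumped to 1.0.0, a quick way to confirm what is actually installed (`pkg_resources` ships with setuptools; the distribution name follows the `name=` logic in the `setup()` call above):

```python
import pkg_resources

# use 'rl-coach-slim' instead if the slim package was installed
print(pkg_resources.get_distribution("rl-coach").version)  # expect '1.0.0'
```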