
Updated tutorial and docs (#386)

Improved getting started tutorial, and updated docs to point to version 1.0.0
Author: Gal Novik, 2019-08-05 16:46:15 +03:00 (committed by GitHub)
Parent: c1d1fae342
Commit: 92460736bc
12 changed files with 135 additions and 127 deletions

View File

@@ -30,26 +30,25 @@ coach -p CartPole_DQN -r
* [Release 0.10.0](https://ai.intel.com/introducing-reinforcement-learning-coach-0-10-0/)
* [Release 0.11.0](https://ai.intel.com/rl-coach-data-science-at-scale)
* [Release 0.12.0](https://github.com/NervanaSystems/coach/releases/tag/v0.12.0)
* Release 1.0.0 (current release)
* [Release 1.0.0](https://www.intel.ai/rl-coach-new-release) (current release)
Contacting the Coach development team is also possible over [email](mailto:coach@intel.com)
## Table of Contents
- [Coach](#coach)
* [Benchmarks](#benchmarks)
* [Installation](#installation)
* [Getting Started](#getting-started)
- [Benchmarks](#benchmarks)
- [Installation](#installation)
- [Getting Started](#getting-started)
* [Tutorials and Documentation](#tutorials-and-documentation)
* [Basic Usage](#basic-usage)
* [Running Coach](#running-coach)
* [Running Coach Dashboard (Visualization)](#running-coach-dashboard-visualization)
* [Distributed Multi-Node Coach](#distributed-multi-node-coach)
* [Batch Reinforcement Learning](#batch-reinforcement-learning)
* [Supported Environments](#supported-environments)
* [Supported Algorithms](#supported-algorithms)
* [Citation](#citation)
* [Disclaimer](#disclaimer)
- [Supported Environments](#supported-environments)
- [Supported Algorithms](#supported-algorithms)
- [Citation](#citation)
- [Contact](#contact)
- [Disclaimer](#disclaimer)
## Benchmarks
@@ -289,7 +288,7 @@ There are [example](https://github.com/NervanaSystems/coach/blob/master/rl_coach
* [Generalized Advantage Estimation (GAE)](https://arxiv.org/abs/1506.02438) ([code](rl_coach/agents/actor_critic_agent.py#L86))
* [Sample Efficient Actor-Critic with Experience Replay (ACER)](https://arxiv.org/abs/1611.01224) | **Multi Worker Single Node** ([code](rl_coach/agents/acer_agent.py))
* [Soft Actor-Critic (SAC)](https://arxiv.org/abs/1801.01290) ([code](rl_coach/agents/soft_actor_critic_agent.py))
* [Twin Delayed Deep Deterministic Policy Gradient](https://arxiv.org/pdf/1802.09477.pdf) ([code](rl_coach/agents/td3_agent.py))
* [Twin Delayed Deep Deterministic Policy Gradient (TD3)](https://arxiv.org/pdf/1802.09477.pdf) ([code](rl_coach/agents/td3_agent.py))
### General Agents
* [Direct Future Prediction (DFP)](https://arxiv.org/abs/1611.01779) | **Multi Worker Single Node** ([code](rl_coach/agents/dfp_agent.py))
@@ -333,6 +332,15 @@ If you used Coach for your work, please use the following citation:
}
```
## Contact
We'd be happy to get any questions or contributions through GitHub issues and PRs.
Please make sure to take a look [here](CONTRIBUTING.md) before filing an issue or proposing a PR.
The Coach development team can also be contacted over [email](mailto:coach@intel.com)
## Disclaimer
Coach is released as a reference code for research purposes. It is not an official Intel product, and the level of quality and support may not be as expected from an official product.

View File

@@ -27,7 +27,9 @@ Blog posts from the Intel® AI website:
* `Release 0.11.0 <https://ai.intel.com/rl-coach-data-science-at-scale/>`_
* Release 0.12.0 (current release)
* `Release 0.12.0 <https://github.com/NervanaSystems/coach/releases/tag/v0.12.0>`_
* `Release 1.0.0 <https://www.intel.ai/rl-coach-new-release>`_ (current release)
You can find more details in the `GitHub repository <https://github.com/NervanaSystems/coach>`_.
@@ -75,5 +77,3 @@ You can find more details in the `GitHub repository <https://github.com/NervanaS
components/core_types
components/spaces
components/additional_parameters

View File

@@ -512,7 +512,7 @@ given observation</p>
<dl class="method">
<dt id="rl_coach.agents.agent.Agent.prepare_batch_for_inference">
<code class="sig-name descname">prepare_batch_for_inference</code><span class="sig-paren">(</span><em class="sig-param">states: Union[Dict[str, numpy.ndarray], List[Dict[str, numpy.ndarray]]], network_name: str</em><span class="sig-paren">)</span> &#x2192; Dict[str, numpy.core.multiarray.array]<a class="reference internal" href="../../_modules/rl_coach/agents/agent.html#Agent.prepare_batch_for_inference"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.agents.agent.Agent.prepare_batch_for_inference" title="Permalink to this definition"></a></dt>
<code class="sig-name descname">prepare_batch_for_inference</code><span class="sig-paren">(</span><em class="sig-param">states: Union[Dict[str, numpy.ndarray], List[Dict[str, numpy.ndarray]]], network_name: str</em><span class="sig-paren">)</span> &#x2192; Dict[str, numpy.array]<a class="reference internal" href="../../_modules/rl_coach/agents/agent.html#Agent.prepare_batch_for_inference"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.agents.agent.Agent.prepare_batch_for_inference" title="Permalink to this definition"></a></dt>
<dd><p>Convert curr_state into the input tensors TensorFlow expects, i.e. if we have several input states, stack all
observations together, measurements together, etc.</p>
<dl class="field-list simple">
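The docstring above describes stacking each input of every state in the batch into one array per network input. A rough illustrative sketch of that behaviour (not Coach's implementation; the helper name and the 'observation'/'measurements' keys are only examples of the per-input entries mentioned above):

    import numpy as np

    def batch_states(states, keys=('observation', 'measurements')):
        # Illustration only: stack a list of per-step state dicts into one
        # batched array per input, as the description above says
        # prepare_batch_for_inference does for the network's inputs.
        if isinstance(states, dict):  # a single state becomes a batch of one
            states = [states]
        return {key: np.stack([s[key] for s in states])
                for key in keys if key in states[0]}

    # e.g. two states with 4-dimensional observations -> one (2, 4) array
    batch = batch_states([{'observation': np.zeros(4)},
                          {'observation': np.ones(4)}])
    print(batch['observation'].shape)  # (2, 4)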

View File

@@ -95,6 +95,7 @@
<li class="toctree-l2 current"><a class="current reference internal" href="#">Algorithms</a></li>
<li class="toctree-l2"><a class="reference internal" href="environments.html">Environments</a></li>
<li class="toctree-l2"><a class="reference internal" href="benchmarks.html">Benchmarks</a></li>
<li class="toctree-l2"><a class="reference internal" href="batch_rl.html">Batch Reinforcement Learning</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../selecting_an_algorithm.html">Selecting an Algorithm</a></li>

View File

@@ -37,7 +37,7 @@
<link rel="stylesheet" href="../_static/css/custom.css" type="text/css" />
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Selecting an Algorithm" href="../selecting_an_algorithm.html" />
<link rel="next" title="Batch Reinforcement Learning" href="batch_rl.html" />
<link rel="prev" title="Environments" href="environments.html" />
<link href="../_static/css/custom.css" rel="stylesheet" type="text/css">
@@ -95,6 +95,7 @@
<li class="toctree-l2"><a class="reference internal" href="algorithms.html">Algorithms</a></li>
<li class="toctree-l2"><a class="reference internal" href="environments.html">Environments</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">Benchmarks</a></li>
<li class="toctree-l2"><a class="reference internal" href="batch_rl.html">Batch Reinforcement Learning</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../selecting_an_algorithm.html">Selecting an Algorithm</a></li>
@@ -220,7 +221,7 @@ benchmarks stay intact as Coach continues to develop.</p>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../selecting_an_algorithm.html" class="btn btn-neutral float-right" title="Selecting an Algorithm" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="batch_rl.html" class="btn btn-neutral float-right" title="Batch Reinforcement Learning" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="environments.html" class="btn btn-neutral float-left" title="Environments" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>

View File

@@ -95,6 +95,7 @@
<li class="toctree-l2"><a class="reference internal" href="algorithms.html">Algorithms</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">Environments</a></li>
<li class="toctree-l2"><a class="reference internal" href="benchmarks.html">Benchmarks</a></li>
<li class="toctree-l2"><a class="reference internal" href="batch_rl.html">Batch Reinforcement Learning</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../selecting_an_algorithm.html">Selecting an Algorithm</a></li>

View File

@@ -198,7 +198,8 @@ Coach collects statistics from the training process and supports advanced visual
<li><p><a class="reference external" href="https://ai.intel.com/reinforcement-learning-coach-carla-qr-dqn/">Release 0.9.0</a></p></li>
<li><p><a class="reference external" href="https://ai.intel.com/introducing-reinforcement-learning-coach-0-10-0/">Release 0.10.0</a></p></li>
<li><p><a class="reference external" href="https://ai.intel.com/rl-coach-data-science-at-scale/">Release 0.11.0</a></p></li>
<li><p>Release 0.12.0 (current release)</p></li>
<li><p><a class="reference external" href="https://github.com/NervanaSystems/coach/releases/tag/v0.12.0">Release 0.12.0</a></p></li>
<li><p><a class="reference external" href="https://www.intel.ai/rl-coach-new-release">Release 1.0.0</a> (current release)</p></li>
</ul>
<p>You can find more details in the <a class="reference external" href="https://github.com/NervanaSystems/coach">GitHub repository</a>.</p>
<div class="toctree-wrapper compound">

File diff suppressed because one or more lines are too long

View File

@@ -38,7 +38,7 @@
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Coach Dashboard" href="dashboard.html" />
<link rel="prev" title="Benchmarks" href="features/benchmarks.html" />
<link rel="prev" title="Batch Reinforcement Learning" href="features/batch_rl.html" />
<link href="_static/css/custom.css" rel="stylesheet" type="text/css">
</head>
@@ -475,7 +475,7 @@ algorithms for imitation learning in Coach.</p>
<a href="dashboard.html" class="btn btn-neutral float-right" title="Coach Dashboard" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="features/benchmarks.html" class="btn btn-neutral float-left" title="Benchmarks" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
<a href="features/batch_rl.html" class="btn btn-neutral float-left" title="Batch Reinforcement Learning" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
</div>

View File

@@ -439,7 +439,7 @@ given observation</p>
<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.prepare_batch_for_inference">
<code class="sig-name descname">prepare_batch_for_inference</code><span class="sig-paren">(</span><em class="sig-param">states: Union[Dict[str, numpy.ndarray], List[Dict[str, numpy.ndarray]]], network_name: str</em><span class="sig-paren">)</span> &#x2192; Dict[str, numpy.core.multiarray.array]<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.prepare_batch_for_inference" title="Permalink to this definition"></a></dt>
<code class="sig-name descname">prepare_batch_for_inference</code><span class="sig-paren">(</span><em class="sig-param">states: Union[Dict[str, numpy.ndarray], List[Dict[str, numpy.ndarray]]], network_name: str</em><span class="sig-paren">)</span> &#x2192; Dict[str, numpy.array]<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.prepare_batch_for_inference" title="Permalink to this definition"></a></dt>
<dd><p>Convert curr_state into the input tensors TensorFlow expects, i.e. if we have several input states, stack all
observations together, measurements together, etc.</p>
<dl class="field-list simple">

View File

@@ -27,7 +27,9 @@ Blog posts from the Intel® AI website:
* `Release 0.11.0 <https://ai.intel.com/rl-coach-data-science-at-scale/>`_
* Release 0.12.0 (current release)
* `Release 0.12.0 <https://github.com/NervanaSystems/coach/releases/tag/v0.12.0>`_
* `Release 1.0.0 <https://www.intel.ai/rl-coach-new-release>`_ (current release)
You can find more details in the `GitHub repository <https://github.com/NervanaSystems/coach>`_.
@@ -75,5 +77,3 @@ You can find more details in the `GitHub repository <https://github.com/NervanaS
components/core_types
components/spaces
components/additional_parameters

View File

@@ -7,6 +7,21 @@
"# Getting Started Guide"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of Contents\n",
"- [Using Coach from the Command Line](#Using-Coach-from-the-Command-Line)\n",
"- [Using Coach as a Library](#Using-Coach-as-a-Library)\n",
" - [Preset based - using `CoachInterface`](#Preset-based---using-CoachInterface)\n",
" - [Training a preset](#Training-a-preset)\n",
" - [Running each training or inference iteration manually](#Running-each-training-or-inference-iteration-manually)\n",
" - [Non-preset - using `GraphManager` directly](#Non-preset---using-GraphManager-directly)\n",
" - [Training an agent with a custom Gym environment](#Training-an-agent-with-a-custom-Gym-environment)\n",
" - [Advanced functionality - proprietary exploration policy, checkpoint evaluation](#Advanced-functionality---proprietary-exploration-policy,-checkpoint-evaluation)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -54,11 +69,7 @@
"source": [
"Alternatively, Coach can be used a library directly from python. As described above, Coach uses the presets mechanism to define the experiments. A preset is essentially a python module which instantiates a `GraphManager` object. The graph manager is a container that holds the agents and the environments, and has some additional parameters for running the experiment, such as visualization parameters. The graph manager acts as the scheduler which orchestrates the experiment.\n",
"\n",
"Running Coach directly from python is done through a `CoachInterface` object, which uses the same arguments as the command line invocation but allowes for more flexibility and additional control of the training/inference process.\n",
"\n",
"Let's start with some examples.\n",
"\n",
"Creating a very simple graph containing a single Clipped PPO agent running with the CartPole-v0 Gym environment:"
"**Note: Each one of the examples in this section is independent, so notebook kernels need to be restarted before running it. Make sure you run the next cell before running any of the examples.**"
]
},
{
@@ -75,7 +86,28 @@
"if module_path not in sys.path:\n",
" sys.path.append(module_path)\n",
"if resources_path not in sys.path:\n",
" sys.path.append(resources_path)"
" sys.path.append(resources_path)\n",
" \n",
"from rl_coach.coach import CoachInterface"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preset based - using `CoachInterface`\n",
"\n",
"The basic method to run Coach directly from python is through a `CoachInterface` object, which uses the same arguments as the command line invocation but allowes for more flexibility and additional control of the training/inference process.\n",
"\n",
"Let's start with some examples."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Training a preset\n",
"In this example, we'll create a very simple graph containing a Clipped PPO agent running with the CartPole-v0 Gym environment. `CoachInterface` has a few useful parameters such as `custom_parameter` that enables overriding preset settings, and other optional parameters enabling control over the training process. We'll override the preset's schedule parameters, train with a single rollout worker, and save checkpoints every 10 seconds:"
]
},
{
@@ -84,17 +116,11 @@
"metadata": {},
"outputs": [],
"source": [
"from rl_coach.coach import CoachInterface\n",
"\n",
"coach = CoachInterface(preset='CartPole_ClippedPPO',\n",
" custom_parameter='heatup_steps=EnvironmentSteps(5);improve_steps=TrainingSteps(3)')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Running the graph according to the given schedule:"
" # The optional custom_parameter enables overriding preset settings\n",
" custom_parameter='heatup_steps=EnvironmentSteps(5);improve_steps=TrainingSteps(3)',\n",
" # Other optional parameters enable easy access to advanced functionalities\n",
" num_workers=1, checkpoint_save_secs=10)"
]
},
{
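Taken together, the cells added above amount to roughly the following plain-Python sketch. The `CoachInterface` keyword arguments are the ones shown in the cell above, and `coach.run()` is the call the tutorial uses to run the whole schedule:

    from rl_coach.coach import CoachInterface

    # assemble the experiment from the CartPole_ClippedPPO preset,
    # overriding its schedule, using a single rollout worker and
    # saving checkpoints every 10 seconds
    coach = CoachInterface(preset='CartPole_ClippedPPO',
                           custom_parameter='heatup_steps=EnvironmentSteps(5);'
                                            'improve_steps=TrainingSteps(3)',
                           num_workers=1,
                           checkpoint_save_secs=10)

    coach.run()  # heatup, training and evaluation according to the preset's schedule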
@@ -110,7 +136,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Running each phase manually"
"#### Running each training or inference iteration manually"
]
},
{
@@ -120,57 +146,6 @@
"The graph manager (which was instantiated in the preset) can be accessed from the `CoachInterface` object. The graph manager simplifies the scheduling process by encapsulating the calls to each of the training phases. Sometimes, it can be beneficial to have a more fine grained control over the scheduling process. This can be easily done by calling the individual phase functions directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from rl_coach.core_types import EnvironmentSteps\n",
"\n",
"coach.graph_manager.heatup(EnvironmentSteps(100))\n",
"for _ in range(10):\n",
" coach.graph_manager.train_and_act(EnvironmentSteps(50))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional functionality"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`CoachInterface` allows for easy access to functionalities such as multi-threading and saving checkpoints:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"coach = CoachInterface(preset='CartPole_ClippedPPO', num_workers=2, checkpoint_save_secs=10)\n",
"coach.run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Agent functionality"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When using `CoachInterface` (single agent with one level of hierarchy) it's also possible to easily use the `Agent` object functionality, such as logging and reading signals and applying the policy the agent has learned on a given state:"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -183,13 +158,31 @@
"\n",
"coach = CoachInterface(preset='CartPole_ClippedPPO')\n",
"\n",
"# registering an iteration signal before starting to run\n",
"coach.graph_manager.log_signal('iteration', -1)\n",
"\n",
"coach.graph_manager.heatup(EnvironmentSteps(100))\n",
"\n",
"# training\n",
"for it in range(10):\n",
" # logging the iteration signal during training\n",
" coach.graph_manager.log_signal('iteration', it)\n",
" # using the graph manager to train and act a given number of steps\n",
" coach.graph_manager.train_and_act(EnvironmentSteps(100))\n",
" # reading signals during training\n",
" training_reward = coach.graph_manager.get_signal_value('Training Reward')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes we may want to track the agent's decisions, log or maybe even modify them.\n",
"We can access the agent itself through the `CoachInterface` as follows. \n",
"\n",
"Note that we also need an instance of the environment to do so. In this case we use instantiate a `GymEnvironment` object with the CartPole `GymVectorEnvironment`:"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -200,29 +193,41 @@
"env_params = GymVectorEnvironment(level='CartPole-v0')\n",
"env = GymEnvironment(**env_params.__dict__, visualization_parameters=VisualizationParameters())\n",
"\n",
"for it in range(10):\n",
" action_info = coach.graph_manager.get_agent().choose_action(env.state)\n",
" print(\"State:{}, Action:{}\".format(env.state,action_info.action))\n",
" env.step(action_info.action)"
"response = env.reset_internal_state()\n",
"for _ in range(10):\n",
" action_info = coach.graph_manager.get_agent().choose_action(response.next_state)\n",
" print(\"State:{}, Action:{}\".format(response.next_state,action_info.action))\n",
" response = env.step(action_info.action)\n",
" print(\"Reward:{}\".format(response.reward))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using GraphManager Directly"
"### Non-preset - using `GraphManager` directly"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is also possible to invoke coach directly in the python code without defining a preset (which is necessary for `CoachInterface`) by using the `GraphManager` object directly. Using Coach this way won't allow you access functionalities such as multi-threading, but it might be convenient if you don't want to define a preset file.\n",
"It is also possible to invoke coach directly in the python code without defining a preset (which is necessary for `CoachInterface`) by using the `GraphManager` object directly. Using Coach this way won't allow you access functionalities such as multi-threading, but it might be convenient if you don't want to define a preset file."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Training an agent with a custom Gym environment\n",
"\n",
"Here we show an example of how to do so with a custom environment.\n",
"We can use a custom gym environment without registering it. \n",
"We just need the path to the environment module.\n",
"We can also pass custom parameters for the environment `__init__` function as `additional_simulator_parameters`."
"Here we show an example of how to use the `GraphManager` to train an agent on a custom Gym environment.\n",
"\n",
"We first construct a `GymEnvironmentParameters` object describing the environment parameters. For Gym environments with vector observations, we can use the more specific `GymVectorEnvironment` object. \n",
"\n",
"The path to the custom environment is defined in the `level` parameter and it can be the absolute path to its class (e.g. `'/home/user/my_environment_dir/my_environment_module.py:MyEnvironmentClass'`) or the relative path to the module as in this example. In any case, we can use the custom gym environment without registering it.\n",
"\n",
"Custom parameters for the environment's `__init__` function can be passed as `additional_simulator_parameters`."
]
},
{
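The cell that actually builds the graph for this custom-environment example is unchanged and therefore not shown in the diff. A rough sketch of what such a preset-free setup looks like, modelled on Coach's own quick-start example; the environment path and the time_limit keyword below are placeholders for a user-supplied class:

    from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters
    from rl_coach.environments.gym_environment import GymVectorEnvironment
    from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
    from rl_coach.graph_managers.graph_manager import SimpleSchedule

    # point 'level' at an unregistered custom Gym environment, either as a
    # relative module path or an absolute '<module path>:<class>' string
    env_params = GymVectorEnvironment(
        level='my_environment_dir/my_environment_module.py:MyEnvironmentClass')
    # keyword arguments forwarded to the environment's __init__ (hypothetical here)
    env_params.additional_simulator_parameters = {'time_limit': 500}

    graph_manager = BasicRLGraphManager(agent_params=ClippedPPOAgentParameters(),
                                        env_params=env_params,
                                        schedule_params=SimpleSchedule())
    graph_manager.improve()  # train according to the simple default schedule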
@@ -269,23 +274,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The path to the environment can also be set as an absolute path, as follows: `<absolute python module path>:<environment class>`. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"env_params = GymVectorEnvironment(level='/home/user/my_environment_dir/my_environment_module.py:MyEnvironmentClass')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Advanced functionality - proprietary exploration policy, checkpoint evaluation"
"#### Advanced functionality - proprietary exploration policy, checkpoint evaluation"
]
},
{
@@ -416,6 +405,13 @@
"# Clearning up\n",
"shutil.rmtree(my_checkpoint_dir)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {