Updated tutorial and docs (#386)
Improved the getting started tutorial, and updated the docs to point to version 1.0.0
Changed files include README.md (42 changed lines); the docs and the Getting Started tutorial notebook changes appear in the diff below.
@@ -30,26 +30,25 @@ coach -p CartPole_DQN -r
 * [Release 0.10.0](https://ai.intel.com/introducing-reinforcement-learning-coach-0-10-0/)
 * [Release 0.11.0](https://ai.intel.com/rl-coach-data-science-at-scale)
 * [Release 0.12.0](https://github.com/NervanaSystems/coach/releases/tag/v0.12.0)
-* Release 1.0.0 (current release)
+* [Release 1.0.0](https://www.intel.ai/rl-coach-new-release) (current release)

-Contacting the Coach development team is also possible over [email](mailto:coach@intel.com)

 ## Table of Contents

-- [Coach](#coach)
-  * [Benchmarks](#benchmarks)
-  * [Installation](#installation)
-  * [Getting Started](#getting-started)
-  * [Tutorials and Documentation](#tutorials-and-documentation)
-  * [Basic Usage](#basic-usage)
-  * [Running Coach](#running-coach)
-  * [Running Coach Dashboard (Visualization)](#running-coach-dashboard-visualization)
-  * [Distributed Multi-Node Coach](#distributed-multi-node-coach)
-  * [Batch Reinforcement Learning](#batch-reinforcement-learning)
-  * [Supported Environments](#supported-environments)
-  * [Supported Algorithms](#supported-algorithms)
-  * [Citation](#citation)
-  * [Disclaimer](#disclaimer)
+- [Benchmarks](#benchmarks)
+- [Installation](#installation)
+- [Getting Started](#getting-started)
+  * [Tutorials and Documentation](#tutorials-and-documentation)
+  * [Basic Usage](#basic-usage)
+  * [Running Coach](#running-coach)
+  * [Running Coach Dashboard (Visualization)](#running-coach-dashboard-visualization)
+  * [Distributed Multi-Node Coach](#distributed-multi-node-coach)
+  * [Batch Reinforcement Learning](#batch-reinforcement-learning)
+- [Supported Environments](#supported-environments)
+- [Supported Algorithms](#supported-algorithms)
+- [Citation](#citation)
+- [Contact](#contact)
+- [Disclaimer](#disclaimer)

 ## Benchmarks

@@ -289,7 +288,7 @@ There are [example](https://github.com/NervanaSystems/coach/blob/master/rl_coach
 * [Generalized Advantage Estimation (GAE)](https://arxiv.org/abs/1506.02438) ([code](rl_coach/agents/actor_critic_agent.py#L86))
 * [Sample Efficient Actor-Critic with Experience Replay (ACER)](https://arxiv.org/abs/1611.01224) | **Multi Worker Single Node** ([code](rl_coach/agents/acer_agent.py))
 * [Soft Actor-Critic (SAC)](https://arxiv.org/abs/1801.01290) ([code](rl_coach/agents/soft_actor_critic_agent.py))
-* [Twin Delayed Deep Deterministic Policy Gradient](https://arxiv.org/pdf/1802.09477.pdf) ([code](rl_coach/agents/td3_agent.py))
+* [Twin Delayed Deep Deterministic Policy Gradient (TD3)](https://arxiv.org/pdf/1802.09477.pdf) ([code](rl_coach/agents/td3_agent.py))

 ### General Agents
 * [Direct Future Prediction (DFP)](https://arxiv.org/abs/1611.01779) | **Multi Worker Single Node** ([code](rl_coach/agents/dfp_agent.py))

@@ -333,6 +332,15 @@ If you used Coach for your work, please use the following citation:
 }
 ```
+
+## Contact
+
+We'd be happy to get any questions or contributions through GitHub issues and PRs.
+
+Please make sure to take a look [here](CONTRIBUTING.md) before filing an issue or proposing a PR.
+
+The Coach development team can also be contacted over [email](mailto:coach@intel.com)
+

 ## Disclaimer

 Coach is released as a reference code for research purposes. It is not an official Intel product, and the level of quality and support may not be as expected from an official product.

@@ -27,7 +27,9 @@ Blog posts from the Intel® AI website:

 * `Release 0.11.0 <https://ai.intel.com/rl-coach-data-science-at-scale/>`_

-* Release 0.12.0 (current release)
+* `Release 0.12.0 <https://github.com/NervanaSystems/coach/releases/tag/v0.12.0>`_

+* `Release 1.0.0 <https://www.intel.ai/rl-coach-new-release>`_ (current release)
+
 You can find more details in the `GitHub repository <https://github.com/NervanaSystems/coach>`_.

@@ -75,5 +77,3 @@ You can find more details in the `GitHub repository <https://github.com/NervanaS
 components/core_types
 components/spaces
 components/additional_parameters
-
-

@@ -512,7 +512,7 @@ given observation</p>

 <dl class="method">
 <dt id="rl_coach.agents.agent.Agent.prepare_batch_for_inference">
-<code class="sig-name descname">prepare_batch_for_inference</code><span class="sig-paren">(</span><em class="sig-param">states: Union[Dict[str, numpy.ndarray], List[Dict[str, numpy.ndarray]]], network_name: str</em><span class="sig-paren">)</span> → Dict[str, numpy.core.multiarray.array]<a class="reference internal" href="../../_modules/rl_coach/agents/agent.html#Agent.prepare_batch_for_inference"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.agents.agent.Agent.prepare_batch_for_inference" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">prepare_batch_for_inference</code><span class="sig-paren">(</span><em class="sig-param">states: Union[Dict[str, numpy.ndarray], List[Dict[str, numpy.ndarray]]], network_name: str</em><span class="sig-paren">)</span> → Dict[str, numpy.array]<a class="reference internal" href="../../_modules/rl_coach/agents/agent.html#Agent.prepare_batch_for_inference"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.agents.agent.Agent.prepare_batch_for_inference" title="Permalink to this definition">¶</a></dt>
 <dd><p>Convert curr_state into input tensors tensorflow is expecting. i.e. if we have several inputs states, stack all
 observations together, measurements together, etc.</p>
 <dl class="field-list simple">

@@ -95,6 +95,7 @@
 <li class="toctree-l2 current"><a class="current reference internal" href="#">Algorithms</a></li>
 <li class="toctree-l2"><a class="reference internal" href="environments.html">Environments</a></li>
 <li class="toctree-l2"><a class="reference internal" href="benchmarks.html">Benchmarks</a></li>
+<li class="toctree-l2"><a class="reference internal" href="batch_rl.html">Batch Reinforcement Learning</a></li>
 </ul>
 </li>
 <li class="toctree-l1"><a class="reference internal" href="../selecting_an_algorithm.html">Selecting an Algorithm</a></li>

@@ -37,7 +37,7 @@
 <link rel="stylesheet" href="../_static/css/custom.css" type="text/css" />
 <link rel="index" title="Index" href="../genindex.html" />
 <link rel="search" title="Search" href="../search.html" />
-<link rel="next" title="Selecting an Algorithm" href="../selecting_an_algorithm.html" />
+<link rel="next" title="Batch Reinforcement Learning" href="batch_rl.html" />
 <link rel="prev" title="Environments" href="environments.html" />
 <link href="../_static/css/custom.css" rel="stylesheet" type="text/css">

@@ -95,6 +95,7 @@
 <li class="toctree-l2"><a class="reference internal" href="algorithms.html">Algorithms</a></li>
 <li class="toctree-l2"><a class="reference internal" href="environments.html">Environments</a></li>
 <li class="toctree-l2 current"><a class="current reference internal" href="#">Benchmarks</a></li>
+<li class="toctree-l2"><a class="reference internal" href="batch_rl.html">Batch Reinforcement Learning</a></li>
 </ul>
 </li>
 <li class="toctree-l1"><a class="reference internal" href="../selecting_an_algorithm.html">Selecting an Algorithm</a></li>

@@ -220,7 +221,7 @@ benchmarks stay intact as Coach continues to develop.</p>

 <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">

-<a href="../selecting_an_algorithm.html" class="btn btn-neutral float-right" title="Selecting an Algorithm" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
+<a href="batch_rl.html" class="btn btn-neutral float-right" title="Batch Reinforcement Learning" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>


 <a href="environments.html" class="btn btn-neutral float-left" title="Environments" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>

@@ -95,6 +95,7 @@
 <li class="toctree-l2"><a class="reference internal" href="algorithms.html">Algorithms</a></li>
 <li class="toctree-l2 current"><a class="current reference internal" href="#">Environments</a></li>
 <li class="toctree-l2"><a class="reference internal" href="benchmarks.html">Benchmarks</a></li>
+<li class="toctree-l2"><a class="reference internal" href="batch_rl.html">Batch Reinforcement Learning</a></li>
 </ul>
 </li>
 <li class="toctree-l1"><a class="reference internal" href="../selecting_an_algorithm.html">Selecting an Algorithm</a></li>

@@ -198,7 +198,8 @@ Coach collects statistics from the training process and supports advanced visual
 <li><p><a class="reference external" href="https://ai.intel.com/reinforcement-learning-coach-carla-qr-dqn/">Release 0.9.0</a></p></li>
 <li><p><a class="reference external" href="https://ai.intel.com/introducing-reinforcement-learning-coach-0-10-0/)">Release 0.10.0</a></p></li>
 <li><p><a class="reference external" href="https://ai.intel.com/rl-coach-data-science-at-scale/">Release 0.11.0</a></p></li>
-<li><p>Release 0.12.0 (current release)</p></li>
+<li><p><a class="reference external" href="https://github.com/NervanaSystems/coach/releases/tag/v0.12.0">Release 0.12.0</a></p></li>
+<li><p><a class="reference external" href="https://www.intel.ai/rl-coach-new-release">Release 1.0.0</a> (current release)</p></li>
 </ul>
 <p>You can find more details in the <a class="reference external" href="https://github.com/NervanaSystems/coach">GitHub repository</a>.</p>
 <div class="toctree-wrapper compound">

File diff suppressed because one or more lines are too long
@@ -38,7 +38,7 @@
 <link rel="index" title="Index" href="genindex.html" />
 <link rel="search" title="Search" href="search.html" />
 <link rel="next" title="Coach Dashboard" href="dashboard.html" />
-<link rel="prev" title="Benchmarks" href="features/benchmarks.html" />
+<link rel="prev" title="Batch Reinforcement Learning" href="features/batch_rl.html" />
 <link href="_static/css/custom.css" rel="stylesheet" type="text/css">

 </head>

@@ -475,7 +475,7 @@ algorithms for imitation learning in Coach.</p>
 <a href="dashboard.html" class="btn btn-neutral float-right" title="Coach Dashboard" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>


-<a href="features/benchmarks.html" class="btn btn-neutral float-left" title="Benchmarks" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
+<a href="features/batch_rl.html" class="btn btn-neutral float-left" title="Batch Reinforcement Learning" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>

 </div>

@@ -439,7 +439,7 @@ given observation</p>

 <dl class="method">
 <dt id="rl_coach.agents.dqn_agent.DQNAgent.prepare_batch_for_inference">
-<code class="sig-name descname">prepare_batch_for_inference</code><span class="sig-paren">(</span><em class="sig-param">states: Union[Dict[str, numpy.ndarray], List[Dict[str, numpy.ndarray]]], network_name: str</em><span class="sig-paren">)</span> → Dict[str, numpy.core.multiarray.array]<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.prepare_batch_for_inference" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">prepare_batch_for_inference</code><span class="sig-paren">(</span><em class="sig-param">states: Union[Dict[str, numpy.ndarray], List[Dict[str, numpy.ndarray]]], network_name: str</em><span class="sig-paren">)</span> → Dict[str, numpy.array]<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.prepare_batch_for_inference" title="Permalink to this definition">¶</a></dt>
 <dd><p>Convert curr_state into input tensors tensorflow is expecting. i.e. if we have several inputs states, stack all
 observations together, measurements together, etc.</p>
 <dl class="field-list simple">

@@ -27,7 +27,9 @@ Blog posts from the Intel® AI website:

 * `Release 0.11.0 <https://ai.intel.com/rl-coach-data-science-at-scale/>`_

-* Release 0.12.0 (current release)
+* `Release 0.12.0 <https://github.com/NervanaSystems/coach/releases/tag/v0.12.0>`_

+* `Release 1.0.0 <https://www.intel.ai/rl-coach-new-release>`_ (current release)
+
 You can find more details in the `GitHub repository <https://github.com/NervanaSystems/coach>`_.

@@ -75,5 +77,3 @@ You can find more details in the `GitHub repository <https://github.com/NervanaS
 components/core_types
 components/spaces
 components/additional_parameters
-
-

@@ -7,6 +7,21 @@
 "# Getting Started Guide"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Table of Contents\n",
+"- [Using Coach from the Command Line](#Using-Coach-from-the-Command-Line)\n",
+"- [Using Coach as a Library](#Using-Coach-as-a-Library)\n",
+" - [Preset based - using `CoachInterface`](#Preset-based---using-CoachInterface)\n",
+" - [Training a preset](#Training-a-preset)\n",
+" - [Running each training or inference iteration manually](#Running-each-training-or-inference-iteration-manually)\n",
+" - [Non-preset - using `GraphManager` directly](#Non-preset---using-GraphManager-directly)\n",
+" - [Training an agent with a custom Gym environment](#Training-an-agent-with-a-custom-Gym-environment)\n",
+" - [Advanced functionality - proprietary exploration policy, checkpoint evaluation](#Advanced-functionality---proprietary-exploration-policy,-checkpoint-evaluation)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},

@@ -54,11 +69,7 @@
 "source": [
 "Alternatively, Coach can be used a library directly from python. As described above, Coach uses the presets mechanism to define the experiments. A preset is essentially a python module which instantiates a `GraphManager` object. The graph manager is a container that holds the agents and the environments, and has some additional parameters for running the experiment, such as visualization parameters. The graph manager acts as the scheduler which orchestrates the experiment.\n",
 "\n",
-"Running Coach directly from python is done through a `CoachInterface` object, which uses the same arguments as the command line invocation but allowes for more flexibility and additional control of the training/inference process.\n",
-"\n",
-"Let's start with some examples.\n",
-"\n",
-"Creating a very simple graph containing a single Clipped PPO agent running with the CartPole-v0 Gym environment:"
+"**Note: Each one of the examples in this section is independent, so notebook kernels need to be restarted before running it. Make sure you run the next cell before running any of the examples.**"
 ]
 },
 {

@@ -75,7 +86,28 @@
 "if module_path not in sys.path:\n",
 " sys.path.append(module_path)\n",
 "if resources_path not in sys.path:\n",
-" sys.path.append(resources_path)"
+" sys.path.append(resources_path)\n",
+" \n",
+"from rl_coach.coach import CoachInterface"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Preset based - using `CoachInterface`\n",
+"\n",
+"The basic method to run Coach directly from python is through a `CoachInterface` object, which uses the same arguments as the command line invocation but allowes for more flexibility and additional control of the training/inference process.\n",
+"\n",
+"Let's start with some examples."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"#### Training a preset\n",
+"In this example, we'll create a very simple graph containing a Clipped PPO agent running with the CartPole-v0 Gym environment. `CoachInterface` has a few useful parameters such as `custom_parameter` that enables overriding preset settings, and other optional parameters enabling control over the training process. We'll override the preset's schedule parameters, train with a single rollout worker, and save checkpoints every 10 seconds:"
 ]
 },
 {

@@ -84,17 +116,11 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from rl_coach.coach import CoachInterface\n",
-"\n",
 "coach = CoachInterface(preset='CartPole_ClippedPPO',\n",
-" custom_parameter='heatup_steps=EnvironmentSteps(5);improve_steps=TrainingSteps(3)')"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"Running the graph according to the given schedule:"
+" # The optional custom_parameter enables overriding preset settings\n",
+" custom_parameter='heatup_steps=EnvironmentSteps(5);improve_steps=TrainingSteps(3)',\n",
+" # Other optional parameters enable easy access to advanced functionalities\n",
+" num_workers=1, checkpoint_save_secs=10)"
 ]
 },
 {

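For readers who find the escaped notebook JSON above hard to follow, the updated cell amounts to roughly the following plain-Python sketch. It is a reconstruction from the lines in this hunk, not part of the commit itself, and assumes Coach 1.0.0 (the `rl_coach` package) is installed with its bundled `CartPole_ClippedPPO` preset:

```python
from rl_coach.coach import CoachInterface

# Preset-based usage: the preset defines the agent, environment and schedule.
coach = CoachInterface(preset='CartPole_ClippedPPO',
                       # custom_parameter overrides settings defined in the preset
                       custom_parameter='heatup_steps=EnvironmentSteps(5);improve_steps=TrainingSteps(3)',
                       # optional parameters expose advanced functionality
                       num_workers=1, checkpoint_save_secs=10)

# Run the graph according to the (overridden) schedule defined by the preset.
coach.run()
```

`coach.run()` is the same call the previous version of the tutorial used; the commit mainly folds the worker and checkpoint options into this single cell.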
@@ -110,7 +136,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Running each phase manually"
+"#### Running each training or inference iteration manually"
 ]
 },
 {

@@ -126,70 +152,37 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from rl_coach.core_types import EnvironmentSteps\n",
-"\n",
-"coach.graph_manager.heatup(EnvironmentSteps(100))\n",
-"for _ in range(10):\n",
-" coach.graph_manager.train_and_act(EnvironmentSteps(50))"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### Additional functionality"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"`CoachInterface` allows for easy access to functionalities such as multi-threading and saving checkpoints:"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"coach = CoachInterface(preset='CartPole_ClippedPPO', num_workers=2, checkpoint_save_secs=10)\n",
-"coach.run()"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### Agent functionality"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"When using `CoachInterface` (single agent with one level of hierarchy) it's also possible to easily use the `Agent` object functionality, such as logging and reading signals and applying the policy the agent has learned on a given state:"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
 "from rl_coach.environments.gym_environment import GymEnvironment, GymVectorEnvironment\n",
 "from rl_coach.base_parameters import VisualizationParameters\n",
 "from rl_coach.core_types import EnvironmentSteps\n",
 "\n",
 "coach = CoachInterface(preset='CartPole_ClippedPPO')\n",
 "\n",
+"# registering an iteration signal before starting to run\n",
+"coach.graph_manager.log_signal('iteration', -1)\n",
+"\n",
+"coach.graph_manager.heatup(EnvironmentSteps(100))\n",
+"\n",
 "# training\n",
 "for it in range(10):\n",
+" # logging the iteration signal during training\n",
 " coach.graph_manager.log_signal('iteration', it)\n",
+" # using the graph manager to train and act a given number of steps\n",
 " coach.graph_manager.train_and_act(EnvironmentSteps(100))\n",
+" # reading signals during training\n",
 " training_reward = coach.graph_manager.get_signal_value('Training Reward')"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Sometimes we may want to track the agent's decisions, log or maybe even modify them.\n",
+"We can access the agent itself through the `CoachInterface` as follows. \n",
+"\n",
+"Note that we also need an instance of the environment to do so. In this case we use instantiate a `GymEnvironment` object with the CartPole `GymVectorEnvironment`:"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,

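Unescaped from the notebook JSON, the consolidated manual-iteration cell above reads roughly as the sketch below. Only the imports this block actually uses are shown (the Gym environment imports from the same cell are used by the following cell); the final print is added here purely for illustration and is not in the commit:

```python
from rl_coach.coach import CoachInterface
from rl_coach.core_types import EnvironmentSteps

coach = CoachInterface(preset='CartPole_ClippedPPO')

# register an iteration signal before starting to run
coach.graph_manager.log_signal('iteration', -1)

# heatup: gather initial experience before any training iterations
coach.graph_manager.heatup(EnvironmentSteps(100))

# training loop, driven manually instead of coach.run()
for it in range(10):
    # log the iteration signal during training
    coach.graph_manager.log_signal('iteration', it)
    # use the graph manager to train and act for a given number of steps
    coach.graph_manager.train_and_act(EnvironmentSteps(100))
    # read signals back during training
    training_reward = coach.graph_manager.get_signal_value('Training Reward')
    print('iteration {}: training reward {}'.format(it, training_reward))
```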
@@ -200,29 +193,41 @@
 "env_params = GymVectorEnvironment(level='CartPole-v0')\n",
 "env = GymEnvironment(**env_params.__dict__, visualization_parameters=VisualizationParameters())\n",
 "\n",
-"for it in range(10):\n",
-" action_info = coach.graph_manager.get_agent().choose_action(env.state)\n",
-" print(\"State:{}, Action:{}\".format(env.state,action_info.action))\n",
-" env.step(action_info.action)"
+"response = env.reset_internal_state()\n",
+"for _ in range(10):\n",
+" action_info = coach.graph_manager.get_agent().choose_action(response.next_state)\n",
+" print(\"State:{}, Action:{}\".format(response.next_state,action_info.action))\n",
+" response = env.step(action_info.action)\n",
+" print(\"Reward:{}\".format(response.reward))"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Using GraphManager Directly"
+"### Non-preset - using `GraphManager` directly"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"It is also possible to invoke coach directly in the python code without defining a preset (which is necessary for `CoachInterface`) by using the `GraphManager` object directly. Using Coach this way won't allow you access functionalities such as multi-threading, but it might be convenient if you don't want to define a preset file.\n",
+"It is also possible to invoke coach directly in the python code without defining a preset (which is necessary for `CoachInterface`) by using the `GraphManager` object directly. Using Coach this way won't allow you access functionalities such as multi-threading, but it might be convenient if you don't want to define a preset file."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"#### Training an agent with a custom Gym environment\n",
 "\n",
-"Here we show an example of how to do so with a custom environment.\n",
-"We can use a custom gym environment without registering it. \n",
-"We just need the path to the environment module.\n",
-"We can also pass custom parameters for the environment `__init__` function as `additional_simulator_parameters`."
+"Here we show an example of how to use the `GraphManager` to train an agent on a custom Gym environment.\n",
+"\n",
+"We first construct a `GymEnvironmentParameters` object describing the environment parameters. For Gym environments with vector observations, we can use the more specific `GymVectorEnvironment` object. \n",
+"\n",
+"The path to the custom environment is defined in the `level` parameter and it can be the absolute path to its class (e.g. `'/home/user/my_environment_dir/my_environment_module.py:MyEnvironmentClass'`) or the relative path to the module as in this example. In any case, we can use the custom gym environment without registering it.\n",
+"\n",
+"Custom parameters for the environment's `__init__` function can be passed as `additional_simulator_parameters`."
 ]
 },
 {

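As a companion to the cells above, here is a rough, self-contained sketch of stepping the environment manually with the agent and of pointing Coach at an unregistered Gym environment. The calls and the module path come from the hunk above; the `additional_simulator_parameters` value is a placeholder, and in the tutorial the `coach` object has already been trained by the preceding cells:

```python
from rl_coach.coach import CoachInterface
from rl_coach.base_parameters import VisualizationParameters
from rl_coach.environments.gym_environment import GymEnvironment, GymVectorEnvironment

# In the tutorial this follows the training cell, so the agent has already learned a policy.
coach = CoachInterface(preset='CartPole_ClippedPPO')

# Build a standalone environment instance to step through manually.
env_params = GymVectorEnvironment(level='CartPole-v0')
env = GymEnvironment(**env_params.__dict__, visualization_parameters=VisualizationParameters())

response = env.reset_internal_state()
for _ in range(10):
    # ask the agent for an action given the current state
    action_info = coach.graph_manager.get_agent().choose_action(response.next_state)
    response = env.step(action_info.action)
    print("Reward:{}".format(response.reward))

# A custom, unregistered Gym environment is referenced through `level` by module path
# (this path is the tutorial's illustrative example); extra __init__ arguments go
# through additional_simulator_parameters (placeholder value below).
custom_env_params = GymVectorEnvironment(
    level='/home/user/my_environment_dir/my_environment_module.py:MyEnvironmentClass')
custom_env_params.additional_simulator_parameters = {'custom_arg': 42}
```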
@@ -269,23 +274,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The path to the environment can also be set as an absolute path, as follows: `<absolute python module path>:<environment class>`. For example:"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"env_params = GymVectorEnvironment(level='/home/user/my_environment_dir/my_environment_module.py:MyEnvironmentClass')"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### Advanced functionality - proprietary exploration policy, checkpoint evaluation"
+"#### Advanced functionality - proprietary exploration policy, checkpoint evaluation"
 ]
 },
 {

@@ -416,6 +405,13 @@
 "# Clearning up\n",
 "shutil.rmtree(my_checkpoint_dir)"
 ]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": []
 }
 ],
 "metadata": {