1
0
mirror of https://github.com/gryf/coach.git synced 2025-12-17 19:20:19 +01:00

update of api docstrings across coach and tutorials [WIP] (#91)

* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation
This commit is contained in:
Itai Caspi
2018-11-15 15:00:13 +02:00
committed by Gal Novik
parent 524f8436a2
commit 6d40ad1650
517 changed files with 71034 additions and 12834 deletions

View File

@@ -0,0 +1,313 @@
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Adding a New Agent &mdash; Reinforcement Learning Coach 0.11.0 documentation</title>
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../_static/css/custom.css" type="text/css" />
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Adding a New Environment" href="add_env.html" />
<link rel="prev" title="Network Design" href="../design/network.html" />
<link href="../_static/css/custom.css" rel="stylesheet" type="text/css">
<script src="../_static/js/modernizr.min.js"></script>
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search">
<a href="../index.html" class="icon icon-home"> Reinforcement Learning Coach
<img src="../_static/dark_logo.png" class="logo" alt="Logo"/>
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<p class="caption"><span class="caption-text">Intro</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../usage.html">Usage</a></li>
<li class="toctree-l1"><a class="reference internal" href="../features/index.html">Features</a></li>
<li class="toctree-l1"><a class="reference internal" href="../selecting_an_algorithm.html">Selecting an Algorithm</a></li>
<li class="toctree-l1"><a class="reference internal" href="../dashboard.html">Coach Dashboard</a></li>
</ul>
<p class="caption"><span class="caption-text">Design</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../design/control_flow.html">Control Flow</a></li>
<li class="toctree-l1"><a class="reference internal" href="../design/network.html">Network Design</a></li>
</ul>
<p class="caption"><span class="caption-text">Contributing</span></p>
<ul class="current">
<li class="toctree-l1 current"><a class="current reference internal" href="#">Adding a New Agent</a></li>
<li class="toctree-l1"><a class="reference internal" href="add_env.html">Adding a New Environment</a></li>
</ul>
<p class="caption"><span class="caption-text">Components</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../components/agents/index.html">Agents</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/architectures/index.html">Architectures</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/environments/index.html">Environments</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/exploration_policies/index.html">Exploration Policies</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/filters/index.html">Filters</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/memories/index.html">Memories</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/core_types.html">Core Types</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/spaces.html">Spaces</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/additional_parameters.html">Additional Parameters</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">Reinforcement Learning Coach</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html">Docs</a> &raquo;</li>
<li>Adding a New Agent</li>
<li class="wy-breadcrumbs-aside">
<a href="../_sources/contributing/add_agent.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="adding-a-new-agent">
<h1>Adding a New Agent<a class="headerlink" href="#adding-a-new-agent" title="Permalink to this headline"></a></h1>
<p>Coachs modularity makes adding an agent a simple and clean task.
We suggest using the following
<a class="reference external" href="https://github.com/NervanaSystems/coach/blob/master/tutorials/1.%20Implementing%20an%20Algorithm.ipynb">Jupyter notebook tutorial</a>
to ramp up on this process. In general, it involves the following steps:</p>
<ol class="arabic">
<li><p class="first">Implement your algorithm in a new file. The agent can inherit base classes such as <strong>ValueOptimizationAgent</strong> or
<strong>ActorCriticAgent</strong>, or the more generic <strong>Agent</strong> base class.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last"><strong>ValueOptimizationAgent</strong>, <strong>PolicyOptimizationAgent</strong> and <strong>Agent</strong> are abstract classes.
<code class="code docutils literal notranslate"><span class="pre">learn_from_batch()</span></code> should be overriden with the desired behavior for the algorithm being implemented.
If deciding to inherit from <strong>Agent</strong>, also <code class="code docutils literal notranslate"><span class="pre">choose_action()</span></code> should be overriden.</p>
</div>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">learn_from_batch</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">batch</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Tuple</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="n">List</span><span class="p">,</span> <span class="n">List</span><span class="p">]:</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Given a batch of transitions, calculates their target values and updates the network.</span>
<span class="sd"> :param batch: A list of transitions</span>
<span class="sd"> :return: The total loss of the training, the loss per head and the unclipped gradients</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="k">def</span> <span class="nf">choose_action</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">curr_state</span><span class="p">):</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> choose an action to act with in the current episode being played. Different behavior might be exhibited when training</span>
<span class="sd"> or testing.</span>
<span class="sd"> :param curr_state: the current state to act upon.</span>
<span class="sd"> :return: chosen action, some action value describing the action (q-value, probability, etc)</span>
<span class="sd"> &quot;&quot;&quot;</span>
</pre></div>
</div>
</li>
<li><p class="first">Implement your agents specific network head, if needed, at the implementation for the framework of your choice.
For example <strong>architectures/neon_components/heads.py</strong>. The head will inherit the generic base class Head.
A new output type should be added to configurations.py, and a mapping between the new head and output type should
be defined in the get_output_head() function at <strong>architectures/neon_components/general_network.py</strong></p>
</li>
<li><p class="first">Define a new parameters class that inherits AgentParameters.
The parameters class defines all the hyperparameters for the agent, and is initialized with 4 main components:</p>
<ul class="simple">
<li><strong>algorithm</strong>: A class inheriting AlgorithmParameters which defines any algorithm specific parameters</li>
<li><strong>exploration</strong>: A class inheriting ExplorationParameters which defines the exploration policy parameters.
There are several common exploration policies built-in which you can use, and are defined under
the exploration sub directory. You can also define your own custom exploration policy.</li>
<li><strong>memory</strong>: A class inheriting MemoryParameters which defined the memory parameters.
There are several common memory types built-in which you can use, and are defined under the memories
sub directory. You can also define your own custom memory.</li>
<li><strong>networks</strong>: A dictionary defining all the networks that will be used by the agent. The keys of the dictionary
define the network name and will be used to access each network through the agent class.
The dictionary values are a class inheriting NetworkParameters, which define the network structure
and parameters.</li>
</ul>
<p>Additionally, set the path property to return the path to your agent class in the following format:</p>
<p><code class="code docutils literal notranslate"><span class="pre">&lt;path</span> <span class="pre">to</span> <span class="pre">python</span> <span class="pre">module&gt;:&lt;name</span> <span class="pre">of</span> <span class="pre">agent</span> <span class="pre">class&gt;</span></code></p>
<p>For example,</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">RainbowAgentParameters</span><span class="p">(</span><span class="n">AgentParameters</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">algorithm</span><span class="o">=</span><span class="n">RainbowAlgorithmParameters</span><span class="p">(),</span>
<span class="n">exploration</span><span class="o">=</span><span class="n">RainbowExplorationParameters</span><span class="p">(),</span>
<span class="n">memory</span><span class="o">=</span><span class="n">RainbowMemoryParameters</span><span class="p">(),</span>
<span class="n">networks</span><span class="o">=</span><span class="p">{</span><span class="s2">&quot;main&quot;</span><span class="p">:</span> <span class="n">RainbowNetworkParameters</span><span class="p">()})</span>
<span class="nd">@property</span>
<span class="k">def</span> <span class="nf">path</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="s1">&#39;rainbow.rainbow_agent:RainbowAgent&#39;</span>
</pre></div>
</div>
</li>
<li><p class="first">(Optional) Define a preset using the new agent type with a given environment, and the hyper-parameters that should
be used for training on that environment.</p>
</li>
</ol>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="add_env.html" class="btn btn-neutral float-right" title="Adding a New Environment" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="../design/network.html" class="btn btn-neutral" title="Network Design" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
&copy; Copyright 2018, Intel AI Lab
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<script async="async" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script type="text/javascript" src="../_static/js/theme.js"></script>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>

View File

@@ -1,340 +0,0 @@
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="shortcut icon" href="../../img/favicon.ico">
<title>Adding a New Agent - Reinforcement Learning Coach</title>
<link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="../../css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="../../css/highlight.css">
<link href="../../extra.css" rel="stylesheet">
<script>
// Current page data
var mkdocs_page_name = "Adding a New Agent";
var mkdocs_page_input_path = "contributing/add_agent.md";
var mkdocs_page_url = "/contributing/add_agent/";
</script>
<script src="../../js/jquery-2.1.1.min.js"></script>
<script src="../../js/modernizr-2.8.3.min.js"></script>
<script type="text/javascript" src="../../js/highlight.pack.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="../.." class="icon icon-home"> Reinforcement Learning Coach</a>
<div role="search">
<form id ="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1">
<a class="" href="../..">Home</a>
</li>
<li class="toctree-l1">
<a class="" href="../../usage/">Usage</a>
</li>
<li class="toctree-l1">
<span class="caption-text">Design</span>
<ul class="subnav">
<li class="">
<a class="" href="../../design/features/">Features</a>
</li>
<li class="">
<a class="" href="../../design/control_flow/">Control Flow</a>
</li>
<li class="">
<a class="" href="../../design/network/">Network</a>
</li>
<li class="">
<a class="" href="../../design/filters/">Filters</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Algorithms</span>
<ul class="subnav">
<li class="">
<a class="" href="../../algorithms/value_optimization/dqn/">DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/double_dqn/">Double DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/dueling_dqn/">Dueling DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/categorical_dqn/">Categorical DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/mmc/">Mixed Monte Carlo</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/pal/">Persistent Advantage Learning</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/nec/">Neural Episodic Control</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/bs_dqn/">Bootstrapped DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/n_step/">N-Step Q Learning</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/naf/">Normalized Advantage Functions</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/pg/">Policy Gradient</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/ac/">Actor-Critic</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/ddpg/">Deep Determinstic Policy Gradients</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/ppo/">Proximal Policy Optimization</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/cppo/">Clipped Proximal Policy Optimization</a>
</li>
<li class="">
<a class="" href="../../algorithms/other/dfp/">Direct Future Prediction</a>
</li>
<li class="">
<a class="" href="../../algorithms/imitation/bc/">Behavioral Cloning</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<a class="" href="../../dashboard/">Coach Dashboard</a>
</li>
<li class="toctree-l1">
<span class="caption-text">Contributing</span>
<ul class="subnav">
<li class=" current">
<a class="current" href="./">Adding a New Agent</a>
<ul class="subnav">
</ul>
</li>
<li class="">
<a class="" href="../add_env/">Adding a New Environment</a>
</li>
</ul>
</li>
</ul>
</div>
&nbsp;
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../..">Reinforcement Learning Coach</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../..">Docs</a> &raquo;</li>
<li>Contributing &raquo;</li>
<li>Adding a New Agent</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main">
<div class="section">
<!-- language-all: python -->
<p>Coach's modularity makes adding an agent a simple and clean task, that involves the following steps:</p>
<ol>
<li>
<p>Implement your algorithm in a new file. The agent can inherit base classes such as <strong>ValueOptimizationAgent</strong> or
<strong>ActorCriticAgent</strong>, or the more generic <strong>Agent</strong> base class.</p>
<ul>
<li><strong>ValueOptimizationAgent</strong>, <strong>PolicyOptimizationAgent</strong> and <strong>Agent</strong> are abstract classes.
learn_from_batch() should be overriden with the desired behavior for the algorithm being implemented.
If deciding to inherit from <strong>Agent</strong>, also choose_action() should be overriden.<pre><code>def learn_from_batch(self, batch) -&gt; Tuple[float, List, List]:
"""
Given a batch of transitions, calculates their target values and updates the network.
:param batch: A list of transitions
:return: The total loss of the training, the loss per head and the unclipped gradients
"""
def choose_action(self, curr_state):
"""
choose an action to act with in the current episode being played. Different behavior might be exhibited when training
or testing.
:param curr_state: the current state to act upon.
:return: chosen action, some action value describing the action (q-value, probability, etc)
"""
</code></pre>
</li>
</ul>
</li>
<li>
<p>Implement your agent's specific network head, if needed, at the implementation for the framework of your choice.
For example <strong>architectures/neon_components/heads.py</strong>. The head will inherit the generic base class Head.
A new output type should be added to configurations.py, and a mapping between the new head and output type should
be defined in the get_output_head() function at <strong>architectures/neon_components/general_network.py</strong></p>
</li>
<li>
<p>Define a new parameters class that inherits AgentParameters.
The parameters class defines all the hyperparameters for the agent, and is initialized with 4 main components:</p>
<ul>
<li><strong>algorithm</strong>: A class inheriting AlgorithmParameters which defines any algorithm specific parameters</li>
<li><strong>exploration</strong>: A class inheriting ExplorationParameters which defines the exploration policy parameters.
There are several common exploration policies built-in which you can use, and are defined under
the exploration sub directory. You can also define your own custom exploration policy.</li>
<li><strong>memory</strong>: A class inheriting MemoryParameters which defined the memory parameters.
There are several common memory types built-in which you can use, and are defined under the memories
sub directory. You can also define your own custom memory.</li>
<li><strong>networks</strong>: A dictionary defining all the networks that will be used by the agent. The keys of the dictionary
define the network name and will be used to access each network through the agent class.
The dictionary values are a class inheriting NetworkParameters, which define the network structure
and parameters.</li>
</ul>
<p>Additionally, set the path property to return the path to your agent class in the following format:</p>
<pre><code> &lt;path to python module&gt;:&lt;name of agent class&gt;
</code></pre>
<p>For example,</p>
<pre><code> class RainbowAgentParameters(AgentParameters):
def __init__(self):
super().__init__(algorithm=RainbowAlgorithmParameters(),
exploration=RainbowExplorationParameters(),
memory=RainbowMemoryParameters(),
networks={"main": RainbowNetworkParameters()})
@property
def path(self):
return 'rainbow.rainbow_agent:RainbowAgent'
</code></pre>
</li>
<li>
<p>(Optional) Define a preset using the new agent type with a given environment, and the hyper-parameters that should
be used for training on that environment.</p>
</li>
</ol>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../add_env/" class="btn btn-neutral float-right" title="Adding a New Environment">Next <span class="icon icon-circle-arrow-right"></span></a>
<a href="../../dashboard/" class="btn btn-neutral" title="Coach Dashboard"><span class="icon icon-circle-arrow-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="http://www.mkdocs.org">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" role="note" style="cursor: pointer">
<span class="rst-current-version" data-toggle="rst-current-version">
<span><a href="../../dashboard/" style="color: #fcfcfc;">&laquo; Previous</a></span>
<span style="margin-left: 15px"><a href="../add_env/" style="color: #fcfcfc">Next &raquo;</a></span>
</span>
</div>
<script>var base_url = '../..';</script>
<script src="../../js/theme.js"></script>
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
<script src="../../search/require.js"></script>
<script src="../../search/search.js"></script>
</body>
</html>

View File

@@ -0,0 +1,332 @@
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Adding a New Environment &mdash; Reinforcement Learning Coach 0.11.0 documentation</title>
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../_static/css/custom.css" type="text/css" />
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Agents" href="../components/agents/index.html" />
<link rel="prev" title="Adding a New Agent" href="add_agent.html" />
<link href="../_static/css/custom.css" rel="stylesheet" type="text/css">
<script src="../_static/js/modernizr.min.js"></script>
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search">
<a href="../index.html" class="icon icon-home"> Reinforcement Learning Coach
<img src="../_static/dark_logo.png" class="logo" alt="Logo"/>
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<p class="caption"><span class="caption-text">Intro</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../usage.html">Usage</a></li>
<li class="toctree-l1"><a class="reference internal" href="../features/index.html">Features</a></li>
<li class="toctree-l1"><a class="reference internal" href="../selecting_an_algorithm.html">Selecting an Algorithm</a></li>
<li class="toctree-l1"><a class="reference internal" href="../dashboard.html">Coach Dashboard</a></li>
</ul>
<p class="caption"><span class="caption-text">Design</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../design/control_flow.html">Control Flow</a></li>
<li class="toctree-l1"><a class="reference internal" href="../design/network.html">Network Design</a></li>
</ul>
<p class="caption"><span class="caption-text">Contributing</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="add_agent.html">Adding a New Agent</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Adding a New Environment</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#using-the-openai-gym-api">Using the OpenAI Gym API</a></li>
<li class="toctree-l2"><a class="reference internal" href="#using-the-coach-api">Using the Coach API</a></li>
</ul>
</li>
</ul>
<p class="caption"><span class="caption-text">Components</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../components/agents/index.html">Agents</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/architectures/index.html">Architectures</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/environments/index.html">Environments</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/exploration_policies/index.html">Exploration Policies</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/filters/index.html">Filters</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/memories/index.html">Memories</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/core_types.html">Core Types</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/spaces.html">Spaces</a></li>
<li class="toctree-l1"><a class="reference internal" href="../components/additional_parameters.html">Additional Parameters</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">Reinforcement Learning Coach</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html">Docs</a> &raquo;</li>
<li>Adding a New Environment</li>
<li class="wy-breadcrumbs-aside">
<a href="../_sources/contributing/add_env.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="adding-a-new-environment">
<h1>Adding a New Environment<a class="headerlink" href="#adding-a-new-environment" title="Permalink to this headline"></a></h1>
<p>Adding a new environment to Coach is as easy as solving CartPole.</p>
<p>There are essentially two ways to integrate new environments to Coach:</p>
<div class="section" id="using-the-openai-gym-api">
<h2>Using the OpenAI Gym API<a class="headerlink" href="#using-the-openai-gym-api" title="Permalink to this headline"></a></h2>
<p>If your environment is already using the OpenAI Gym API, you are already good to go.
When selecting the environment parameters in the preset, use <code class="code docutils literal notranslate"><span class="pre">GymEnvironmentParameters()</span></code>,
and pass the path to your environment source code using the level parameter.
You can specify additional parameters for your environment using the additional_simulator_parameters parameter.
Take for example the definition used in the <code class="code docutils literal notranslate"><span class="pre">Pendulum_HAC</span></code> preset:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">env_params</span> <span class="o">=</span> <span class="n">GymEnvironmentParameters</span><span class="p">()</span>
<span class="n">env_params</span><span class="o">.</span><span class="n">level</span> <span class="o">=</span> <span class="s2">&quot;rl_coach.environments.mujoco.pendulum_with_goals:PendulumWithGoals&quot;</span>
<span class="n">env_params</span><span class="o">.</span><span class="n">additional_simulator_parameters</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&quot;time_limit&quot;</span><span class="p">:</span> <span class="mi">1000</span><span class="p">}</span>
</pre></div>
</div>
</div>
<div class="section" id="using-the-coach-api">
<h2>Using the Coach API<a class="headerlink" href="#using-the-coach-api" title="Permalink to this headline"></a></h2>
<p>There are a few simple steps to follow, and we will walk through them one by one.
As an alternative, we highly recommend following the corresponding
<a class="reference external" href="https://github.com/NervanaSystems/coach/blob/master/tutorials/2.%20Adding%20an%20Environment.ipynb">tutorial</a>
in the GitHub repo.</p>
<ol class="arabic">
<li><p class="first">Create a new class for your environment, and inherit the Environment class.</p>
</li>
<li><p class="first">Coach defines a simple API for implementing a new environment, which are defined in environment/environment.py.
There are several functions to implement, but only some of them are mandatory.</p>
<p>Here are the important ones:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">_take_action</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">action_idx</span><span class="p">:</span> <span class="n">ActionType</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> An environment dependent function that sends an action to the simulator.</span>
<span class="sd"> :param action_idx: the action to perform on the environment</span>
<span class="sd"> :return: None</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="k">def</span> <span class="nf">_update_state</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Updates the state from the environment.</span>
<span class="sd"> Should update self.observation, self.reward, self.done, self.measurements and self.info</span>
<span class="sd"> :return: None</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="k">def</span> <span class="nf">_restart_environment_episode</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">force_environment_reset</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Restarts the simulator episode</span>
<span class="sd"> :param force_environment_reset: Force the environment to reset even if the episode is not done yet.</span>
<span class="sd"> :return: None</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="k">def</span> <span class="nf">_render</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Renders the environment using the native simulator renderer</span>
<span class="sd"> :return: None</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="k">def</span> <span class="nf">get_rendered_image</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">:</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Return a numpy array containing the image that will be rendered to the screen.</span>
<span class="sd"> This can be different from the observation. For example, mujoco&#39;s observation is a measurements vector.</span>
<span class="sd"> :return: numpy array containing the image that will be rendered to the screen</span>
<span class="sd"> &quot;&quot;&quot;</span>
</pre></div>
</div>
</li>
<li><p class="first">Create a new parameters class for your environment, which inherits the EnvironmentParameters class.
In the __init__ of your class, define all the parameters you used in your Environment class.
Additionally, fill the path property of the class with the path to your Environment class.
For example, take a look at the EnvironmentParameters class used for Doom:</p>
<blockquote>
<div><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">DoomEnvironmentParameters</span><span class="p">(</span><span class="n">EnvironmentParameters</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">default_input_filter</span> <span class="o">=</span> <span class="n">DoomInputFilter</span>
<span class="bp">self</span><span class="o">.</span><span class="n">default_output_filter</span> <span class="o">=</span> <span class="n">DoomOutputFilter</span>
<span class="bp">self</span><span class="o">.</span><span class="n">cameras</span> <span class="o">=</span> <span class="p">[</span><span class="n">DoomEnvironment</span><span class="o">.</span><span class="n">CameraTypes</span><span class="o">.</span><span class="n">OBSERVATION</span><span class="p">]</span>
<span class="nd">@property</span>
<span class="k">def</span> <span class="nf">path</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="s1">&#39;rl_coach.environments.doom_environment:DoomEnvironment&#39;</span>
</pre></div>
</div>
</div></blockquote>
</li>
<li><p class="first">And thats it, youre done. Now just add a new preset with your newly created environment, and start training an agent on top of it.</p>
</li>
</ol>
</div>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../components/agents/index.html" class="btn btn-neutral float-right" title="Agents" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="add_agent.html" class="btn btn-neutral" title="Adding a New Agent" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
&copy; Copyright 2018, Intel AI Lab
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<script async="async" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script type="text/javascript" src="../_static/js/theme.js"></script>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>

View File

@@ -1,348 +0,0 @@
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="shortcut icon" href="../../img/favicon.ico">
<title>Adding a New Environment - Reinforcement Learning Coach</title>
<link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="../../css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="../../css/highlight.css">
<link href="../../extra.css" rel="stylesheet">
<script>
// Current page data
var mkdocs_page_name = "Adding a New Environment";
var mkdocs_page_input_path = "contributing/add_env.md";
var mkdocs_page_url = "/contributing/add_env/";
</script>
<script src="../../js/jquery-2.1.1.min.js"></script>
<script src="../../js/modernizr-2.8.3.min.js"></script>
<script type="text/javascript" src="../../js/highlight.pack.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="../.." class="icon icon-home"> Reinforcement Learning Coach</a>
<div role="search">
<form id ="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1">
<a class="" href="../..">Home</a>
</li>
<li class="toctree-l1">
<a class="" href="../../usage/">Usage</a>
</li>
<li class="toctree-l1">
<span class="caption-text">Design</span>
<ul class="subnav">
<li class="">
<a class="" href="../../design/features/">Features</a>
</li>
<li class="">
<a class="" href="../../design/control_flow/">Control Flow</a>
</li>
<li class="">
<a class="" href="../../design/network/">Network</a>
</li>
<li class="">
<a class="" href="../../design/filters/">Filters</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Algorithms</span>
<ul class="subnav">
<li class="">
<a class="" href="../../algorithms/value_optimization/dqn/">DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/double_dqn/">Double DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/dueling_dqn/">Dueling DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/categorical_dqn/">Categorical DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/mmc/">Mixed Monte Carlo</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/pal/">Persistent Advantage Learning</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/nec/">Neural Episodic Control</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/bs_dqn/">Bootstrapped DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/n_step/">N-Step Q Learning</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/naf/">Normalized Advantage Functions</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/pg/">Policy Gradient</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/ac/">Actor-Critic</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/ddpg/">Deep Determinstic Policy Gradients</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/ppo/">Proximal Policy Optimization</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/cppo/">Clipped Proximal Policy Optimization</a>
</li>
<li class="">
<a class="" href="../../algorithms/other/dfp/">Direct Future Prediction</a>
</li>
<li class="">
<a class="" href="../../algorithms/imitation/bc/">Behavioral Cloning</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<a class="" href="../../dashboard/">Coach Dashboard</a>
</li>
<li class="toctree-l1">
<span class="caption-text">Contributing</span>
<ul class="subnav">
<li class="">
<a class="" href="../add_agent/">Adding a New Agent</a>
</li>
<li class=" current">
<a class="current" href="./">Adding a New Environment</a>
<ul class="subnav">
<li class="toctree-l3"><a href="#using-the-openai-gym-api">Using the OpenAI Gym API</a></li>
<li class="toctree-l3"><a href="#using-the-coach-api">Using the Coach API</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
&nbsp;
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../..">Reinforcement Learning Coach</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../..">Docs</a> &raquo;</li>
<li>Contributing &raquo;</li>
<li>Adding a New Environment</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main">
<div class="section">
<p>Adding a new environment to Coach is as easy as solving CartPole. </p>
<p>There are essentially two ways to integrate new environments to Coach:</p>
<h2 id="using-the-openai-gym-api">Using the OpenAI Gym API</h2>
<p>If your environment is already using the OpenAI Gym API, you are already good to go.
When selecting the environment parameters in the preset, use GymEnvironmentParameters(),
and pass the path to your environment source code using the level parameter.
You can specify additional parameters for your environment using the additional_simulator_parameters parameter.
Take for example the definition used in the Pendulum_HAC preset:</p>
<pre><code> env_params = GymEnvironmentParameters()
env_params.level = "rl_coach.environments.mujoco.pendulum_with_goals:PendulumWithGoals"
env_params.additional_simulator_parameters = {"time_limit": 1000}
</code></pre>
<h2 id="using-the-coach-api">Using the Coach API</h2>
<p>There are a few simple steps to follow, and we will walk through them one by one.</p>
<ol>
<li>
<p>Create a new class for your environment, and inherit the Environment class.</p>
</li>
<li>
<p>Coach defines a simple API for implementing a new environment, which are defined in environment/environment.py.
There are several functions to implement, but only some of them are mandatory.</p>
<p>Here are the important ones:</p>
<pre><code> def _take_action(self, action_idx: ActionType) -&gt; None:
"""
An environment dependent function that sends an action to the simulator.
:param action_idx: the action to perform on the environment
:return: None
"""
def _update_state(self) -&gt; None:
"""
Updates the state from the environment.
Should update self.observation, self.reward, self.done, self.measurements and self.info
:return: None
"""
def _restart_environment_episode(self, force_environment_reset=False) -&gt; None:
"""
Restarts the simulator episode
:param force_environment_reset: Force the environment to reset even if the episode is not done yet.
:return: None
"""
def _render(self) -&gt; None:
"""
Renders the environment using the native simulator renderer
:return: None
"""
def get_rendered_image(self) -&gt; np.ndarray:
"""
Return a numpy array containing the image that will be rendered to the screen.
This can be different from the observation. For example, mujoco's observation is a measurements vector.
:return: numpy array containing the image that will be rendered to the screen
"""
</code></pre>
</li>
<li>
<p>Create a new parameters class for your environment, which inherits the EnvironmentParameters class.
In the <strong>init</strong> of your class, define all the parameters you used in your Environment class.
Additionally, fill the path property of the class with the path to your Environment class.
For example, take a look at the EnvironmentParameters class used for Doom:</p>
<pre><code> class DoomEnvironmentParameters(EnvironmentParameters):
def __init__(self):
super().__init__()
self.default_input_filter = DoomInputFilter
self.default_output_filter = DoomOutputFilter
self.cameras = [DoomEnvironment.CameraTypes.OBSERVATION]
@property
def path(self):
return 'rl_coach.environments.doom_environment:DoomEnvironment'
</code></pre>
</li>
<li>
<p>And that's it, you're done. Now just add a new preset with your newly created environment, and start training an agent on top of it.</p>
</li>
</ol>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../add_agent/" class="btn btn-neutral" title="Adding a New Agent"><span class="icon icon-circle-arrow-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="http://www.mkdocs.org">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" role="note" style="cursor: pointer">
<span class="rst-current-version" data-toggle="rst-current-version">
<span><a href="../add_agent/" style="color: #fcfcfc;">&laquo; Previous</a></span>
</span>
</div>
<script>var base_url = '../..';</script>
<script src="../../js/theme.js"></script>
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
<script src="../../search/require.js"></script>
<script src="../../search/search.js"></script>
</body>
</html>