<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="shortcut icon" href="../../img/favicon.ico">
<title>Adding a New Agent - Reinforcement Learning Coach</title>
<link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="../../css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="../../css/highlight.css">
<link href="../../extra.css" rel="stylesheet">
<script>
// Current page data
var mkdocs_page_name = "Adding a New Agent";
var mkdocs_page_input_path = "contributing/add_agent.md";
var mkdocs_page_url = "/contributing/add_agent/";
</script>
<script src="../../js/jquery-2.1.1.min.js"></script>
<script src="../../js/modernizr-2.8.3.min.js"></script>
<script type="text/javascript" src="../../js/highlight.pack.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="../.." class="icon icon-home"> Reinforcement Learning Coach</a>
<div role="search">
<form id ="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1">
<a class="" href="../..">Home</a>
</li>
<li class="toctree-l1">
<a class="" href="../../usage/">Usage</a>
</li>
<li class="toctree-l1">
<span class="caption-text">Design</span>
<ul class="subnav">
<li class="">
<a class="" href="../../design/features/">Features</a>
</li>
<li class="">
<a class="" href="../../design/control_flow/">Control Flow</a>
</li>
<li class="">
<a class="" href="../../design/network/">Network</a>
</li>
<li class="">
<a class="" href="../../design/filters/">Filters</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Algorithms</span>
<ul class="subnav">
<li class="">
<a class="" href="../../algorithms/value_optimization/dqn/">DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/double_dqn/">Double DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/dueling_dqn/">Dueling DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/categorical_dqn/">Categorical DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/mmc/">Mixed Monte Carlo</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/pal/">Persistent Advantage Learning</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/nec/">Neural Episodic Control</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/bs_dqn/">Bootstrapped DQN</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/n_step/">N-Step Q Learning</a>
</li>
<li class="">
<a class="" href="../../algorithms/value_optimization/naf/">Normalized Advantage Functions</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/pg/">Policy Gradient</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/ac/">Actor-Critic</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/ddpg/">Deep Determinstic Policy Gradients</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/ppo/">Proximal Policy Optimization</a>
</li>
<li class="">
<a class="" href="../../algorithms/policy_optimization/cppo/">Clipped Proximal Policy Optimization</a>
</li>
<li class="">
<a class="" href="../../algorithms/other/dfp/">Direct Future Prediction</a>
</li>
<li class="">
<a class="" href="../../algorithms/imitation/bc/">Behavioral Cloning</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<a class="" href="../../dashboard/">Coach Dashboard</a>
</li>
<li class="toctree-l1">
<span class="caption-text">Contributing</span>
<ul class="subnav">
<li class=" current">
<a class="current" href="./">Adding a New Agent</a>
<ul class="subnav">
</ul>
</li>
<li class="">
<a class="" href="../add_env/">Adding a New Environment</a>
</li>
</ul>
</li>
</ul>
</div>
&nbsp;
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../..">Reinforcement Learning Coach</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../..">Docs</a> &raquo;</li>
<li>Contributing &raquo;</li>
<li>Adding a New Agent</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main">
<div class="section">
<!-- language-all: python -->
<p>Coach's modularity makes adding an agent a simple and clean task that involves the following steps:</p>
<ol>
<li>
<p>Implement your algorithm in a new file. The agent can inherit base classes such as <strong>ValueOptimizationAgent</strong> or
<strong>ActorCriticAgent</strong>, or the more generic <strong>Agent</strong> base class.</p>
<ul>
<li><strong>ValueOptimizationAgent</strong>, <strong>PolicyOptimizationAgent</strong> and <strong>Agent</strong> are abstract classes.
learn_from_batch() should be overridden with the desired behavior for the algorithm being implemented.
If inheriting directly from <strong>Agent</strong>, choose_action() should be overridden as well (a minimal sketch follows the signatures below).<pre><code>def learn_from_batch(self, batch) -&gt; Tuple[float, List, List]:
"""
Given a batch of transitions, calculates their target values and updates the network.
:param batch: A list of transitions
:return: The total loss of the training, the loss per head and the unclipped gradients
"""
def choose_action(self, curr_state):
"""
choose an action to act with in the current episode being played. Different behavior might be exhibited when training
or testing.
:param curr_state: the current state to act upon.
:return: chosen action, some action value describing the action (q-value, probability, etc)
"""
</code></pre>
</li>
</ul>
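<p>As a rough illustration only (the helper names and network attributes below are assumptions that may differ across
Coach versions; use the existing value-based agents as the authoritative reference), an override of learn_from_batch()
could be sketched as follows:</p>
<pre><code>class MyValueAgent(ValueOptimizationAgent):
    def learn_from_batch(self, batch):
        # Hypothetical sketch: split the batch of transitions into its parts.
        # extract_batch() is an assumed helper - reuse whatever the existing
        # value-based agents use for this.
        current_states, actions, rewards, next_states, game_overs = self.extract_batch(batch)

        # Bootstrap Q-value targets from the target network, then overwrite
        # only the entries of the actions that were actually taken.
        q_targets = self.networks['main'].online_network.predict(current_states)
        next_q = self.networks['main'].target_network.predict(next_states)
        for i in range(len(batch)):
            bootstrap = (1.0 - game_overs[i]) * self.ap.algorithm.discount * next_q[i].max()
            q_targets[i][actions[i]] = rewards[i] + bootstrap

        # Train the online network towards the new targets and return the loss
        # information expected by the base class.
        total_loss, losses, unclipped_grads = self.networks['main'].train_and_sync_networks(
            current_states, q_targets)[:3]
        return total_loss, losses, unclipped_grads
</code></pre>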
</li>
<li>
<p>Implement your agent's specific network head, if needed, in the implementation for the framework of your choice.
For example, <strong>architectures/neon_components/heads.py</strong>. The head will inherit the generic base class Head.
A new output type should be added to configurations.py, and a mapping between the new head and the output type should
be defined in the get_output_head() function in <strong>architectures/neon_components/general_network.py</strong>.</p>
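<p>A minimal sketch of such a head is shown below (the constructor and method names mirror the pattern of the existing
heads and should be treated as assumptions; copy the signature of an existing head, e.g. the Q-value head, from the
same file and adjust only the output layers):</p>
<pre><code>class MyAlgorithmHead(Head):
    def __init__(self, *args, **kwargs):
        # Hypothetical: forward whatever arguments the other heads in this
        # file accept, then give the head a name for logging purposes.
        super().__init__(*args, **kwargs)
        self.name = 'my_algorithm_head'

    def _build_module(self, input_layer):
        # Hypothetical: map the shared embedding to one output per action.
        # Replace Dense with the layer type used by the framework backend
        # (e.g. neon or tensorflow) that this heads.py belongs to.
        self.output = Dense(self.num_actions)(input_layer)
</code></pre>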
</li>
<li>
<p>Define a new parameters class that inherits AgentParameters.
The parameters class defines all the hyperparameters for the agent, and is initialized with 4 main components:</p>
<ul>
<li><strong>algorithm</strong>: A class inheriting AlgorithmParameters which defines any algorithm-specific parameters.</li>
<li><strong>exploration</strong>: A class inheriting ExplorationParameters which defines the exploration policy parameters.
Several common exploration policies are built in and defined under the exploration subdirectory; you can also
define your own custom exploration policy.</li>
<li><strong>memory</strong>: A class inheriting MemoryParameters which defines the memory parameters.
Several common memory types are built in and defined under the memories subdirectory; you can also define
your own custom memory.</li>
<li><strong>networks</strong>: A dictionary defining all the networks that will be used by the agent. The keys of the dictionary
define the network names and are used to access each network through the agent class.
Each dictionary value is a class inheriting NetworkParameters, which defines the network structure
and parameters.</li>
</ul>
<p>Additionally, set the path property to return the path to your agent class in the following format:</p>
<pre><code> &lt;path to python module&gt;:&lt;name of agent class&gt;
</code></pre>
<p>For example,</p>
<pre><code>class RainbowAgentParameters(AgentParameters):
    def __init__(self):
        super().__init__(algorithm=RainbowAlgorithmParameters(),
                         exploration=RainbowExplorationParameters(),
                         memory=RainbowMemoryParameters(),
                         networks={"main": RainbowNetworkParameters()})

    @property
    def path(self):
        return 'rainbow.rainbow_agent:RainbowAgent'
</code></pre>
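<p>The component classes referenced above are defined alongside the agent parameters. As an illustrative sketch only
(the attribute names below are assumptions; check AlgorithmParameters for the fields that actually exist in your
version), an algorithm parameters class might look like:</p>
<pre><code>class RainbowAlgorithmParameters(AlgorithmParameters):
    def __init__(self):
        super().__init__()
        # Hypothetical algorithm-specific hyperparameters for this sketch
        self.discount = 0.99
        self.num_steps_between_copying_online_weights_to_target = EnvironmentSteps(10000)
</code></pre>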
</li>
<li>
<p>(Optional) Define a preset using the new agent type with a given environment, and the hyperparameters that should
be used for training on that environment.</p>
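<p>A sketch of such a preset is shown below. The module layout, class names, and parameter classes used here are
assumptions for a hypothetical Rainbow agent; mirror an existing file in the presets directory for the exact structure
expected by your Coach version.</p>
<pre><code># Hypothetical preset: train the new agent on CartPole.
# Every class used below is a placeholder - copy an existing preset and
# swap in your new AgentParameters subclass.
agent_params = RainbowAgentParameters()
agent_params.network_wrappers['main'].learning_rate = 0.00025

env_params = GymEnvironmentParameters()
env_params.level = 'CartPole-v0'

graph_manager = BasicRLGraphManager(agent_params=agent_params,
                                    env_params=env_params,
                                    schedule_params=SimpleSchedule())
</code></pre>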
</li>
</ol>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../add_env/" class="btn btn-neutral float-right" title="Adding a New Environment">Next <span class="icon icon-circle-arrow-right"></span></a>
<a href="../../dashboard/" class="btn btn-neutral" title="Coach Dashboard"><span class="icon icon-circle-arrow-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="http://www.mkdocs.org">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" role="note" style="cursor: pointer">
<span class="rst-current-version" data-toggle="rst-current-version">
<span><a href="../../dashboard/" style="color: #fcfcfc;">&laquo; Previous</a></span>
<span style="margin-left: 15px"><a href="../add_env/" style="color: #fcfcfc">Next &raquo;</a></span>
</span>
</div>
<script>var base_url = '../..';</script>
<script src="../../js/theme.js"></script>
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
<script src="../../search/require.js"></script>
<script src="../../search/search.js"></script>
</body>
</html>