<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Design - Reinforcement Learning Coach Documentation</title>
<link rel="shortcut icon" href="../img/favicon.ico">
<link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="../css/theme.css" type="text/css" />
<link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="../css/highlight.css">
<link href="../extra.css" rel="stylesheet">
<script>
// Current page data
var mkdocs_page_name = "Design";
</script>
<script src="../js/jquery-2.1.1.min.js"></script>
<script src="../js/modernizr-2.8.3.min.js"></script>
<script type="text/javascript" src="../js/highlight.pack.js"></script>
<script src="../js/theme.js"></script>
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="../index.html" class="icon icon-home"> Reinforcement Learning Coach Documentation</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1 ">
<a class="" href="../index.html">Home</a>
</li>
<li class="toctree-l1 current">
<a class="current" href="./index.html">Design</a>
<ul>
<li class="toctree-l3"><a href="#coach-design">Coach Design</a></li>
<li><a class="toctree-l4" href="#network-design">Network Design</a></li>
<li><a class="toctree-l4" href="#keeping-network-copies-in-sync">Keeping Network Copies in Sync</a></li>
<li><a class="toctree-l4" href="#supported-algorithms">Supported Algorithms</a></li>
</ul>
</li>
<li class="toctree-l1 ">
<a class="" href="../usage/index.html">Usage</a>
</li>
<li>
<ul class="subnav">
<li><span>Algorithms</span></li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/value_optimization/dqn/index.html">DQN</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/value_optimization/double_dqn/index.html">Double DQN</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/value_optimization/dueling_dqn/index.html">Dueling DQN</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/value_optimization/categorical_dqn/index.html">Categorical DQN</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/value_optimization/mmc/index.html">Mixed Monte Carlo</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/value_optimization/pal/index.html">Persistent Advantage Learning</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/value_optimization/nec/index.html">Neural Episodic Control</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/value_optimization/bs_dqn/index.html">Bootstrapped DQN</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/value_optimization/n_step/index.html">N-Step Q Learning</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/value_optimization/naf/index.html">Normalized Advantage Functions</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/policy_optimization/pg/index.html">Policy Gradient</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/policy_optimization/ac/index.html">Actor-Critic</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/policy_optimization/ddpg/index.html">Deep Determinstic Policy Gradients</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/policy_optimization/ppo/index.html">Proximal Policy Optimization</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/policy_optimization/cppo/index.html">Clipped Proximal Policy Optimization</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/other/dfp/index.html">Direct Future Prediction</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../algorithms/imitation/bc/index.html">Behavioral Cloning</a>
</li>
</ul>
</li>
<li class="toctree-l1 ">
<a class="" href="../dashboard/index.html">Coach Dashboard</a>
</li>
<li>
<ul class="subnav">
<li><span>Contributing</span></li>
<li class="toctree-l1 ">
<a class="" href="../contributing/add_agent/index.html">Adding a New Agent</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../contributing/add_env/index.html">Adding a New Environment</a>
</li>
</ul>
</li>
</ul>
</div>
&nbsp;
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">Reinforcement Learning Coach Documentation</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html">Docs</a> &raquo;</li>
<li>Design</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main">
<div class="section">
<h1 id="coach-design">Coach Design</h1>
<h2 id="network-design">Network Design</h2>
<p>Each agent has at least one neural network, which serves as the function approximator used for choosing actions. The network is designed in a modular way so that it can be reused across different agents, and it is separated into three main parts (a code sketch of this composition appears after the figure below):</p>
<ul>
<li>
<p><strong>Input Embedders</strong> - This is the first stage of the network, which converts the input into a feature-vector representation. Several instances of any of the supported embedders can be combined, allowing varied combinations of inputs.</p>
<p>There are two main types of input embedders: </p>
<ol>
<li>Image embedder - a convolutional neural network.</li>
<li>Vector embedder - a multi-layer perceptron.</li>
</ol>
</li>
<li>
<p><strong>Middlewares</strong> - The middleware takes the output of the input embedders and maps it into a different representation domain before sending it through the output head. Its goal is to merge the outputs of several input embedders and apply some extra processing to the combined result, for instance an LSTM or a plain fully connected (FC) layer.</p>
</li>
<li>
<p><strong>Output Heads</strong> - The output head predicts the values required from the network, such as action values, state values or a policy. As with the input embedders, it is possible to use several output heads in the same network. For example, the <em>Actor Critic</em> agent combines two heads - a policy head and a state-value head.
In addition, each output head defines the loss function according to its head type.</p>
</li>
</ul>
<p style="text-align: center;">
<img src="../img/network.png" alt="Network Design" style="width: 400px;"/>
</p>
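<p>To make this modular structure concrete, here is a minimal, hypothetical sketch of the embedder, middleware and head composition in plain Python/NumPy. The class names (<code>VectorEmbedder</code>, <code>FCMiddleware</code>, <code>Head</code>) are illustrative assumptions and do not correspond to Coach's actual API:</p>
<pre><code class="python"># A hypothetical sketch of the embedder / middleware / heads composition
# described above. Names are illustrative, not Coach's real API.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class VectorEmbedder:
    """Vector embedder: a one-layer perceptron mapping raw input to features."""
    def __init__(self, in_dim, out_dim):
        self.w = np.random.randn(in_dim, out_dim) * 0.01

    def __call__(self, x):
        return relu(x @ self.w)

class FCMiddleware:
    """Middleware: merges the embedder outputs and applies an extra FC layer."""
    def __init__(self, in_dim, out_dim):
        self.w = np.random.randn(in_dim, out_dim) * 0.01

    def __call__(self, embeddings):
        merged = np.concatenate(embeddings, axis=-1)  # combine several embedders
        return relu(merged @ self.w)

class Head:
    """Output head: predicts, for example, a policy or a state value."""
    def __init__(self, in_dim, out_dim):
        self.w = np.random.randn(in_dim, out_dim) * 0.01

    def __call__(self, features):
        return features @ self.w

# An actor-critic style network: two embedders, one middleware, two heads.
obs_embedder = VectorEmbedder(in_dim=8, out_dim=32)
goal_embedder = VectorEmbedder(in_dim=4, out_dim=32)
middleware = FCMiddleware(in_dim=64, out_dim=64)
policy_head = Head(in_dim=64, out_dim=3)  # logits over 3 discrete actions
value_head = Head(in_dim=64, out_dim=1)   # scalar state value

obs, goal = np.ones(8), np.ones(4)
features = middleware([obs_embedder(obs), goal_embedder(goal)])
policy_logits = policy_head(features)
state_value = value_head(features)
</code></pre>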
<h2 id="keeping-network-copies-in-sync">Keeping Network Copies in Sync</h2>
<p>Most reinforcement learning agents include more than one copy of the neural network. These copies serve as counterparts of the main network that are updated at different rates, and they are often synchronized either locally or between parallel workers. To make this synchronization easier, a wrapper around the copies exposes a simplified API that hides these complexities from the agent. A minimal sketch of such a wrapper appears after the figure below.</p>
<p style="text-align: center;">
<img src="../img/distributed.png" alt="Distributed Training" style="width: 600px;"/>
</p>
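<p>As an illustration only, and not Coach's actual implementation, the hypothetical sketch below shows the kind of wrapper this refers to: it holds an online copy and a target copy of the same parameters and exposes a single synchronization call, so the agent never manipulates the copies directly:</p>
<pre><code class="python"># A hypothetical sketch of a wrapper keeping two copies of the same
# parameters in sync. Names are illustrative, not Coach's real API.
import copy
import numpy as np

class NetworkWrapper:
    def __init__(self, params):
        self.online = params                 # updated on every training step
        self.target = copy.deepcopy(params)  # updated at a slower rate

    def update_target(self, tau=1.0):
        """Sync the target copy towards the online copy.

        tau=1.0 is a hard copy (as in DQN); a smaller tau gives a soft,
        exponentially averaged update (as in DDPG).
        """
        for name, w in self.online.items():
            self.target[name] = tau * w + (1.0 - tau) * self.target[name]

net = NetworkWrapper({"w1": np.zeros((4, 2)), "b1": np.zeros(2)})
net.online["w1"] += 0.5     # simulate a training step on the online copy
net.update_target(tau=0.1)  # soft sync; the agent never touches the copies
</code></pre>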
<h2 id="supported-algorithms">Supported Algorithms</h2>
<p>Coach supports many state-of-the-art reinforcement learning algorithms, which are separated into two main classes - value optimization and policy optimization. A detailed description of these algorithms can be found in the algorithms section.</p>
<p style="text-align: center;">
<img src="../img/algorithms.png" alt="Supported Algorithms" style="width: 600px;"/>
</p>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../usage/index.html" class="btn btn-neutral float-right" title="Usage"/>Next <span class="icon icon-circle-arrow-right"></span></a>
<a href="../index.html" class="btn btn-neutral" title="Home"><span class="icon icon-circle-arrow-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="http://www.mkdocs.org">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" role="note" style="cursor: pointer">
<span class="rst-current-version" data-toggle="rst-current-version">
<span><a href="../index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
<span style="margin-left: 15px"><a href="../usage/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
</span>
</div>
</body>
</html>