<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="shortcut icon" href="../../../img/favicon.ico">
<title>Behavioral Cloning - Reinforcement Learning Coach</title>
<link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="../../../css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../../css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="../../../css/highlight.css">
<link href="../../../extra.css" rel="stylesheet">
<script>
// Current page data
var mkdocs_page_name = "Behavioral Cloning";
var mkdocs_page_input_path = "algorithms/imitation/bc.md";
var mkdocs_page_url = "/algorithms/imitation/bc/";
</script>
<script src="../../../js/jquery-2.1.1.min.js"></script>
<script src="../../../js/modernizr-2.8.3.min.js"></script>
<script type="text/javascript" src="../../../js/highlight.pack.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="../../.." class="icon icon-home"> Reinforcement Learning Coach</a>
<div role="search">
<form id ="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1">
<a class="" href="../../..">Home</a>
</li>
<li class="toctree-l1">
<a class="" href="../../../usage/">Usage</a>
</li>
<li class="toctree-l1">
<span class="caption-text">Design</span>
<ul class="subnav">
<li class="">
<a class="" href="../../../design/features/">Features</a>
</li>
<li class="">
<a class="" href="../../../design/control_flow/">Control Flow</a>
</li>
<li class="">
<a class="" href="../../../design/network/">Network</a>
</li>
<li class="">
<a class="" href="../../../design/filters/">Filters</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Algorithms</span>
<ul class="subnav">
<li class="">
<a class="" href="../../value_optimization/dqn/">DQN</a>
</li>
<li class="">
<a class="" href="../../value_optimization/double_dqn/">Double DQN</a>
</li>
<li class="">
<a class="" href="../../value_optimization/dueling_dqn/">Dueling DQN</a>
</li>
<li class="">
<a class="" href="../../value_optimization/categorical_dqn/">Categorical DQN</a>
</li>
<li class="">
<a class="" href="../../value_optimization/mmc/">Mixed Monte Carlo</a>
</li>
<li class="">
<a class="" href="../../value_optimization/pal/">Persistent Advantage Learning</a>
</li>
<li class="">
<a class="" href="../../value_optimization/nec/">Neural Episodic Control</a>
</li>
<li class="">
<a class="" href="../../value_optimization/bs_dqn/">Bootstrapped DQN</a>
</li>
<li class="">
<a class="" href="../../value_optimization/n_step/">N-Step Q Learning</a>
</li>
<li class="">
<a class="" href="../../value_optimization/naf/">Normalized Advantage Functions</a>
</li>
<li class="">
<a class="" href="../../policy_optimization/pg/">Policy Gradient</a>
</li>
<li class="">
<a class="" href="../../policy_optimization/ac/">Actor-Critic</a>
</li>
<li class="">
<a class="" href="../../policy_optimization/ddpg/">Deep Determinstic Policy Gradients</a>
</li>
<li class="">
<a class="" href="../../policy_optimization/ppo/">Proximal Policy Optimization</a>
</li>
<li class="">
<a class="" href="../../policy_optimization/cppo/">Clipped Proximal Policy Optimization</a>
</li>
<li class="">
<a class="" href="../../other/dfp/">Direct Future Prediction</a>
</li>
<li class=" current">
<a class="current" href="./">Behavioral Cloning</a>
<ul class="subnav">
<li class="toctree-l3"><a href="#behavioral-cloning">Behavioral Cloning</a></li>
<ul>
<li><a class="toctree-l4" href="#network-structure">Network Structure</a></li>
<li><a class="toctree-l4" href="#algorithm-description">Algorithm Description</a></li>
</ul>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1">
<a class="" href="../../../dashboard/">Coach Dashboard</a>
</li>
<li class="toctree-l1">
<span class="caption-text">Contributing</span>
<ul class="subnav">
<li class="">
<a class="" href="../../../contributing/add_agent/">Adding a New Agent</a>
</li>
<li class="">
<a class="" href="../../../contributing/add_env/">Adding a New Environment</a>
</li>
</ul>
</li>
</ul>
</div>
&nbsp;
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../..">Reinforcement Learning Coach</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../..">Docs</a> &raquo;</li>
<li>Algorithms &raquo;</li>
<li>Behavioral Cloning</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main">
<div class="section">
<h1 id="behavioral-cloning">Behavioral Cloning</h1>
<p><strong>Action space:</strong> Discrete | Continuous</p>
<h2 id="network-structure">Network Structure</h2>
<p style="text-align: center;">
<img src="..\..\design_imgs\dqn.png">
</p>
<h2 id="algorithm-description">Algorithm Description</h2>
<h3 id="training-the-network">Training the network</h3>
<p>The replay buffer contains the expert demonstrations for the task.
Each demonstration is given as a (state, action) tuple, with no reward signal.
The training goal is to minimize the difference between the actions predicted by the network and the actions taken by the expert in each state.</p>
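<p>As a rough illustration, the buffer can be thought of as a plain store of (state, action) pairs. The class below is a hypothetical Python sketch, not Coach's actual replay buffer implementation:</p>
<pre><code class="python">import random

class DemonstrationBuffer:
    """Hypothetical store of expert (state, action) pairs; no rewards are kept."""

    def __init__(self):
        self.transitions = []

    def store(self, state, action):
        self.transitions.append((state, action))

    def sample(self, batch_size):
        # Return a random batch of expert demonstrations.
        return random.sample(self.transitions, batch_size)
</code></pre>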
<ol>
<li>Sample a batch of transitions from the replay buffer.</li>
<li>Use the current states as the network input, and the expert actions as its targets.</li>
<li>The loss function for the network is MSE, which is minimized through the Q head (see the sketch below).</li>
</ol>
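<p>The following is a minimal sketch of one such training step, assuming a PyTorch policy network and a buffer like the one above; <code>bc_train_step</code> and <code>policy_net</code> are illustrative names, not part of Coach's API:</p>
<pre><code class="python">import torch
import torch.nn as nn

def bc_train_step(policy_net, optimizer, buffer, batch_size=32):
    # 1. Sample a batch of expert transitions from the replay buffer.
    batch = buffer.sample(batch_size)
    states = torch.stack([torch.as_tensor(s, dtype=torch.float32) for s, _ in batch])
    # Assumes continuous actions; discrete actions would need one-hot targets for MSE.
    expert_actions = torch.stack([torch.as_tensor(a, dtype=torch.float32) for _, a in batch])

    # 2. The current states are the network input; the expert actions are the targets.
    predicted_actions = policy_net(states)

    # 3. Minimize the MSE between the predicted and the expert actions.
    loss = nn.functional.mse_loss(predicted_actions, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
</code></pre>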
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../../../dashboard/" class="btn btn-neutral float-right" title="Coach Dashboard">Next <span class="icon icon-circle-arrow-right"></span></a>
<a href="../../other/dfp/" class="btn btn-neutral" title="Direct Future Prediction"><span class="icon icon-circle-arrow-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="http://www.mkdocs.org">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" role="note" style="cursor: pointer">
<span class="rst-current-version" data-toggle="rst-current-version">
<span><a href="../../other/dfp/" style="color: #fcfcfc;">&laquo; Previous</a></span>
<span style="margin-left: 15px"><a href="../../../dashboard/" style="color: #fcfcfc">Next &raquo;</a></span>
</span>
</div>
<script>var base_url = '../../..';</script>
<script src="../../../js/theme.js"></script>
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
<script src="../../../search/require.js"></script>
<script src="../../../search/search.js"></script>
</body>
</html>