Enabling Coach Documentation to be run even when environments are not installed (#326)

2026-07-07 09:56:32 +02:00 · 2019-05-27 10:46:07 +03:00
parent 2b7d536da4
commit 342b7184bc
157 changed files with 5167 additions and 7477 deletions
@@ -8,7 +8,7 @@
  
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  
-  <title>Control Flow &mdash; Reinforcement Learning Coach 0.11.0 documentation</title>
+  <title>Control Flow &mdash; Reinforcement Learning Coach 0.12.1 documentation</title>
  

  
@@ -17,13 +17,21 @@
  

  
+  <script type="text/javascript" src="../_static/js/modernizr.min.js"></script>
+  
+    
+      <script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
+        <script type="text/javascript" src="../_static/jquery.js"></script>
+        <script type="text/javascript" src="../_static/underscore.js"></script>
+        <script type="text/javascript" src="../_static/doctools.js"></script>
+        <script type="text/javascript" src="../_static/language_data.js"></script>
+        <script async="async" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>
+    
+    <script type="text/javascript" src="../_static/js/theme.js"></script>

-  
-  
    

  
-
  <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
  <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
  <link rel="stylesheet" href="../_static/css/custom.css" type="text/css" />
@@ -33,21 +41,16 @@
    <link rel="prev" title="Coach Dashboard" href="../dashboard.html" />
    <link href="../_static/css/custom.css" rel="stylesheet" type="text/css">

-
-  
-  <script src="../_static/js/modernizr.min.js"></script>
-
 </head>

 <body class="wy-body-for-nav">

   
  <div class="wy-grid-for-nav">
-
    
    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
      <div class="wy-side-scroll">
-        <div class="wy-side-nav-search">
+        <div class="wy-side-nav-search" >
          

          
@@ -210,17 +213,17 @@ The graph manager’s main loop is the improve loop.</p>
 <a class="reference internal image-reference" href="../_images/improve.png"><img alt="../_images/improve.png" class="align-center" src="../_images/improve.png" style="width: 400px;" /></a>
 <p>The improve loop cycles through 3 main phases - heatup, training and evaluation:</p>
 <ul class="simple">
-<li><strong>Heatup</strong> - the goal of this phase is to collect initial data for populating the replay buffers. The heatup phase
+<li><p><strong>Heatup</strong> - the goal of this phase is to collect initial data for populating the replay buffers. The heatup phase
 takes place only at the beginning of the experiment, and the agents act completely randomly during this phase.
 Importantly, the agents do not train their networks during this phase. DQN, for example, uses 50k random steps in order
-to initialize the replay buffers.</li>
-<li><strong>Training</strong> - the training phase is the main phase of the experiment. This phase can change between agent types,
+to initialize the replay buffers.</p></li>
+<li><p><strong>Training</strong> - the training phase is the main phase of the experiment. This phase can change between agent types,
 but essentially consists of repeated cycles of acting, collecting data from the environment, and training the agent
 networks. During this phase, the agent will use its exploration policy in training mode, which will add noise to its
-actions in order to improve its knowledge about the environment state space.</li>
-<li><strong>Evaluation</strong> - the evaluation phase is intended for evaluating the current performance of the agent. The agents
+actions in order to improve its knowledge about the environment state space.</p></li>
+<li><p><strong>Evaluation</strong> - the evaluation phase is intended for evaluating the current performance of the agent. The agents
 will act greedily in order to exploit the knowledge aggregated so far, and the performance over multiple episodes of
-evaluation will be averaged in order to reduce the stochasticity effects of all the components.</li>
+evaluation will be averaged in order to reduce the stochasticity effects of all the components.</p></li>
 </ul>
 </div>
 <div class="section" id="level-manager">
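The three phases described in this hunk can be sketched as a small scheduling function. This is an illustrative sketch only, not Coach's graph manager: the names `Phase`, `improve`, `heatup_steps`, `training_cycles` and `evaluate_every` are hypothetical, and the assumption that evaluation runs after a fixed number of training cycles is ours, not something the page states.

```python
from enum import Enum


class Phase(Enum):
    HEATUP = "heatup"
    TRAIN = "train"
    EVALUATE = "evaluate"


def improve(heatup_steps, training_cycles, evaluate_every):
    """Return the ordered list of phases a toy improve loop would visit.

    All parameter names are hypothetical and chosen for illustration only.
    """
    schedule = []
    # Heatup happens once, at the very start, to fill the replay buffers.
    if heatup_steps > 0:
        schedule.append(Phase.HEATUP)
    for cycle in range(1, training_cycles + 1):
        # Each training cycle stands for acting, collecting data and training.
        schedule.append(Phase.TRAIN)
        # Assumption: evaluation is interleaved every `evaluate_every` cycles.
        if cycle % evaluate_every == 0:
            schedule.append(Phase.EVALUATE)
    return schedule
```

Calling `improve(50000, 4, 2)` yields one heatup phase followed by training cycles with an evaluation after every second cycle.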
@@ -240,29 +243,29 @@ a lower hierarchy level.</p>
 <h2>Agent<a class="headerlink" href="#agent" title="Permalink to this headline">¶</a></h2>
 <p>The base agent class has 3 main functions that are used during those phases - observe, act and train.</p>
 <ul class="simple">
-<li><strong>Observe</strong> - this function gets the latest response from the environment as input, and updates the internal state
+<li><p><strong>Observe</strong> - this function gets the latest response from the environment as input, and updates the internal state
 of the agent with the new information. The environment response will
 first be passed through the agent’s <code class="code docutils literal notranslate"><span class="pre">InputFilter</span></code> object, which will process the values in the response according
 to the specific agent definition. The environment response will then be converted into a
 <code class="code docutils literal notranslate"><span class="pre">Transition</span></code> which will contain the information from a single step
-<span class="math notranslate nohighlight">\((s_{t}, a_{t}, r_{t}, s_{t+1}, \textrm{terminal signal})\)</span>, and stored in the memory.</li>
+<span class="math notranslate nohighlight">\((s_{t}, a_{t}, r_{t}, s_{t+1}, \textrm{terminal signal})\)</span>, and stored in the memory.</p></li>
 </ul>
 <a class="reference internal image-reference" href="../_images/observe.png"><img alt="../_images/observe.png" class="align-center" src="../_images/observe.png" style="width: 700px;" /></a>
 <ul class="simple">
-<li><strong>Act</strong> - this function uses the current internal state of the agent in order to select the next action to take on
+<li><p><strong>Act</strong> - this function uses the current internal state of the agent in order to select the next action to take on
 the environment. This function will call the per-agent custom function <code class="code docutils literal notranslate"><span class="pre">choose_action</span></code> that will use the network
 and the exploration policy in order to select an action. The action will be stored, together with any additional
 information (like the action value, for example) in an <code class="code docutils literal notranslate"><span class="pre">ActionInfo</span></code> object. The ActionInfo object will then be
 passed through the agent’s <code class="code docutils literal notranslate"><span class="pre">OutputFilter</span></code> to allow any processing of the action (like discretization,
-or shifting, for example), before passing it to the environment.</li>
+or shifting, for example), before passing it to the environment.</p></li>
 </ul>
 <a class="reference internal image-reference" href="../_images/act.png"><img alt="../_images/act.png" class="align-center" src="../_images/act.png" style="width: 700px;" /></a>
 <ul class="simple">
-<li><strong>Train</strong> - this function will sample a batch from the memory and train on it. The batch of transitions will be
+<li><p><strong>Train</strong> - this function will sample a batch from the memory and train on it. The batch of transitions will be
 first wrapped into a <code class="code docutils literal notranslate"><span class="pre">Batch</span></code> object to allow efficient querying of the batch values. It will then be passed into
 the agent-specific <code class="code docutils literal notranslate"><span class="pre">learn_from_batch</span></code> function, which will extract network target values from the batch and will
 train the networks accordingly. Lastly, if there’s a target network defined for the agent, it will sync the target
-network weights with the online network.</li>
+network weights with the online network.</p></li>
 </ul>
 <a class="reference internal image-reference" href="../_images/train.png"><img alt="../_images/train.png" class="align-center" src="../_images/train.png" style="width: 700px;" /></a>
 </div>
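The observe / act / train split described in this hunk can be illustrated with a toy agent. This is a hedged sketch rather than Coach's base agent class: `ToyAgent`, its constructor arguments, the identity filters and the plain-list memory are all hypothetical stand-ins, and `learn_from_batch` only averages rewards instead of training a network.

```python
import random
from collections import namedtuple

# One step of experience: (s_t, a_t, r_t, s_{t+1}, terminal signal).
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "terminal"]
)


class ToyAgent:
    """Hypothetical stand-in for an agent; not Coach's actual API."""

    def __init__(self, num_actions, batch_size=2, seed=0):
        self.num_actions = num_actions
        self.batch_size = batch_size
        self.memory = []          # stand-in for a replay buffer
        self.state = None
        self.last_action = None
        self.rng = random.Random(seed)

    def input_filter(self, response):
        return response           # identity filter, for illustration

    def output_filter(self, action):
        return action             # identity filter, for illustration

    def reset(self, initial_state):
        self.state = initial_state

    def choose_action(self, state):
        # A real agent would query its network and exploration policy here.
        return self.rng.randrange(self.num_actions)

    def act(self):
        self.last_action = self.choose_action(self.state)
        return self.output_filter(self.last_action)

    def observe(self, next_state, reward, terminal):
        next_state, reward, terminal = self.input_filter(
            (next_state, reward, terminal)
        )
        self.memory.append(
            Transition(self.state, self.last_action, reward, next_state, terminal)
        )
        self.state = next_state

    def learn_from_batch(self, batch):
        # Placeholder for a network update: report the mean batch reward.
        return sum(t.reward for t in batch) / len(batch)

    def train(self):
        if len(self.memory) < self.batch_size:
            return None           # not enough data collected yet
        batch = self.rng.sample(self.memory, self.batch_size)
        return self.learn_from_batch(batch)
```

A typical step in this sketch is `agent.act()`, followed by `agent.observe(next_state, reward, terminal)` with the environment's response, followed by `agent.train()` once enough transitions are stored.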
@@ -279,7 +282,7 @@ network weights with the online network.</li>
        <a href="network.html" class="btn btn-neutral float-right" title="Network Design" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
      
      
-        <a href="../dashboard.html" class="btn btn-neutral" title="Coach Dashboard" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
+        <a href="../dashboard.html" class="btn btn-neutral float-left" title="Coach Dashboard" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
      
    </div>
  
@@ -288,7 +291,7 @@ network weights with the online network.</li>

  <div role="contentinfo">
    <p>
-        &copy; Copyright 2018, Intel AI Lab
+        &copy; Copyright 2018-2019, Intel AI Lab

    </p>
  </div>
@@ -305,27 +308,16 @@ network weights with the online network.</li>
  


-  
-
-    
-    
-      <script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
-        <script type="text/javascript" src="../_static/jquery.js"></script>
-        <script type="text/javascript" src="../_static/underscore.js"></script>
-        <script type="text/javascript" src="../_static/doctools.js"></script>
-        <script type="text/javascript" src="../_static/language_data.js"></script>
-        <script async="async" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>
-    
-
-  
-
-  <script type="text/javascript" src="../_static/js/theme.js"></script>
-
  <script type="text/javascript">
      jQuery(function () {
          SphinxRtdTheme.Navigation.enable(true);
      });
-  </script> 
+  </script>
+
+  
+  
+    
+   

 </body>
 </html>