mirror of https://github.com/gryf/coach.git synced 2025-12-18 03:30:19 +01:00

SAC algorithm (#282)

* SAC algorithm

* SAC - updates to agent (learn_from_batch), sac_head and sac_q_head to fix a problem in gradient calculation. Now the SAC agent is able to train.
gym_environment - fixing an error in access to gym.spaces

* Soft Actor Critic - code cleanup

* code cleanup

* V-head initialization fix

* SAC benchmarks

* SAC Documentation

* typo fix

* documentation fixes

* documentation and version update

* README typo
This commit is contained in:
guyk1971
2019-05-01 18:37:49 +03:00
committed by shadiendrawis
parent 33dc29ee99
commit 74db141d5e
92 changed files with 2812 additions and 402 deletions


@@ -276,19 +276,22 @@ of the trace tests suite.</li>
<h2>TaskParameters<a class="headerlink" href="#taskparameters" title="Permalink to this headline"></a></h2>
<dl class="class">
<dt id="rl_coach.base_parameters.TaskParameters">
<em class="property">class </em><code class="descclassname">rl_coach.base_parameters.</code><code class="descname">TaskParameters</code><span class="sig-paren">(</span><em>framework_type: rl_coach.base_parameters.Frameworks = &lt;Frameworks.tensorflow: 'TensorFlow'&gt;</em>, <em>evaluate_only: bool = False</em>, <em>use_cpu: bool = False</em>, <em>experiment_path='/tmp'</em>, <em>seed=None</em>, <em>checkpoint_save_secs=None</em>, <em>checkpoint_restore_dir=None</em>, <em>checkpoint_save_dir=None</em>, <em>export_onnx_graph: bool = False</em>, <em>apply_stop_condition: bool = False</em>, <em>num_gpu: int = 1</em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/rl_coach/base_parameters.html#TaskParameters"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.base_parameters.TaskParameters" title="Permalink to this definition"></a></dt>
<em class="property">class </em><code class="descclassname">rl_coach.base_parameters.</code><code class="descname">TaskParameters</code><span class="sig-paren">(</span><em>framework_type: rl_coach.base_parameters.Frameworks = &lt;Frameworks.tensorflow: 'TensorFlow'&gt;</em>, <em>evaluate_only: int = None</em>, <em>use_cpu: bool = False</em>, <em>experiment_path='/tmp'</em>, <em>seed=None</em>, <em>checkpoint_save_secs=None</em>, <em>checkpoint_restore_dir=None</em>, <em>checkpoint_restore_path=None</em>, <em>checkpoint_save_dir=None</em>, <em>export_onnx_graph: bool = False</em>, <em>apply_stop_condition: bool = False</em>, <em>num_gpu: int = 1</em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/rl_coach/base_parameters.html#TaskParameters"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.base_parameters.TaskParameters" title="Permalink to this definition"></a></dt>
<dd><table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>framework_type</strong> deep learning framework type. currently only tensorflow is supported</li>
<li><strong>evaluate_only</strong> the task will be used only for evaluating the model</li>
<li><strong>evaluate_only</strong> if not None, the task will be used only for evaluating the model for the given number of steps.
A value of 0 means that the task will be evaluated for an infinite number of steps.</li>
<li><strong>use_cpu</strong> use the cpu for this task</li>
<li><strong>experiment_path</strong> the path to the directory which will store all the experiment outputs</li>
<li><strong>seed</strong> a seed to use for the random numbers generator</li>
<li><strong>checkpoint_save_secs</strong> the number of seconds between each checkpoint saving</li>
<li><strong>checkpoint_restore_dir</strong> the directory to restore the checkpoints from</li>
<li><strong>checkpoint_restore_dir</strong> [DEPRECATED - will be removed in one of the next releases - switch to checkpoint_restore_path]
the dir to restore the checkpoints from</li>
<li><strong>checkpoint_restore_path</strong> the path to restore the checkpoints from</li>
<li><strong>checkpoint_save_dir</strong> the directory to store the checkpoints in</li>
<li><strong>export_onnx_graph</strong> If set to True, this will export an onnx graph each time a checkpoint is saved</li>
<li><strong>apply_stop_condition</strong> If set to True, this will apply the stop condition defined by reaching a target success rate</li>
@@ -305,14 +308,15 @@ of the trace tests suite.</li>
<h2>DistributedTaskParameters<a class="headerlink" href="#distributedtaskparameters" title="Permalink to this headline"></a></h2>
<dl class="class">
<dt id="rl_coach.base_parameters.DistributedTaskParameters">
<em class="property">class </em><code class="descclassname">rl_coach.base_parameters.</code><code class="descname">DistributedTaskParameters</code><span class="sig-paren">(</span><em>framework_type: rl_coach.base_parameters.Frameworks</em>, <em>parameters_server_hosts: str</em>, <em>worker_hosts: str</em>, <em>job_type: str</em>, <em>task_index: int</em>, <em>evaluate_only: bool = False</em>, <em>num_tasks: int = None</em>, <em>num_training_tasks: int = None</em>, <em>use_cpu: bool = False</em>, <em>experiment_path=None</em>, <em>dnd=None</em>, <em>shared_memory_scratchpad=None</em>, <em>seed=None</em>, <em>checkpoint_save_secs=None</em>, <em>checkpoint_restore_dir=None</em>, <em>checkpoint_save_dir=None</em>, <em>export_onnx_graph: bool = False</em>, <em>apply_stop_condition: bool = False</em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/rl_coach/base_parameters.html#DistributedTaskParameters"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.base_parameters.DistributedTaskParameters" title="Permalink to this definition"></a></dt>
<em class="property">class </em><code class="descclassname">rl_coach.base_parameters.</code><code class="descname">DistributedTaskParameters</code><span class="sig-paren">(</span><em>framework_type: rl_coach.base_parameters.Frameworks</em>, <em>parameters_server_hosts: str</em>, <em>worker_hosts: str</em>, <em>job_type: str</em>, <em>task_index: int</em>, <em>evaluate_only: int = None</em>, <em>num_tasks: int = None</em>, <em>num_training_tasks: int = None</em>, <em>use_cpu: bool = False</em>, <em>experiment_path=None</em>, <em>dnd=None</em>, <em>shared_memory_scratchpad=None</em>, <em>seed=None</em>, <em>checkpoint_save_secs=None</em>, <em>checkpoint_restore_path=None</em>, <em>checkpoint_save_dir=None</em>, <em>export_onnx_graph: bool = False</em>, <em>apply_stop_condition: bool = False</em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/rl_coach/base_parameters.html#DistributedTaskParameters"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.base_parameters.DistributedTaskParameters" title="Permalink to this definition"></a></dt>
<dd><table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>framework_type</strong> deep learning framework type. currently only tensorflow is supported</li>
<li><strong>evaluate_only</strong> the task will be used only for evaluating the model</li>
<li><strong>evaluate_only</strong> if not None, the task will be used only for evaluating the model for the given number of steps.
A value of 0 means that the task will be evaluated for an infinite number of steps.</li>
<li><strong>parameters_server_hosts</strong> comma-separated list of hostname:port pairs to which the parameter servers are
assigned</li>
<li><strong>worker_hosts</strong> comma-separated list of hostname:port pairs to which the workers are assigned</li>
@@ -325,7 +329,7 @@ assigned</li>
<li><strong>dnd</strong> an external DND to use for NEC. This is a workaround needed for a shared DND not using the scratchpad.</li>
<li><strong>seed</strong> a seed to use for the random numbers generator</li>
<li><strong>checkpoint_save_secs</strong> the number of seconds between each checkpoint saving</li>
<li><strong>checkpoint_restore_dir</strong> the directory to restore the checkpoints from</li>
<li><strong>checkpoint_restore_path</strong> the path to restore the checkpoints from</li>
<li><strong>checkpoint_save_dir</strong> the directory to store the checkpoints in</li>
<li><strong>export_onnx_graph</strong> If set to True, this will export an onnx graph each time a checkpoint is saved</li>
<li><strong>apply_stop_condition</strong> If set to True, this will apply the stop condition defined by reaching a target success rate</li>