SAC algorithm (#282)

* SAC algorithm * SAC - updates to agent (learn_from_batch), sac_head and sac_q_head to fix problem in gradient calculation. Now SAC agents is able to train. gym_environment - fixing an error in access to gym.spaces * Soft Actor Critic - code cleanup * code cleanup * V-head initialization fix * SAC benchmarks * SAC Documentation * typo fix * documentation fixes * documentation and version update * README typo
2026-03-19 08:23:33 +01:00 · 2019-05-01 18:37:49 +03:00
parent 33dc29ee99
commit 74db141d5e
92 changed files with 2812 additions and 402 deletions
--- a/docs/components/additional_parameters.html
+++ b/docs/components/additional_parameters.html
@@ -276,19 +276,22 @@ of the trace tests suite.</li>
 <h2>TaskParameters<a class="headerlink" href="#taskparameters" title="Permalink to this headline">¶</a></h2>
 <dl class="class">
 <dt id="rl_coach.base_parameters.TaskParameters">
-<em class="property">class </em><code class="descclassname">rl_coach.base_parameters.</code><code class="descname">TaskParameters</code><span class="sig-paren">(</span><em>framework_type: rl_coach.base_parameters.Frameworks = &lt;Frameworks.tensorflow: 'TensorFlow'&gt;</em>, <em>evaluate_only: bool = False</em>, <em>use_cpu: bool = False</em>, <em>experiment_path='/tmp'</em>, <em>seed=None</em>, <em>checkpoint_save_secs=None</em>, <em>checkpoint_restore_dir=None</em>, <em>checkpoint_save_dir=None</em>, <em>export_onnx_graph: bool = False</em>, <em>apply_stop_condition: bool = False</em>, <em>num_gpu: int = 1</em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/rl_coach/base_parameters.html#TaskParameters"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.base_parameters.TaskParameters" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">rl_coach.base_parameters.</code><code class="descname">TaskParameters</code><span class="sig-paren">(</span><em>framework_type: rl_coach.base_parameters.Frameworks = &lt;Frameworks.tensorflow: 'TensorFlow'&gt;</em>, <em>evaluate_only: int = None</em>, <em>use_cpu: bool = False</em>, <em>experiment_path='/tmp'</em>, <em>seed=None</em>, <em>checkpoint_save_secs=None</em>, <em>checkpoint_restore_dir=None</em>, <em>checkpoint_restore_path=None</em>, <em>checkpoint_save_dir=None</em>, <em>export_onnx_graph: bool = False</em>, <em>apply_stop_condition: bool = False</em>, <em>num_gpu: int = 1</em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/rl_coach/base_parameters.html#TaskParameters"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.base_parameters.TaskParameters" title="Permalink to this definition">¶</a></dt>
 <dd><table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
 <li><strong>framework_type</strong> – deep learning framework type. currently only tensorflow is supported</li>
-<li><strong>evaluate_only</strong> – the task will be used only for evaluating the model</li>
+<li><strong>evaluate_only</strong> – if not None, the task will be used only for evaluating the model for the given number of steps.
+A value of 0 means that task will be evaluated for an infinite number of steps.</li>
 <li><strong>use_cpu</strong> – use the cpu for this task</li>
 <li><strong>experiment_path</strong> – the path to the directory which will store all the experiment outputs</li>
 <li><strong>seed</strong> – a seed to use for the random numbers generator</li>
 <li><strong>checkpoint_save_secs</strong> – the number of seconds between each checkpoint saving</li>
-<li><strong>checkpoint_restore_dir</strong> – the directory to restore the checkpoints from</li>
+<li><strong>checkpoint_restore_dir</strong> – [DEPECRATED - will be removed in one of the next releases - switch to checkpoint_restore_path]
+the dir to restore the checkpoints from</li>
+<li><strong>checkpoint_restore_path</strong> – the path to restore the checkpoints from</li>
 <li><strong>checkpoint_save_dir</strong> – the directory to store the checkpoints in</li>
 <li><strong>export_onnx_graph</strong> – If set to True, this will export an onnx graph each time a checkpoint is saved</li>
 <li><strong>apply_stop_condition</strong> – If set to True, this will apply the stop condition defined by reaching a target success rate</li>
@@ -305,14 +308,15 @@ of the trace tests suite.</li>
 <h2>DistributedTaskParameters<a class="headerlink" href="#distributedtaskparameters" title="Permalink to this headline">¶</a></h2>
 <dl class="class">
 <dt id="rl_coach.base_parameters.DistributedTaskParameters">
-<em class="property">class </em><code class="descclassname">rl_coach.base_parameters.</code><code class="descname">DistributedTaskParameters</code><span class="sig-paren">(</span><em>framework_type: rl_coach.base_parameters.Frameworks</em>, <em>parameters_server_hosts: str</em>, <em>worker_hosts: str</em>, <em>job_type: str</em>, <em>task_index: int</em>, <em>evaluate_only: bool = False</em>, <em>num_tasks: int = None</em>, <em>num_training_tasks: int = None</em>, <em>use_cpu: bool = False</em>, <em>experiment_path=None</em>, <em>dnd=None</em>, <em>shared_memory_scratchpad=None</em>, <em>seed=None</em>, <em>checkpoint_save_secs=None</em>, <em>checkpoint_restore_dir=None</em>, <em>checkpoint_save_dir=None</em>, <em>export_onnx_graph: bool = False</em>, <em>apply_stop_condition: bool = False</em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/rl_coach/base_parameters.html#DistributedTaskParameters"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.base_parameters.DistributedTaskParameters" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">rl_coach.base_parameters.</code><code class="descname">DistributedTaskParameters</code><span class="sig-paren">(</span><em>framework_type: rl_coach.base_parameters.Frameworks</em>, <em>parameters_server_hosts: str</em>, <em>worker_hosts: str</em>, <em>job_type: str</em>, <em>task_index: int</em>, <em>evaluate_only: int = None</em>, <em>num_tasks: int = None</em>, <em>num_training_tasks: int = None</em>, <em>use_cpu: bool = False</em>, <em>experiment_path=None</em>, <em>dnd=None</em>, <em>shared_memory_scratchpad=None</em>, <em>seed=None</em>, <em>checkpoint_save_secs=None</em>, <em>checkpoint_restore_path=None</em>, <em>checkpoint_save_dir=None</em>, <em>export_onnx_graph: bool = False</em>, <em>apply_stop_condition: bool = False</em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/rl_coach/base_parameters.html#DistributedTaskParameters"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.base_parameters.DistributedTaskParameters" title="Permalink to this definition">¶</a></dt>
 <dd><table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
 <li><strong>framework_type</strong> – deep learning framework type. currently only tensorflow is supported</li>
-<li><strong>evaluate_only</strong> – the task will be used only for evaluating the model</li>
+<li><strong>evaluate_only</strong> – if not None, the task will be used only for evaluating the model for the given number of steps.
+A value of 0 means that task will be evaluated for an infinite number of steps.</li>
 <li><strong>parameters_server_hosts</strong> – comma-separated list of hostname:port pairs to which the parameter servers are
 assigned</li>
 <li><strong>worker_hosts</strong> – comma-separated list of hostname:port pairs to which the workers are assigned</li>
@@ -325,7 +329,7 @@ assigned</li>
 <li><strong>dnd</strong> – an external DND to use for NEC. This is a workaround needed for a shared DND not using the scratchpad.</li>
 <li><strong>seed</strong> – a seed to use for the random numbers generator</li>
 <li><strong>checkpoint_save_secs</strong> – the number of seconds between each checkpoint saving</li>
-<li><strong>checkpoint_restore_dir</strong> – the directory to restore the checkpoints from</li>
+<li><strong>checkpoint_restore_path</strong> – the path to restore the checkpoints from</li>
 <li><strong>checkpoint_save_dir</strong> – the directory to store the checkpoints in</li>
 <li><strong>export_onnx_graph</strong> – If set to True, this will export an onnx graph each time a checkpoint is saved</li>
 <li><strong>apply_stop_condition</strong> – If set to True, this will apply the stop condition defined by reaching a target success rate</li>