Mirror of https://github.com/gryf/coach.git (synced 2026-02-01 21:35:45 +01:00)
TD3 (#338)
101 docs/test.html
@@ -190,10 +190,10 @@
</div>
<dl class="class">
<dt id="rl_coach.agents.dqn_agent.DQNAgent">
-<em class="property">class </em><code class="descclassname">rl_coach.agents.dqn_agent.</code><code class="descname">DQNAgent</code><span class="sig-paren">(</span><em>agent_parameters</em>, <em>parent: Union[LevelManager</em>, <em>CompositeAgent] = None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/rl_coach/agents/dqn_agent.html#DQNAgent"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="sig-prename descclassname">rl_coach.agents.dqn_agent.</code><code class="sig-name descname">DQNAgent</code><span class="sig-paren">(</span><em class="sig-param">agent_parameters</em>, <em class="sig-param">parent: Union[LevelManager</em>, <em class="sig-param">CompositeAgent] = None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/rl_coach/agents/dqn_agent.html#DQNAgent"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent" title="Permalink to this definition">¶</a></dt>
<dd><dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.act">
-<code class="descname">act</code><span class="sig-paren">(</span><em>action: Union[None</em>, <em>int</em>, <em>float</em>, <em>numpy.ndarray</em>, <em>List] = None</em><span class="sig-paren">)</span> → rl_coach.core_types.ActionInfo<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.act" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">act</code><span class="sig-paren">(</span><em class="sig-param">action: Union[None</em>, <em class="sig-param">int</em>, <em class="sig-param">float</em>, <em class="sig-param">numpy.ndarray</em>, <em class="sig-param">List] = None</em><span class="sig-paren">)</span> → rl_coach.core_types.ActionInfo<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.act" title="Permalink to this definition">¶</a></dt>
<dd><p>Given the agents current knowledge, decide on the next action to apply to the environment</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
@@ -207,7 +207,7 @@

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.call_memory">
-<code class="descname">call_memory</code><span class="sig-paren">(</span><em>func</em>, <em>args=()</em><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.call_memory" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">call_memory</code><span class="sig-paren">(</span><em class="sig-param">func</em>, <em class="sig-param">args=()</em><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.call_memory" title="Permalink to this definition">¶</a></dt>
<dd><p>This function is a wrapper to allow having the same calls for shared or unshared memories.
It should be used instead of calling the memory directly in order to allow different algorithms to work
both with a shared and a local memory.</p>
@@ -226,7 +226,7 @@ both with a shared and a local memory.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.choose_action">
-<code class="descname">choose_action</code><span class="sig-paren">(</span><em>curr_state</em><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.choose_action" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">choose_action</code><span class="sig-paren">(</span><em class="sig-param">curr_state</em><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.choose_action" title="Permalink to this definition">¶</a></dt>
<dd><p>choose an action to act with in the current episode being played. Different behavior might be exhibited when
training or testing.</p>
<dl class="field-list simple">
@@ -241,7 +241,7 @@ training or testing.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.collect_savers">
-<code class="descname">collect_savers</code><span class="sig-paren">(</span><em>parent_path_suffix: str</em><span class="sig-paren">)</span> → rl_coach.saver.SaverCollection<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.collect_savers" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">collect_savers</code><span class="sig-paren">(</span><em class="sig-param">parent_path_suffix: str</em><span class="sig-paren">)</span> → rl_coach.saver.SaverCollection<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.collect_savers" title="Permalink to this definition">¶</a></dt>
<dd><p>Collect all of agent’s network savers
:param parent_path_suffix: path suffix of the parent of the agent
(could be name of level manager or composite agent)
@@ -250,7 +250,7 @@ training or testing.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.create_networks">
-<code class="descname">create_networks</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → Dict[str, rl_coach.architectures.network_wrapper.NetworkWrapper]<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.create_networks" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">create_networks</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → Dict[str, rl_coach.architectures.network_wrapper.NetworkWrapper]<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.create_networks" title="Permalink to this definition">¶</a></dt>
<dd><p>Create all the networks of the agent.
The network creation will be done after setting the environment parameters for the agent, since they are needed
for creating the network.</p>
@@ -261,9 +261,16 @@ for creating the network.</p>
</dl>
</dd></dl>

<dl class="method">
+<dt id="rl_coach.agents.dqn_agent.DQNAgent.freeze_memory">
+<code class="sig-name descname">freeze_memory</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.freeze_memory" title="Permalink to this definition">¶</a></dt>
+<dd><p>Shuffle episodes in the memory and freeze it to make sure that no extra data is being pushed anymore.
+:return: None</p>
+</dd></dl>
+
+<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.get_predictions">
-<code class="descname">get_predictions</code><span class="sig-paren">(</span><em>states: List[Dict[str, numpy.ndarray]], prediction_type: rl_coach.core_types.PredictionType</em><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.get_predictions" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">get_predictions</code><span class="sig-paren">(</span><em class="sig-param">states: List[Dict[str, numpy.ndarray]], prediction_type: rl_coach.core_types.PredictionType</em><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.get_predictions" title="Permalink to this definition">¶</a></dt>
<dd><p>Get a prediction from the agent with regard to the requested prediction_type.
If the agent cannot predict this type of prediction_type, or if there is more than possible way to do so,
raise a ValueException.</p>
@@ -282,7 +289,7 @@ raise a ValueException.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.get_state_embedding">
-<code class="descname">get_state_embedding</code><span class="sig-paren">(</span><em>state: dict</em><span class="sig-paren">)</span> → numpy.ndarray<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.get_state_embedding" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">get_state_embedding</code><span class="sig-paren">(</span><em class="sig-param">state: dict</em><span class="sig-paren">)</span> → numpy.ndarray<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.get_state_embedding" title="Permalink to this definition">¶</a></dt>
<dd><p>Given a state, get the corresponding state embedding from the main network</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
@@ -296,7 +303,7 @@ raise a ValueException.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.handle_episode_ended">
-<code class="descname">handle_episode_ended</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.handle_episode_ended" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">handle_episode_ended</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.handle_episode_ended" title="Permalink to this definition">¶</a></dt>
<dd><p>Make any changes needed when each episode is ended.
This includes incrementing counters, updating full episode dependent values, updating logs, etc.
This function is called right after each episode is ended.</p>
@@ -309,7 +316,7 @@ This function is called right after each episode is ended.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.improve_reward_model">
-<code class="descname">improve_reward_model</code><span class="sig-paren">(</span><em>epochs: int</em><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.improve_reward_model" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">improve_reward_model</code><span class="sig-paren">(</span><em class="sig-param">epochs: int</em><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.improve_reward_model" title="Permalink to this definition">¶</a></dt>
<dd><p>Train a reward model to be used by the doubly-robust estimator</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
@@ -323,7 +330,7 @@ This function is called right after each episode is ended.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.init_environment_dependent_modules">
-<code class="descname">init_environment_dependent_modules</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.init_environment_dependent_modules" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">init_environment_dependent_modules</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.init_environment_dependent_modules" title="Permalink to this definition">¶</a></dt>
<dd><p>Initialize any modules that depend on knowing information about the environment such as the action space or
the observation space</p>
<dl class="field-list simple">
@@ -333,9 +340,20 @@ the observation space</p>
</dl>
</dd></dl>

<dl class="method">
+<dt id="rl_coach.agents.dqn_agent.DQNAgent.initialize_session_dependent_components">
+<code class="sig-name descname">initialize_session_dependent_components</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.initialize_session_dependent_components" title="Permalink to this definition">¶</a></dt>
+<dd><p>Initialize components which require a session as part of their initialization.</p>
+<dl class="field-list simple">
+<dt class="field-odd">Returns</dt>
+<dd class="field-odd"><p>None</p>
+</dd>
+</dl>
+</dd></dl>
+
+<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.learn_from_batch">
-<code class="descname">learn_from_batch</code><span class="sig-paren">(</span><em>batch</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/rl_coach/agents/dqn_agent.html#DQNAgent.learn_from_batch"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.learn_from_batch" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">learn_from_batch</code><span class="sig-paren">(</span><em class="sig-param">batch</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/rl_coach/agents/dqn_agent.html#DQNAgent.learn_from_batch"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.learn_from_batch" title="Permalink to this definition">¶</a></dt>
<dd><p>Given a batch of transitions, calculates their target values and updates the network.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
@@ -347,9 +365,20 @@ the observation space</p>
</dl>
</dd></dl>

<dl class="method">
+<dt id="rl_coach.agents.dqn_agent.DQNAgent.load_memory_from_file">
+<code class="sig-name descname">load_memory_from_file</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.load_memory_from_file" title="Permalink to this definition">¶</a></dt>
+<dd><p>Load memory transitions from a file.</p>
+<dl class="field-list simple">
+<dt class="field-odd">Returns</dt>
+<dd class="field-odd"><p>None</p>
+</dd>
+</dl>
+</dd></dl>
+
+<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.log_to_screen">
-<code class="descname">log_to_screen</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.log_to_screen" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">log_to_screen</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.log_to_screen" title="Permalink to this definition">¶</a></dt>
<dd><p>Write an episode summary line to the terminal</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
@@ -360,7 +389,7 @@ the observation space</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.observe">
-<code class="descname">observe</code><span class="sig-paren">(</span><em>env_response: rl_coach.core_types.EnvResponse</em><span class="sig-paren">)</span> → bool<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.observe" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">observe</code><span class="sig-paren">(</span><em class="sig-param">env_response: rl_coach.core_types.EnvResponse</em><span class="sig-paren">)</span> → bool<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.observe" title="Permalink to this definition">¶</a></dt>
<dd><p>Given a response from the environment, distill the observation from it and store it for later use.
The response should be a dictionary containing the performed action, the new observation and measurements,
the reward, a game over flag and any additional information necessary.</p>
@@ -375,9 +404,9 @@ given observation</p>
</dl>
</dd></dl>

-<dl class="attribute">
+<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.parent">
-<code class="descname">parent</code><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.parent" title="Permalink to this definition">¶</a></dt>
+<em class="property">property </em><code class="sig-name descname">parent</code><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.parent" title="Permalink to this definition">¶</a></dt>
<dd><p>Get the parent class of the agent</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
@@ -386,9 +415,9 @@ given observation</p>
</dl>
</dd></dl>

-<dl class="attribute">
+<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.phase">
-<code class="descname">phase</code><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.phase" title="Permalink to this definition">¶</a></dt>
+<em class="property">property </em><code class="sig-name descname">phase</code><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.phase" title="Permalink to this definition">¶</a></dt>
<dd><p>The current running phase of the agent</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
@@ -399,7 +428,7 @@ given observation</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.post_training_commands">
-<code class="descname">post_training_commands</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.post_training_commands" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">post_training_commands</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.post_training_commands" title="Permalink to this definition">¶</a></dt>
<dd><p>A function which allows adding any functionality that is required to run right after the training phase ends.</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
@@ -410,7 +439,7 @@ given observation</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.prepare_batch_for_inference">
-<code class="descname">prepare_batch_for_inference</code><span class="sig-paren">(</span><em>states: Union[Dict[str, numpy.ndarray], List[Dict[str, numpy.ndarray]]], network_name: str</em><span class="sig-paren">)</span> → Dict[str, numpy.array]<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.prepare_batch_for_inference" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">prepare_batch_for_inference</code><span class="sig-paren">(</span><em class="sig-param">states: Union[Dict[str, numpy.ndarray], List[Dict[str, numpy.ndarray]]], network_name: str</em><span class="sig-paren">)</span> → Dict[str, numpy.core.multiarray.array]<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.prepare_batch_for_inference" title="Permalink to this definition">¶</a></dt>
<dd><p>Convert curr_state into input tensors tensorflow is expecting. i.e. if we have several inputs states, stack all
observations together, measurements together, etc.</p>
<dl class="field-list simple">
@@ -430,7 +459,7 @@ the observation relevant for the network from the states.</p></li>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.register_signal">
-<code class="descname">register_signal</code><span class="sig-paren">(</span><em>signal_name: str</em>, <em>dump_one_value_per_episode: bool = True</em>, <em>dump_one_value_per_step: bool = False</em><span class="sig-paren">)</span> → rl_coach.utils.Signal<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.register_signal" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">register_signal</code><span class="sig-paren">(</span><em class="sig-param">signal_name: str</em>, <em class="sig-param">dump_one_value_per_episode: bool = True</em>, <em class="sig-param">dump_one_value_per_step: bool = False</em><span class="sig-paren">)</span> → rl_coach.utils.Signal<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.register_signal" title="Permalink to this definition">¶</a></dt>
<dd><p>Register a signal such that its statistics will be dumped and be viewable through dashboard</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
@@ -448,7 +477,7 @@ the observation relevant for the network from the states.</p></li>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.reset_evaluation_state">
-<code class="descname">reset_evaluation_state</code><span class="sig-paren">(</span><em>val: rl_coach.core_types.RunPhase</em><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.reset_evaluation_state" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">reset_evaluation_state</code><span class="sig-paren">(</span><em class="sig-param">val: rl_coach.core_types.RunPhase</em><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.reset_evaluation_state" title="Permalink to this definition">¶</a></dt>
<dd><p>Perform accumulators initialization when entering an evaluation phase, and signal dumping when exiting an
evaluation phase. Entering or exiting the evaluation phase is determined according to the new phase given
by val, and by the current phase set in self.phase.</p>
@@ -464,7 +493,7 @@ by val, and by the current phase set in self.phase.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.reset_internal_state">
-<code class="descname">reset_internal_state</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.reset_internal_state" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">reset_internal_state</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.reset_internal_state" title="Permalink to this definition">¶</a></dt>
<dd><p>Reset all the episodic parameters. This function is called right before each episode starts.</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
@@ -475,7 +504,7 @@ by val, and by the current phase set in self.phase.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.restore_checkpoint">
-<code class="descname">restore_checkpoint</code><span class="sig-paren">(</span><em>checkpoint_dir: str</em><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.restore_checkpoint" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">restore_checkpoint</code><span class="sig-paren">(</span><em class="sig-param">checkpoint_dir: str</em><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.restore_checkpoint" title="Permalink to this definition">¶</a></dt>
<dd><p>Allows agents to store additional information when saving checkpoints.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
@@ -489,7 +518,7 @@ by val, and by the current phase set in self.phase.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.run_off_policy_evaluation">
-<code class="descname">run_off_policy_evaluation</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.run_off_policy_evaluation" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">run_off_policy_evaluation</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.run_off_policy_evaluation" title="Permalink to this definition">¶</a></dt>
<dd><p>Run the off-policy evaluation estimators to get a prediction for the performance of the current policy based on
an evaluation dataset, which was collected by another policy(ies).
:return: None</p>
@@ -497,7 +526,7 @@ an evaluation dataset, which was collected by another policy(ies).

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.run_pre_network_filter_for_inference">
-<code class="descname">run_pre_network_filter_for_inference</code><span class="sig-paren">(</span><em>state: Dict[str, numpy.ndarray], update_filter_internal_state: bool = True</em><span class="sig-paren">)</span> → Dict[str, numpy.ndarray]<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.run_pre_network_filter_for_inference" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">run_pre_network_filter_for_inference</code><span class="sig-paren">(</span><em class="sig-param">state: Dict[str, numpy.ndarray], update_filter_internal_state: bool = True</em><span class="sig-paren">)</span> → Dict[str, numpy.ndarray]<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.run_pre_network_filter_for_inference" title="Permalink to this definition">¶</a></dt>
<dd><p>Run filters which where defined for being applied right before using the state for inference.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
@@ -514,7 +543,7 @@ an evaluation dataset, which was collected by another policy(ies).

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.save_checkpoint">
-<code class="descname">save_checkpoint</code><span class="sig-paren">(</span><em>checkpoint_prefix: str</em><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.save_checkpoint" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">save_checkpoint</code><span class="sig-paren">(</span><em class="sig-param">checkpoint_prefix: str</em><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.save_checkpoint" title="Permalink to this definition">¶</a></dt>
<dd><p>Allows agents to store additional information when saving checkpoints.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
@@ -528,7 +557,7 @@ an evaluation dataset, which was collected by another policy(ies).

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.set_environment_parameters">
-<code class="descname">set_environment_parameters</code><span class="sig-paren">(</span><em>spaces: rl_coach.spaces.SpacesDefinition</em><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.set_environment_parameters" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">set_environment_parameters</code><span class="sig-paren">(</span><em class="sig-param">spaces: rl_coach.spaces.SpacesDefinition</em><span class="sig-paren">)</span><a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.set_environment_parameters" title="Permalink to this definition">¶</a></dt>
<dd><p>Sets the parameters that are environment dependent. As a side effect, initializes all the components that are
dependent on those values, by calling init_environment_dependent_modules</p>
<dl class="field-list simple">
@@ -543,7 +572,7 @@ dependent on those values, by calling init_environment_dependent_modules</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.set_incoming_directive">
-<code class="descname">set_incoming_directive</code><span class="sig-paren">(</span><em>action: Union[int, float, numpy.ndarray, List]</em><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.set_incoming_directive" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">set_incoming_directive</code><span class="sig-paren">(</span><em class="sig-param">action: Union[int, float, numpy.ndarray, List]</em><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.set_incoming_directive" title="Permalink to this definition">¶</a></dt>
<dd><p>Allows setting a directive for the agent to follow. This is useful in hierarchy structures, where the agent
has another master agent that is controlling it. In such cases, the master agent can define the goals for the
slave agent, define it’s observation, possible actions, etc. The directive type is defined by the agent
@@ -560,7 +589,7 @@ in-action-space.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.set_session">
-<code class="descname">set_session</code><span class="sig-paren">(</span><em>sess</em><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.set_session" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">set_session</code><span class="sig-paren">(</span><em class="sig-param">sess</em><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.set_session" title="Permalink to this definition">¶</a></dt>
<dd><p>Set the deep learning framework session for all the agents in the composite agent</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
@@ -571,7 +600,7 @@ in-action-space.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.setup_logger">
-<code class="descname">setup_logger</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.setup_logger" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">setup_logger</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.setup_logger" title="Permalink to this definition">¶</a></dt>
<dd><p>Setup the logger for the agent</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
@@ -582,7 +611,7 @@ in-action-space.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.sync">
-<code class="descname">sync</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.sync" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">sync</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.sync" title="Permalink to this definition">¶</a></dt>
<dd><p>Sync the global network parameters to local networks</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
@@ -593,7 +622,7 @@ in-action-space.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.train">
-<code class="descname">train</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → float<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.train" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">train</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → float<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.train" title="Permalink to this definition">¶</a></dt>
<dd><p>Check if a training phase should be done as configured by num_consecutive_playing_steps.
If it should, then do several training steps as configured by num_consecutive_training_steps.
A single training iteration: Sample a batch, train on it and update target networks.</p>
@@ -606,7 +635,7 @@ A single training iteration: Sample a batch, train on it and update target netwo

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.update_log">
-<code class="descname">update_log</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.update_log" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">update_log</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.update_log" title="Permalink to this definition">¶</a></dt>
<dd><p>Updates the episodic log file with all the signal values from the most recent episode.
Additional signals for logging can be set by the creating a new signal using self.register_signal,
and then updating it with some internal agent values.</p>
@@ -619,7 +648,7 @@ and then updating it with some internal agent values.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.update_step_in_episode_log">
-<code class="descname">update_step_in_episode_log</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.update_step_in_episode_log" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">update_step_in_episode_log</code><span class="sig-paren">(</span><span class="sig-paren">)</span> → None<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.update_step_in_episode_log" title="Permalink to this definition">¶</a></dt>
<dd><p>Updates the in-episode log file with all the signal values from the most recent step.</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
@@ -630,7 +659,7 @@ and then updating it with some internal agent values.</p>

<dl class="method">
<dt id="rl_coach.agents.dqn_agent.DQNAgent.update_transition_before_adding_to_replay_buffer">
-<code class="descname">update_transition_before_adding_to_replay_buffer</code><span class="sig-paren">(</span><em>transition: rl_coach.core_types.Transition</em><span class="sig-paren">)</span> → rl_coach.core_types.Transition<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.update_transition_before_adding_to_replay_buffer" title="Permalink to this definition">¶</a></dt>
+<code class="sig-name descname">update_transition_before_adding_to_replay_buffer</code><span class="sig-paren">(</span><em class="sig-param">transition: rl_coach.core_types.Transition</em><span class="sig-paren">)</span> → rl_coach.core_types.Transition<a class="headerlink" href="#rl_coach.agents.dqn_agent.DQNAgent.update_transition_before_adding_to_replay_buffer" title="Permalink to this definition">¶</a></dt>
<dd><p>Allows agents to update the transition just before adding it to the replay buffer.
Can be useful for agents that want to tweak the reward, termination signal, etc.</p>
<dl class="field-list simple">
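The methods documented in the diff above (act, observe, train, reset_internal_state, handle_episode_ended) form the basic Coach agent interaction loop. As an illustrative aside only, and not part of this commit, a minimal sketch of driving that documented interface could look like the following; the agent and environment construction, the state-dictionary key, and the EnvResponse keyword names are assumptions for illustration rather than details taken from this diff:

from rl_coach.core_types import EnvResponse

def run_episode(agent, env, max_steps=1000):
    # `agent` is assumed to be an already-configured DQNAgent and `env` a
    # Gym-style environment; both are hypothetical stand-ins for this sketch.
    agent.reset_internal_state()        # reset episodic parameters (documented above)
    obs = env.reset()
    for _ in range(max_steps):
        action_info = agent.act()       # decide on the next action
        obs, reward, done, _ = env.step(action_info.action)
        # pack the step result and let the agent store/filter the observation;
        # the keyword names below are assumed, see rl_coach.core_types.EnvResponse
        agent.observe(EnvResponse(next_state={'observation': obs},
                                  reward=reward,
                                  game_over=done))
        agent.train()                   # trains only when a training phase is due
        if done:
            break
    agent.handle_episode_ended()        # per-episode bookkeeping and logging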