update of api docstrings across coach and tutorials [WIP] (#91)
* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation
@@ -0,0 +1,35 @@
N-Step Q Learning
=================

**Actions space:** Discrete

**References:** `Asynchronous Methods for Deep Reinforcement Learning <https://arxiv.org/abs/1602.01783>`_

Network Structure
-----------------

.. image:: /_static/img/design_imgs/dqn.png
   :align: center

Algorithm Description
---------------------

Training the network
++++++++++++++++++++

The :math:`N`-step Q learning algorithm works in a similar manner to DQN, except for the following changes:

1. No replay buffer is used. Instead of sampling random batches of transitions, the network is trained every
   :math:`N` steps using the latest :math:`N` steps played by the agent.

2. In order to stabilize the learning, multiple workers work together to update the network.
   This decorrelates the samples used for training, achieving an effect similar to that of a replay buffer.

3. Instead of using single-step Q targets for the network, the rewards from :math:`N` consecutive steps are accumulated
   to form the :math:`N`-step Q targets, according to the following equation
   (a short sketch of this computation is given below the list):

   :math:`R(s_t, a_t) = \sum_{i=t}^{t + k - 1} \gamma^{i-t} r_i + \gamma^{k} V(s_{t+k})`

   where :math:`k` is :math:`T_{max} - State\_Index` for each state in the batch.
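
A minimal NumPy sketch of the backward accumulation behind this target is shown below; the function name and its arguments are illustrative only and are not part of the Coach API:

.. code-block:: python

   import numpy as np

   def n_step_q_targets(rewards, bootstrap_value, gamma=0.99, terminal=False):
       """Compute R(s_t, a_t) = sum_{i=t}^{t+k-1} gamma^{i-t} r_i + gamma^k V(s_{t+k})
       for every transition in a rollout of the latest N steps.

       rewards         -- the N rewards r_t, ..., r_{t+N-1} collected by the agent
       bootstrap_value -- V(s_{t+N}), e.g. the target network's maximal Q value
       terminal        -- True if the rollout ended in a terminal state
       """
       targets = np.zeros(len(rewards), dtype=np.float32)
       # start from the value of the state that follows the rollout (0 if terminal)
       running_return = 0.0 if terminal else bootstrap_value
       # accumulate the discounted rewards backwards, so that the state at index j
       # receives k = N - j reward terms plus gamma^k times the bootstrap value
       for j in reversed(range(len(rewards))):
           running_return = rewards[j] + gamma * running_return
           targets[j] = running_return
       return targets

For example, ``n_step_q_targets([1.0, 0.0, 2.0], bootstrap_value=10.0, gamma=0.9)`` returns approximately ``[9.91, 9.9, 11.0]``.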

.. autoclass:: rl_coach.agents.n_step_q_agent.NStepQAlgorithmParameters
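
A hypothetical preset-style sketch of configuring this agent is given below. It assumes Coach's usual preset structure; ``NStepQAgentParameters``, ``num_steps_between_gradient_updates``, and the environment/schedule helpers used here are assumptions that should be checked against the rl_coach source:

.. code-block:: python

   from rl_coach.agents.n_step_q_agent import NStepQAgentParameters
   from rl_coach.environments.gym_environment import GymVectorEnvironment
   from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
   from rl_coach.graph_managers.graph_manager import SimpleSchedule

   # agent parameters, including the algorithm parameters documented above
   agent_params = NStepQAgentParameters()
   # train on the latest N steps instead of a replay buffer (assumed parameter name)
   agent_params.algorithm.num_steps_between_gradient_updates = 5

   # wire the agent and a Gym environment into a graph manager, Coach's usual entry point
   graph_manager = BasicRLGraphManager(
       agent_params=agent_params,
       env_params=GymVectorEnvironment(level='CartPole-v0'),
       schedule_params=SimpleSchedule())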