update of api docstrings across coach and tutorials [WIP] (#91)
* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation
Dueling DQN
===========

**Actions space:** Discrete

**References:** `Dueling Network Architectures for Deep Reinforcement Learning <https://arxiv.org/abs/1511.06581>`_

Network Structure
-----------------

.. image:: /_static/img/design_imgs/dueling_dqn.png
   :align: center

General Description
-------------------
Dueling DQN presents a change in the network structure compared to DQN.

Dueling DQN uses a specialized *Dueling Q Head* in order to separate :math:`Q` into an :math:`A` (advantage)
stream and a :math:`V` (state value) stream. Adding this type of structure to the network head allows the network
to better differentiate actions from one another, and significantly improves learning.
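
As described in the referenced paper, the two streams are recombined into a single set of :math:`Q` values by
adding each action's advantage to the state value; the aggregation used in the paper's experiments subtracts the
mean advantage so that the decomposition stays identifiable:

.. math::

    Q(s, a) = V(s) + \left( A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a') \right)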

In many states, the values of the different actions are very similar, and it matters less which action is taken.
This is especially important in environments where there are many actions to choose from. In DQN, on each training
iteration, for each of the states in the batch, we update the :math:`Q` values only for the specific actions taken in
those states. This results in slower learning, as we do not learn the :math:`Q` values for actions that were not taken.
With the dueling architecture, on the other hand, learning is faster, as we start learning the state value even if only a
single action has been taken in this state.
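
To make the head structure concrete, here is a minimal NumPy sketch of the dueling aggregation. It is an
illustration only, not Coach's actual head implementation: the function name and the weight and bias arrays are
hypothetical, and ``features`` stands in for the output of the network's shared middleware.

.. code-block:: python

    import numpy as np

    def dueling_q_head(features, w_v, b_v, w_a, b_a):
        """Combine a state-value stream and an advantage stream into Q values."""
        v = features @ w_v + b_v   # state value stream, shape (1,)
        a = features @ w_a + b_a   # advantage stream, shape (num_actions,)
        # Subtracting the mean advantage keeps the V/A decomposition identifiable.
        return v + (a - a.mean())  # Q values, shape (num_actions,)

    # Toy usage: 8 middleware features, 4 discrete actions.
    rng = np.random.default_rng(0)
    features = rng.normal(size=8)
    q_values = dueling_q_head(features,
                              w_v=rng.normal(size=(8, 1)), b_v=np.zeros(1),
                              w_a=rng.normal(size=(8, 4)), b_a=np.zeros(4))
    print(q_values)           # one Q value per action
    print(q_values.argmax())  # greedy action

Because every action's :math:`Q` value shares the same :math:`V(s)` term, an update that improves the state-value
estimate immediately affects all actions, which is the source of the faster learning described above.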