update of api docstrings across coach and tutorials [WIP] (#91)

* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation
2026-07-09 02:46:33 +02:00 · 2018-11-15 15:00:13 +02:00
parent 524f8436a2
commit 6d40ad1650
517 changed files with 71034 additions and 12834 deletions
@@ -0,0 +1,37 @@
+Mixed Monte Carlo
+=================
+
+**Action space:** Discrete
+
+**References:** `Count-Based Exploration with Neural Density Models <https://arxiv.org/abs/1703.01310>`_
+
+Network Structure
+-----------------
+
+.. image:: /_static/img/design_imgs/dqn.png
+   :align: center
+
+Algorithm Description
+---------------------
+Training the network
++++++++++++++++++++
+
+In MMC, targets are calculated as a mixture between Double DQN targets and full Monte Carlo samples (total discounted returns).
+
+The DDQN targets are calculated in the same manner as in the DDQN agent:
+
+:math:`y_t^{DDQN}=r(s_t,a_t)+\gamma Q(s_{t+1},\arg\max_a Q(s_{t+1},a))`
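As a rough illustration of the DDQN target for a single transition, the sketch below uses plain Python lists of per-action Q-values. It assumes the usual Double DQN convention, which the formula above does not spell out: the online network selects the action and the target network evaluates it. The function name, its arguments, and the terminal-state handling are hypothetical and are not part of the Coach API.

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, terminal=False):
    """Double DQN target for one transition (illustrative sketch only).

    next_q_online: Q-values of s_{t+1} for every action, from the online network.
    next_q_target: Q-values of s_{t+1} for every action, from the target network.
    """
    if terminal:
        # Assumption: no bootstrapping from a terminal next state.
        return reward
    # Select the greedy action with the online network...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...and evaluate that action with the target network.
    return reward + gamma * next_q_target[best_action]
```

Here the online network prefers action 1, so the target bootstraps from the target network's estimate for action 1, even though the target network itself rates action 0 higher.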
+
+The Monte Carlo targets are calculated by summing the discounted rewards from step :math:`t` to the final step :math:`T` of the episode:
+
+:math:`y_t^{MC}=\sum_{j=0}^{T-t}\gamma^j r(s_{t+j},a_{t+j})`
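One possible way to compute the Monte Carlo target for every step of a finished episode is a single backward pass over the rewards. This is a minimal sketch, not the Coach implementation; the function name and the list-based representation of an episode are assumptions made for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted return from each step to the end of the episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed for the next step:
    # G_t = r_t + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For a three-step episode with a reward of 1 at every step and a discount of 0.5, this gives 1.75, 1.5 and 1.0 for steps 0, 1 and 2.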
+
+A mixing ratio :math:`\alpha` is then used to get the final targets:
+
+:math:`y_t=(1-\alpha)\cdot y_t^{DDQN}+\alpha \cdot y_t^{MC}`
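The mixing step itself is a per-sample convex combination. The sketch below assumes the two target lists are already computed and aligned by transition; the function name and the input validation are illustrative choices, not taken from Coach.

```python
def mix_targets(ddqn_targets, mc_targets, alpha):
    """Blend DDQN and Monte Carlo targets: (1 - alpha) * DDQN + alpha * MC."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(ddqn_targets) != len(mc_targets):
        raise ValueError("target lists must have the same length")
    return [(1.0 - alpha) * d + alpha * m
            for d, m in zip(ddqn_targets, mc_targets)]
```

With a mixing ratio of 0 the result reduces to the pure DDQN targets, and with a mixing ratio of 1 it reduces to the pure Monte Carlo returns.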
+
+Finally, the online network is trained on the current states as inputs, with the calculated targets as the desired outputs.
+Once every few thousand steps, the weights of the online network are copied to the target network.
+
+
+.. autoclass:: rl_coach.agents.mmc_agent.MixedMonteCarloAlgorithmParameters