moving the docs to github
docs_raw/docs/algorithms/value_optimization/mmc.md
# Mixed Monte Carlo
**Action space:** Discrete

**References:** [Count-Based Exploration with Neural Density Models](https://arxiv.org/abs/1703.01310)
## Network Structure
<p style="text-align: center;">
<img src="../../design_imgs/dqn.png">
</p>
## Algorithm Description
### Training the network
In MMC, the targets are calculated as a mixture of Double DQN (DDQN) targets and full Monte Carlo returns (the total discounted reward of the episode).

The DDQN targets are calculated in the same manner as in the DDQN agent, with the target network evaluating the action selected by the online network:

$$ y_t^{DDQN}=r(s_t,a_t)+\gamma Q^{target}\left(s_{t+1},\underset{a}{\arg\max}\, Q(s_{t+1},a)\right) $$
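As an illustration, here is a minimal NumPy sketch of this target computation for a batch of transitions; the function and array names (`ddqn_targets`, `q_online_next`, `q_target_next`, `dones`) are assumptions for the example, not Coach's actual API:

```python
import numpy as np

def ddqn_targets(rewards, q_online_next, q_target_next, dones, gamma=0.99):
    """DDQN targets for a batch: the online network picks the next action,
    the target network evaluates it (illustrative sketch, not Coach's API)."""
    # argmax_a Q(s_{t+1}, a) using the online network
    best_actions = np.argmax(q_online_next, axis=1)
    # Q^target(s_{t+1}, argmax_a Q(s_{t+1}, a))
    next_values = q_target_next[np.arange(len(rewards)), best_actions]
    # Bootstrap only from non-terminal next states
    return rewards + gamma * (1.0 - dones) * next_values
```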
The Monte Carlo targets are calculated by summing the discounted rewards from the current step until the end of the episode at step $T$:

$$ y_t^{MC}=\sum_{j=0}^{T-t}\gamma^j r(s_{t+j},a_{t+j}) $$
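A minimal sketch of this computation, using the backward recursion $y_t^{MC}=r_t+\gamma y_{t+1}^{MC}$ over a single episode (the function name is hypothetical):

```python
import numpy as np

def monte_carlo_returns(rewards, gamma=0.99):
    """Total discounted return from every step to the end of one episode,
    computed in a single backward pass."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        # y_t^{MC} = r_t + gamma * y_{t+1}^{MC}
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```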
A mixing ratio $\alpha$ is then used to get the final targets:

$$ y_t=(1-\alpha)\cdot y_t^{DDQN}+\alpha \cdot y_t^{MC} $$
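As a toy numeric illustration of the mixing step (the target values and the choice $\alpha=0.1$ are made up for the example; $\alpha$ is a hyperparameter of the agent):

```python
import numpy as np

alpha = 0.1  # mixing ratio (illustrative value, not necessarily Coach's default)

# Hypothetical per-step targets for a three-step episode
y_ddqn = np.array([1.2, 0.8, 1.5])  # DDQN targets
y_mc = np.array([1.0, 1.1, 1.4])    # Monte Carlo returns
targets = (1.0 - alpha) * y_ddqn + alpha * y_mc
print(targets)  # [1.18 0.83 1.49]
```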
Finally, the online network is trained using the current states as inputs and the mixed targets $y_t$ as labels.

Once every few thousand steps, the weights are copied from the online network to the target network.
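A schematic sketch of this hard update, assuming the weights of each network are held in a dictionary of NumPy arrays (an assumption for illustration; Coach handles this synchronization internally):

```python
def sync_target_network(online_weights, target_weights):
    """Hard update: overwrite every target-network weight with a copy of
    the corresponding online-network weight."""
    for name, weight in online_weights.items():
        target_weights[name] = weight.copy()
```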