Mixed Monte Carlo
Action space: Discrete

References: Count-Based Exploration with Neural Density Models

Network Structure

[Figure: DQN network structure (/_static/img/design_imgs/dqn.png)]

Algorithm Description

Training the network

In MMC (Mixed Monte Carlo), targets are calculated as a mixture of Double DQN targets and full Monte Carlo returns (total discounted returns).

The DDQN targets are calculated in the same manner as in the DDQN agent:

y_t^{DDQN} = r(s_t, a_t) + \gamma \cdot Q(s_{t+1}, \arg\max_a Q(s_{t+1}, a))
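A minimal NumPy sketch of this target for a batch of transitions is shown below (the two Q-network objects, their predict method, and the array shapes are illustrative assumptions; in Double DQN the next action is selected with the online network and evaluated with the target network):

    import numpy as np

    def ddqn_targets(rewards, next_states, online_net, target_net, gamma):
        """y^DDQN = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
        q_online_next = online_net.predict(next_states)   # shape: (batch, num_actions)
        q_target_next = target_net.predict(next_states)   # shape: (batch, num_actions)
        best_actions = np.argmax(q_online_next, axis=1)    # select with the online network
        bootstrap = q_target_next[np.arange(len(best_actions)), best_actions]  # evaluate with the target network
        return rewards + gamma * bootstrap                 # terminal-state masking omitted for brevity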

The Monte Carlo targets are calculated by summing the discounted rewards from the current step to the end of the episode:

y_t^{MC} = \sum_{j=0}^{T} \gamma^j \cdot r(s_{t+j}, a_{t+j})
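The per-step discounted return can be accumulated in a single backward pass over the episode; a small sketch, assuming rewards is a plain array holding the episode's rewards in order:

    import numpy as np

    def monte_carlo_returns(rewards, gamma):
        """y_t^MC = sum_j gamma^j * r_{t+j}, computed backwards as G_t = r_t + gamma * G_{t+1}."""
        returns = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        return returns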

A mixing ratio \alpha is then used to get the final targets:

y_t = (1 - \alpha) \cdot y_t^{DDQN} + \alpha \cdot y_t^{MC}

Finally, the online network is trained using the current states as inputs and the calculated targets as outputs. Once every few thousand steps, the weights of the online network are copied to the target network.
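Putting the steps together, one possible training step might look like the sketch below, reusing ddqn_targets from the sketch above; the network objects, the fit and set_weights calls, and the update interval are hypothetical placeholders rather than Coach's actual API:

    def mmc_train_step(batch, online_net, target_net, gamma, alpha,
                       step, target_update_interval=10000):
        # mc_returns: total discounted returns stored alongside each sampled
        # transition (see monte_carlo_returns above).
        states, rewards, next_states, mc_returns = batch

        # Mix the bootstrapped Double DQN targets with the full Monte Carlo returns.
        y_ddqn = ddqn_targets(rewards, next_states, online_net, target_net, gamma)
        targets = (1.0 - alpha) * y_ddqn + alpha * mc_returns

        # Train the online network on the current states and the mixed targets.
        online_net.fit(states, targets)

        # Once every few thousand steps, copy the online weights to the target network.
        if step % target_update_interval == 0:
            target_net.set_weights(online_net.get_weights())

Note that with \alpha = 0 this reduces to plain Double DQN, while \alpha = 1 trains on pure Monte Carlo returns.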