Quantile Regression DQN
Action space: Discrete

References: Distributional Reinforcement Learning with Quantile Regression

Network Structure

[Diagram: QR-DQN network structure (/_static/img/design_imgs/qr_dqn.png)]
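
The diagram above shows the network used by Coach. As a rough, illustrative sketch only (not Coach's implementation, which is built on its own TensorFlow/MXNet architectures), the defining structural idea is a head that outputs a fixed number of quantile locations per discrete action, with the Q-value of an action taken as the mean of its quantiles. The class name, layer sizes, and the default of 200 quantiles below are assumptions chosen for the example.

    import torch
    import torch.nn as nn

    class QRDQNNetwork(nn.Module):
        """Toy QR-DQN network: a small MLP body followed by a head that
        outputs num_quantiles quantile locations for each discrete action."""

        def __init__(self, state_dim, num_actions, num_quantiles=200):
            super().__init__()
            self.num_actions = num_actions
            self.num_quantiles = num_quantiles
            self.body = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
            self.head = nn.Linear(128, num_actions * num_quantiles)

        def forward(self, states):
            # (batch, num_actions, num_quantiles): one set of quantile
            # locations per action
            quantiles = self.head(self.body(states))
            quantiles = quantiles.view(-1, self.num_actions, self.num_quantiles)
            # The Q-value of each action is the mean of its quantile locations
            q_values = quantiles.mean(dim=2)
            return quantiles, q_values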

Algorithm Description

Training the network

  1. Sample a batch of transitions from the replay buffer.

  2. First, the next-state quantiles are predicted. These are used to calculate the targets for the network by following the Bellman equation. Next, the quantile locations for the current states are predicted, sorted, and used to calculate the quantile-midpoint targets.

  3. The network is trained with the quantile regression loss between the resulting quantile locations and the target quantile locations. Only the targets of the actions that were actually taken are updated (see the sketch after this list).

  4. Once every few thousand steps, the weights are copied from the online network to the target network.
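
The following is a minimal PyTorch sketch of this training step, not Coach's implementation: the function names, the network interface (a module returning per-action quantiles and Q-values, as in the sketch above), and the hyperparameters (kappa=1.0, gamma=0.99) are illustrative assumptions, and the sorting of the current quantile locations mentioned in step 2 is omitted for brevity. The loss is the quantile regression Huber loss from the referenced paper.

    import torch

    def qr_dqn_loss(current_quantiles, target_quantiles, kappa=1.0):
        """Quantile regression Huber loss.

        current_quantiles: (batch, N) quantile locations of the taken actions.
        target_quantiles:  (batch, N) Bellman targets r + gamma * next-state quantiles.
        """
        batch_size, n = current_quantiles.shape
        # Quantile midpoints tau_hat_i = (2i + 1) / (2N)
        tau_hat = (torch.arange(n, dtype=torch.float32,
                                device=current_quantiles.device) + 0.5) / n

        # Pairwise TD errors u[b, i, j] = target_j - current_i
        u = target_quantiles.unsqueeze(1) - current_quantiles.unsqueeze(2)

        # Huber loss with threshold kappa
        huber = torch.where(u.abs() <= kappa,
                            0.5 * u.pow(2),
                            kappa * (u.abs() - 0.5 * kappa))

        # Asymmetric quantile weights |tau_hat_i - 1{u < 0}|
        weight = (tau_hat.view(1, n, 1) - (u.detach() < 0).float()).abs()

        # Average over target quantiles j, sum over current quantiles i, mean over batch
        return (weight * huber / kappa).mean(dim=2).sum(dim=1).mean()

    def train_step(online_net, target_net, batch, optimizer, gamma=0.99):
        """One illustrative training step over a sampled batch of transitions."""
        states, actions, rewards, next_states, dones = batch
        rows = torch.arange(states.shape[0])

        with torch.no_grad():
            # Step 2: predict the next-state quantiles with the target network
            # and pick the greedy action according to their mean.
            next_quantiles, next_q = target_net(next_states)   # (B, A, N), (B, A)
            best_actions = next_q.argmax(dim=1)                 # (B,)
            next_chosen = next_quantiles[rows, best_actions]    # (B, N)
            # Bellman targets for every quantile location.
            targets = rewards.unsqueeze(1) \
                + gamma * (1.0 - dones.unsqueeze(1)) * next_chosen

        # Step 3: current quantiles of the actions that were actually taken.
        quantiles, _ = online_net(states)
        current = quantiles[rows, actions]                      # (B, N)

        loss = qr_dqn_loss(current, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Step 4 would correspond to periodically copying the online weights into the target network, e.g. target_net.load_state_dict(online_net.state_dict()) every few thousand training steps.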