Quantile Regression DQN
Action space: Discrete

References: Distributional Reinforcement Learning with Quantile Regression

Network Structure

[Diagram: QR-DQN network structure (/_static/img/design_imgs/qr_dqn.png)]
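
The diagram above shows the network used by Coach. As a rough, illustrative sketch only (not Coach's implementation, which is built on its own TensorFlow/MXNet architectures), the defining structural idea is a head that outputs a fixed number of quantile locations per discrete action, with the Q-value of an action taken as the mean of its quantiles. The class name, layer sizes, and the default of 200 quantiles below are assumptions chosen for the example.

    import torch
    import torch.nn as nn

    class QRDQNNetwork(nn.Module):
        """Toy QR-DQN network: a small MLP body followed by a head that
        outputs num_quantiles quantile locations for each discrete action."""

        def __init__(self, state_dim, num_actions, num_quantiles=200):
            super().__init__()
            self.num_actions = num_actions
            self.num_quantiles = num_quantiles
            self.body = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
            self.head = nn.Linear(128, num_actions * num_quantiles)

        def forward(self, states):
            # (batch, num_actions, num_quantiles): one set of quantile
            # locations per action
            quantiles = self.head(self.body(states))
            quantiles = quantiles.view(-1, self.num_actions, self.num_quantiles)
            # The Q-value of each action is the mean of its quantile locations
            q_values = quantiles.mean(dim=2)
            return quantiles, q_values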

Algorithm Description

Training the network

  1. Sample a batch of transitions from the replay buffer.

  2. First, the next-state quantiles are predicted. These are used to calculate the targets for the network by following the Bellman equation. Next, the quantile locations for the current states are predicted, sorted, and used to calculate the quantile-midpoint targets.

  3. The network is trained with the quantile regression loss between the resulting quantile locations and the target quantile locations. Only the targets of the actions that were actually taken are updated (see the sketch after this list).

  4. Once every few thousand steps, the weights are copied from the online network to the target network.
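
The following is a minimal PyTorch sketch of this training step, not Coach's implementation: the function names, the network interface (a module returning per-action quantiles and Q-values, as in the sketch above), and the hyperparameters (kappa=1.0, gamma=0.99) are illustrative assumptions, and the sorting of the current quantile locations mentioned in step 2 is omitted for brevity. The loss is the quantile regression Huber loss from the referenced paper.

    import torch

    def qr_dqn_loss(current_quantiles, target_quantiles, kappa=1.0):
        """Quantile regression Huber loss.

        current_quantiles: (batch, N) quantile locations of the taken actions.
        target_quantiles:  (batch, N) Bellman targets r + gamma * next-state quantiles.
        """
        batch_size, n = current_quantiles.shape
        # Quantile midpoints tau_hat_i = (2i + 1) / (2N)
        tau_hat = (torch.arange(n, dtype=torch.float32,
                                device=current_quantiles.device) + 0.5) / n

        # Pairwise TD errors u[b, i, j] = target_j - current_i
        u = target_quantiles.unsqueeze(1) - current_quantiles.unsqueeze(2)

        # Huber loss with threshold kappa
        huber = torch.where(u.abs() <= kappa,
                            0.5 * u.pow(2),
                            kappa * (u.abs() - 0.5 * kappa))

        # Asymmetric quantile weights |tau_hat_i - 1{u < 0}|
        weight = (tau_hat.view(1, n, 1) - (u.detach() < 0).float()).abs()

        # Average over target quantiles j, sum over current quantiles i, mean over batch
        return (weight * huber / kappa).mean(dim=2).sum(dim=1).mean()

    def train_step(online_net, target_net, batch, optimizer, gamma=0.99):
        """One illustrative training step over a sampled batch of transitions."""
        states, actions, rewards, next_states, dones = batch
        rows = torch.arange(states.shape[0])

        with torch.no_grad():
            # Step 2: predict the next-state quantiles with the target network
            # and pick the greedy action according to their mean.
            next_quantiles, next_q = target_net(next_states)   # (B, A, N), (B, A)
            best_actions = next_q.argmax(dim=1)                 # (B,)
            next_chosen = next_quantiles[rows, best_actions]    # (B, N)
            # Bellman targets for every quantile location.
            targets = rewards.unsqueeze(1) \
                + gamma * (1.0 - dones.unsqueeze(1)) * next_chosen

        # Step 3: current quantiles of the actions that were actually taken.
        quantiles, _ = online_net(states)
        current = quantiles[rows, actions]                      # (B, N)

        loss = qr_dqn_loss(current, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Step 4 would correspond to periodically copying the online weights into the target network, e.g. target_net.load_state_dict(online_net.state_dict()) every few thousand training steps.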