mirror of https://github.com/gryf/coach.git synced 2025-12-18 03:30:19 +01:00
Itai Caspi 6d40ad1650 update of api docstrings across coach and tutorials [WIP] (#91)
* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation
2018-11-15 15:00:13 +02:00


Dueling DQN
===========

**Actions space:** Discrete

**References:** `Dueling Network Architectures for Deep Reinforcement Learning <https://arxiv.org/abs/1511.06581>`_

Network Structure
-----------------

.. image:: /_static/img/design_imgs/dueling_dqn.png
   :align: center

General Description
-------------------

Dueling DQN changes the network structure compared to DQN.
It uses a specialized *Dueling Q Head* that separates :math:`Q` into an :math:`A` (advantage)
stream and a :math:`V` (state-value) stream. Adding this structure to the network head allows the network to better differentiate
actions from one another, and can significantly improve learning.
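
The combination step performed by the dueling head can be sketched as follows. This is a minimal sketch, not Coach's implementation: it assumes the mean-subtracted aggregation described in the referenced paper, :math:`Q(s,a) = V(s) + A(s,a) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s,a')`, and the function name ``dueling_q_values`` is hypothetical.

```python
from typing import List


def dueling_q_values(value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and per-action advantages A(s, a) into Q(s, a).

    Assumes the mean-subtracted aggregation Q = V + (A - mean(A)).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


# Hypothetical head outputs for one state with three actions.
q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
print(q)                   # [3.0, 2.0, 1.0]
print(sum(q) / len(q))     # 2.0, the mean of Q over actions equals V
```

Subtracting the mean makes the split identifiable: adding the same constant to every advantage leaves :math:`Q` unchanged, and the average of :math:`Q` over actions equals :math:`V`.
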

In many states, the values of the different actions are very similar, and it matters little which action is taken.
This is especially relevant in environments with many actions to choose from. In DQN, on each training
iteration, for each of the states in the batch, we update the :math:`Q` values only for the specific actions taken in
those states. This slows learning, as we do not learn the :math:`Q` values of actions that have not been taken yet.
With the dueling architecture, on the other hand, learning is faster, since we start learning the state value even if only a
single action has been taken in that state.
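
The difference in what a single update touches can be illustrated with a deliberately simplified toy. It assumes each head's outputs are free parameters for a single state (a small table) rather than outputs of a shared network, uses plain SGD on :math:`\frac{1}{2}(Q(s,a) - y)^2` for the taken action only, and all function names are hypothetical; it is not how Coach trains the network.

```python
from typing import List, Tuple


def q_values(value: float, adv: List[float]) -> List[float]:
    """Mean-subtracted dueling aggregation: Q = V + (A - mean(A))."""
    mean_adv = sum(adv) / len(adv)
    return [value + a - mean_adv for a in adv]


def dqn_step(q: List[float], action: int, target: float, lr: float) -> List[float]:
    """One SGD step on 0.5 * (Q(s, a) - target)^2 with one free parameter per
    action: only the taken action's entry receives a gradient."""
    new_q = list(q)
    new_q[action] -= lr * (q[action] - target)
    return new_q


def dueling_step(value: float, adv: List[float], action: int, target: float,
                 lr: float) -> Tuple[float, List[float]]:
    """Same loss, backpropagated through the dueling aggregation.

    dQ(a)/dV = 1 and dQ(a)/dA[j] = [j == a] - 1/n, so V always gets a gradient.
    """
    n = len(adv)
    delta = q_values(value, adv)[action] - target  # error on the taken action only
    new_value = value - lr * delta
    new_adv = [a - lr * delta * ((1.0 if j == action else 0.0) - 1.0 / n)
               for j, a in enumerate(adv)]
    return new_value, new_adv


# One state, three actions, all parameters start at zero; action 0 was taken.
plain = dqn_step([0.0, 0.0, 0.0], action=0, target=1.0, lr=0.5)
v, adv = dueling_step(0.0, [0.0, 0.0, 0.0], action=0, target=1.0, lr=0.5)
dueling = q_values(v, adv)

print(plain)                            # [0.5, 0.0, 0.0]
print([round(x, 3) for x in dueling])   # [0.833, 0.333, 0.333]
```

After one update on action 0, the plain head changes only :math:`Q(s, 0)`, while the dueling head also raises the estimates of the untaken actions 1 and 2, because part of the gradient flows into the shared :math:`V(s)`.
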