Normalized Advantage Functions
==============================

**Action space:** Continuous

**References:** `Continuous Deep Q-Learning with Model-based Acceleration <https://arxiv.org/abs/1603.00748.pdf>`_

Network Structure
-----------------

.. image:: /_static/img/design_imgs/naf.png
   :width: 600px
   :align: center
Algorithm Description
---------------------

Choosing an action
++++++++++++++++++

The current state is used as an input to the network, and the action mean :math:`\mu(s_t)` is extracted from the output head.
It is then passed to the exploration policy, which adds noise in order to encourage exploration.
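
A minimal sketch of this step, assuming an illustrative ``network`` object that returns the predicted action mean and
an additive-noise exploration policy (the names below are hypothetical, not Coach's actual API):

.. code-block:: python

    def choose_action(network, exploration_noise, state):
        # Forward pass: the NAF head outputs the action mean mu(s_t).
        action_mean = network.predict(state)                   # hypothetical call
        # The exploration policy adds noise (e.g. Gaussian or Ornstein-Uhlenbeck)
        # to the mean in order to encourage exploration.
        noise = exploration_noise.sample(action_mean.shape)    # hypothetical call
        return action_mean + noise
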
Training the network
++++++++++++++++++++

The network is trained using the following targets:

:math:`y_t=r(s_t,a_t)+\gamma\cdot V(s_{t+1})`

Use the next states as the inputs to the target network and extract the :math:`V` value from within the head
to get :math:`V(s_{t+1})`. Then, update the online network using the current states and actions as inputs,
and :math:`y_t` as the targets.
After every training step, use a soft update in order to copy the weights from the online network to the target network.
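
As a concrete sketch of one training step under these definitions (hypothetical helper names; ``tau`` is the
soft-update coefficient and ``discount`` is :math:`\gamma`):

.. code-block:: python

    def train_step(online_network, target_network, batch, discount, tau):
        states, actions, rewards, next_states = batch

        # Targets: y_t = r(s_t, a_t) + gamma * V(s_{t+1}), with V(s_{t+1}) taken
        # from the target network's value head for the next states.
        next_state_values = target_network.predict_v(next_states)   # hypothetical call
        targets = rewards + discount * next_state_values

        # Regress Q(s_t, a_t) of the online network towards the targets y_t.
        online_network.train_on_batch(states, actions, targets)     # hypothetical call

        # Soft update: the target network slowly tracks the online network weights.
        new_weights = [(1.0 - tau) * t + tau * o
                       for t, o in zip(target_network.get_weights(),
                                       online_network.get_weights())]
        target_network.set_weights(new_weights)
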
.. autoclass:: rl_coach.agents.naf_agent.NAFAlgorithmParameters