coach/docs_raw/source/components/agents/value_optimization/n_step.rst at 92460736bc24b3480eaf906ac5ce088852f704fd

gryf/coach

mirror of https://github.com/gryf/coach.git synced 2025-12-18 19:50:17 +01:00


Itai Caspi 6d40ad1650 update of api docstrings across coach and tutorials [WIP] (#91 )

* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation

2018-11-15 15:00:13 +02:00


Actions space: Discrete

References: Asynchronous Methods for Deep Reinforcement Learning

Network Structure

Algorithm Description

Training the network

The N-step Q-learning algorithm works in a similar manner to DQN, except for the following changes:

1. No replay buffer is used. Instead of sampling random batches of transitions, the network is trained every :math:`N` steps using the latest :math:`N` steps played by the agent.
2. To stabilize learning, multiple workers update the network together. This has a similar effect to decorrelating the samples used for training.
3. Instead of single-step Q targets, the rewards from :math:`N` consecutive steps are accumulated to form the N-step Q targets, according to the following equation:

   .. math:: R(s_t, a_t) = \sum_{i=t}^{t+k-1} \gamma^{i-t} r_i + \gamma^{k} V(s_{t+k})

   where :math:`k` is :math:`T_{max} - State\_Index` for each state in the batch.
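The target computation described above can be sketched in a few lines of NumPy. This is only an illustration and not Coach's actual implementation: the function name ``n_step_q_targets`` and its arguments are hypothetical, and the bootstrap value :math:`V(s_{t+k})` is assumed to be supplied by the caller (for example, the maximum Q value of the state following the last step, or 0 if the episode ended there). Walking backwards through the rollout gives each state its own horizon :math:`k`, so the last state gets a one-step target and the first state gets the full N-step target.

```python
import numpy as np


def n_step_q_targets(rewards, bootstrap_value, discount=0.99):
    """Hypothetical helper: N-step targets R(s_t, a_t) for one rollout.

    rewards         -- r_t for the latest steps played (at most T_max entries)
    bootstrap_value -- assumed estimate of V for the state after the last
                       step; pass 0.0 if that state is terminal
    discount        -- the discount factor gamma
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    targets = np.empty_like(rewards)

    # Accumulate from the end of the rollout: each step adds its own reward
    # to the discounted return of everything that follows it.
    running_return = float(bootstrap_value)
    for t in reversed(range(len(rewards))):
        running_return = rewards[t] + discount * running_return
        targets[t] = running_return
    return targets


# Three steps with rewards 1, 0, 2, a bootstrap value of 10 and gamma = 0.5.
example_targets = n_step_q_targets([1.0, 0.0, 2.0], bootstrap_value=10.0, discount=0.5)
```

With these inputs the last step uses :math:`k = 1` (``2 + 0.5 * 10 = 7``), and the first step uses :math:`k = 3` (``1 + 0.5 * 0 + 0.25 * 2 + 0.125 * 10 = 2.75``).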

.. autoclass:: rl_coach.agents.n_step_q_agent.NStepQAlgorithmParameters
