Action space: Discrete
References: Rainbow: Combining Improvements in Deep Reinforcement Learning
Network Structure
Algorithm Description
Rainbow combines six recent advancements in reinforcement learning:
N-step returns
Distributional state-action value learning
Dueling networks
Noisy Networks
Double DQN
Prioritized Experience Replay
Training the network
Sample a batch of transitions from the replay buffer.
The Bellman update is projected onto the set of atoms representing the Q-value distribution, such that the $i$-th component of the projected update is calculated as follows:
$(\Phi \hat{T} Z_{\theta}(s_t, a_t))_i = \sum_{j=0}^{N-1} \Big[ 1 - \frac{\lvert [\hat{T} z_j]_{V_{MIN}}^{V_{MAX}} - z_i \rvert}{\Delta z} \Big]_0^1 \, p_j(s_{t+n}, \pi(s_{t+n}))$
where:
* $[\cdot]_a^b$ bounds its argument in the range $[a, b]$
* $\hat{T} z_j$ is the n-step Bellman update for atom $z_j$: $\hat{T} z_j := r_t + \gamma r_{t+1} + \dots + \gamma^{n-1} r_{t+n-1} + \gamma^{n} z_j$
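The projection step can be illustrated with a minimal pure-Python sketch. It is not Coach's implementation: the function names `n_step_return`, `clip` and `project_distribution` are hypothetical, and it assumes the standard n-step form in which the k-th reward is discounted by gamma^k and the bootstrap atom by gamma^n, with atoms evenly spaced between `v_min` and `v_max`.

```python
def n_step_return(rewards, gamma):
    """Discounted sum r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1}."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def clip(x, low, high):
    """Bound x to the range [low, high]."""
    return min(max(x, low), high)


def project_distribution(next_probs, rewards, gamma, v_min, v_max):
    """Project the n-step Bellman update onto the fixed atoms.

    next_probs: probabilities p_j of the next-state distribution, one per atom
    rewards:    the n rewards collected along the transition (assumed input format)
    """
    n_atoms = len(next_probs)
    delta_z = (v_max - v_min) / (n_atoms - 1)
    atoms = [v_min + i * delta_z for i in range(n_atoms)]

    ret = n_step_return(rewards, gamma)
    discount_n = gamma ** len(rewards)

    projected = [0.0] * n_atoms
    for z_j, p_j in zip(atoms, next_probs):
        # Bellman update for atom z_j, bounded to [v_min, v_max]
        tz_j = clip(ret + discount_n * z_j, v_min, v_max)
        for i, z_i in enumerate(atoms):
            # Triangular weight, bounded to [0, 1]
            weight = clip(1.0 - abs(tz_j - z_i) / delta_z, 0.0, 1.0)
            projected[i] += weight * p_j
    return projected


# Three atoms at -1, 0 and 1; a single-step transition with reward 0.5
example = project_distribution(
    [0.2, 0.5, 0.3], rewards=[0.5], gamma=0.5, v_min=-1.0, v_max=1.0
)
```

In this example the projected distribution is approximately [0, 0.45, 0.55]. The probability mass is preserved, and each updated atom is split between its two nearest neighbours on the fixed support.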
The network is trained with the cross-entropy loss between its predicted probability distribution and the projected target distribution. Only the targets of the actions that were actually taken are updated.
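As a rough illustration of this loss, the sketch below computes the cross entropy only for the action taken in each transition. The function names and the nested-list layout (`predicted[b][a]` is the predicted distribution for action `a` in transition `b`) are assumptions made for the example, not Coach's data structures.

```python
import math


def cross_entropy(target_probs, predicted_probs, eps=1e-10):
    """Cross entropy between a target and a predicted categorical distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def taken_action_loss(predicted, actions, targets):
    """Mean cross entropy over the batch, using only the action taken in each transition."""
    per_transition = [
        cross_entropy(target, predicted[b][action])
        for b, (action, target) in enumerate(zip(actions, targets))
    ]
    return sum(per_transition) / len(per_transition), per_transition


predicted = [
    [[0.5, 0.5], [0.9, 0.1]],    # transition 0: distributions for actions 0 and 1
    [[0.25, 0.75], [0.5, 0.5]],  # transition 1: distributions for actions 0 and 1
]
actions = [0, 1]                      # actions actually taken
targets = [[1.0, 0.0], [0.0, 1.0]]    # projected target distributions
mean_loss, losses = taken_action_loss(predicted, actions, targets)
```

The distributions of actions that were not taken never enter the loss, so they receive no gradient from this transition.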
Once every few thousand steps, the weights are copied from the online network to the target network.
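A periodic hard copy of this kind might look like the following sketch. The `sync_target` function, the dictionary representation of the weights and the `sync_every` value are all hypothetical and chosen only for illustration.

```python
import copy


def sync_target(online_weights, target_weights, step, sync_every):
    """Return a fresh copy of the online weights on sync steps, else the current target weights."""
    if step > 0 and step % sync_every == 0:
        return copy.deepcopy(online_weights)
    return target_weights


online = {"layer": [0.1, 0.2]}
target = {"layer": [0.0, 0.0]}

# Step 1500 is not a multiple of 1000, so the target network is left unchanged
target_mid = sync_target(online, target, step=1500, sync_every=1000)
# Step 2000 is a multiple of 1000, so the online weights are copied over
target_sync = sync_target(online, target, step=2000, sync_every=1000)
```

A deep copy is used so that later updates to the online weights do not leak into the target network between synchronizations.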
After every training step, the priorities of the batch transitions are updated in the prioritized replay buffer using the KL-divergence loss returned by the network.
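The priority update can be sketched as follows. The dictionary-based buffer, the function names and the small `min_priority` offset (which keeps every transition sampleable) are assumptions for this example rather than Coach's actual replay buffer API.

```python
import math


def kl_divergence(target_probs, predicted_probs, eps=1e-10):
    """KL divergence from the predicted distribution to the target distribution."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target_probs, predicted_probs)
        if t > 0.0
    )


def update_priorities(priorities, batch_indices, kl_losses, min_priority=1e-6):
    """Overwrite the priority of each sampled transition with its KL loss."""
    for index, loss in zip(batch_indices, kl_losses):
        priorities[index] = max(loss, 0.0) + min_priority
    return priorities


priorities = {0: 1.0, 1: 1.0, 2: 1.0}  # hypothetical buffer: index -> priority
kl_losses = [
    kl_divergence([1.0, 0.0], [0.5, 0.5]),  # poorly predicted transition
    kl_divergence([0.5, 0.5], [0.5, 0.5]),  # perfectly predicted transition
]
update_priorities(priorities, [0, 2], kl_losses)
```

Transitions whose predicted distribution is far from the target end up with a higher priority and are therefore sampled more often, while transitions outside the batch keep their previous priority.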