TD3 (#338)
@@ -391,6 +391,16 @@ $(document).ready(function() {
                         and therefore it is able to use a replay buffer in order to improve sample efficiency.
                     </span>
                 </div>
+                <div class="algorithm continuous off-policy" data-year="201802">
+                    <span class="badge">
+                        <a href="components/agents/policy_optimization/td3.html">TD3</a>
+                        <br>
+                        Very similar to DDPG, i.e. an actor-critic algorithm for continuous action spaces that uses a replay
+                        buffer to improve sample efficiency. TD3 uses two critic networks to mitigate the overestimation
+                        in the Q state-action value prediction, delays the actor updates to increase stability, and
+                        adds noise to the target actions while training the critic to smooth out the critic's predictions.
+                    </span>
+                </div>
                 <div class="algorithm continuous discrete on-policy" data-year="201706">
                     <span class="badge">
                         <a href="components/agents/policy_optimization/ppo.html">PPO</a>
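The description added in this commit names the three ways TD3 departs from DDPG: two critics whose minimum tames Q-value overestimation, actor updates that run less often than critic updates, and noise on the target action when forming the critic's target. The sketch below shows how those pieces fit into a single update step. It is illustrative only: it uses PyTorch rather than Coach's own framework backends, tiny made-up networks, and placeholder hyperparameters (policy_delay, noise_std, noise_clip); the names mlp and td3_update are hypothetical and do not come from the Coach code linked above.

import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    # Small illustrative network; Coach configures its own architectures.
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

obs_dim, act_dim, gamma, tau = 3, 1, 0.99, 0.005

actor, actor_targ = mlp(obs_dim, act_dim), mlp(obs_dim, act_dim)
# Two critics (plus target copies), used to mitigate Q-value overestimation.
critics = [mlp(obs_dim + act_dim, 1) for _ in range(2)]
critics_targ = [mlp(obs_dim + act_dim, 1) for _ in range(2)]
for targ, net in zip([actor_targ] + critics_targ, [actor] + critics):
    targ.load_state_dict(net.state_dict())

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam([p for c in critics for p in c.parameters()], lr=1e-3)

def td3_update(batch, step, policy_delay=2, noise_std=0.2, noise_clip=0.5):
    obs, act, rew, next_obs, done = batch

    with torch.no_grad():
        # Target policy smoothing: clipped noise on the target action makes the
        # critic target an estimate over nearby actions rather than a single point.
        noise = (torch.randn_like(act) * noise_std).clamp(-noise_clip, noise_clip)
        next_act = (torch.tanh(actor_targ(next_obs)) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q learning: take the minimum of the two target critics.
        q1 = critics_targ[0](torch.cat([next_obs, next_act], dim=-1))
        q2 = critics_targ[1](torch.cat([next_obs, next_act], dim=-1))
        target = rew + gamma * (1.0 - done) * torch.min(q1, q2)

    # Both critics regress toward the same pessimistic target.
    critic_loss = sum(((c(torch.cat([obs, act], dim=-1)) - target) ** 2).mean()
                      for c in critics)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Delayed policy updates: the actor and the target networks are updated less
    # frequently than the critics, which increases stability.
    if step % policy_delay == 0:
        actor_loss = -critics[0](
            torch.cat([obs, torch.tanh(actor(obs))], dim=-1)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
        with torch.no_grad():
            for targ, net in zip([actor_targ] + critics_targ, [actor] + critics):
                for p_t, p in zip(targ.parameters(), net.parameters()):
                    p_t.mul_(1.0 - tau).add_(tau * p)

# Example call with a random mini-batch of 32 transitions (shapes only).
batch = (torch.randn(32, obs_dim), torch.rand(32, act_dim) * 2 - 1,
         torch.randn(32, 1), torch.randn(32, obs_dim), torch.zeros(32, 1))
td3_update(batch, step=0)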