Commit 7eb884c5b2 (parent 8df3c46756)
Author: Gal Leibovich
Date: 2019-06-16 11:11:21 +03:00
Committed by: GitHub
107 changed files with 2200 additions and 495 deletions


@@ -391,6 +391,16 @@ $(document).ready(function() {
and therefore it is able to use a replay buffer in order to improve sample efficiency.
</span>
</div>
<div class="algorithm continuous off-policy" data-year="201509">
<span class="badge">
<a href="components/agents/policy_optimization/td3.html">TD3</a>
<br>
Very similar to DDPG: an actor-critic for continuous action spaces that uses a replay buffer to
improve sample efficiency. TD3 uses two critic networks to mitigate overestimation of the Q
state-action value, delays the actor updates relative to the critic updates to increase stability,
and adds noise to the target actions when training the critic in order to smooth out the critic's
predictions.
</span>
</div>
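
For reference, the three TD3 mechanisms described above (twin critics, delayed actor updates, and target policy smoothing) can be summarized in a short sketch. The snippet below is an illustrative PyTorch fragment, not Coach's implementation; the network callables (actor_target, critic1_target, critic2_target) and the hyperparameter values are assumptions made for the example.

import torch

def td3_critic_target(actor_target, critic1_target, critic2_target,
                      next_states, rewards, dones,
                      gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Bellman target with smoothed target actions and clipped double-Q."""
    with torch.no_grad():
        next_actions = actor_target(next_states)
        # Target policy smoothing: add clipped Gaussian noise to the target action.
        noise = (torch.randn_like(next_actions) * noise_std).clamp(-noise_clip, noise_clip)
        next_actions = (next_actions + noise).clamp(-max_action, max_action)
        # Clipped double-Q: take the minimum of the two target critics to
        # mitigate overestimation of the state-action value.
        q_next = torch.min(critic1_target(next_states, next_actions),
                           critic2_target(next_states, next_actions))
        return rewards + gamma * (1.0 - dones) * q_next

# Delayed policy updates: the actor and the target networks are refreshed only
# once every few critic updates, e.g.
#   if step % policy_delay == 0:
#       actor_loss = -critic1(states, actor(states)).mean()
#       ... optimize the actor, then Polyak-average the target networks ...
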
<div class="algorithm continuous discrete on-policy" data-year="201706">
<span class="badge">
<a href="components/agents/policy_optimization/ppo.html">PPO</a>