
Coach Benchmarks

The following table summarizes the current status of the algorithms implemented in Coach, relative to the results reported in the original papers. The detailed results for each algorithm can be seen by clicking on its name.

The X axis in every figure is the total number of steps (for multi-threaded runs, this is the number of steps per worker). The Y axis in every figure is the average episode reward, with an averaging window of 100 timesteps.
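As a rough illustration of what an averaging window does to a reward curve, here is a minimal sketch of a trailing mean over the most recent 100 recorded values. The function name `windowed_average` is hypothetical, and this is not Coach's plotting code: the exact smoothing used to produce the figures is not specified here, so treat the trailing-window behavior (including the shorter window at the start of a run) as an assumption.

```python
from collections import deque


def windowed_average(rewards, window=100):
    """Trailing mean of `rewards` over at most the last `window` values.

    Hypothetical helper for illustration only; early entries are averaged
    over however many values are available so far (an assumption).
    """
    recent = deque(maxlen=window)  # holds the values currently in the window
    running_sum = 0.0
    averages = []
    for reward in rewards:
        if len(recent) == window:
            # The oldest value is about to be evicted by append().
            running_sum -= recent[0]
        recent.append(reward)
        running_sum += reward
        averages.append(running_sum / len(recent))
    return averages


if __name__ == "__main__":
    # Small window so the effect is easy to follow by hand.
    print(windowed_average([1, 2, 3, 4], window=2))
```

A larger window gives a smoother curve at the cost of reacting more slowly to changes in the underlying reward.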

For each algorithm, every graph is accompanied by the command line that reproduces it. These are the results you can expect to get when running the pre-defined presets in Coach.

The environments that were used for testing include:

  • Atari - Breakout, Pong and Space Invaders
  • Mujoco - Inverted Pendulum, Inverted Double Pendulum, Reacher, Hopper, Half Cheetah, Walker 2D, Ant, Swimmer and Humanoid
  • Doom - Basic, Health Gathering (D1: Basic), Health Gathering Supreme (D2: Navigation), Battle (D3: Battle)
  • Fetch - Reach, Slide, Push, Pick-and-Place

Summary

#2E8B57 Reproducing paper's results

#ceffad Reproducing paper's results for some of the environments

#FFA500 Training but not reproducing paper's results

#FF4040 Not training

| Algorithm | Status | Environments | Comments |
|---|---|---|---|
| DQN | #2E8B57 | Atari | |
| Dueling DDQN | #2E8B57 | Atari | |
| Dueling DDQN with PER | #2E8B57 | Atari | |
| Bootstrapped DQN | #2E8B57 | Atari | |
| QR-DQN | #2E8B57 | Atari | |
| A3C | #2E8B57 | Atari, Mujoco | |
| ACER | #2E8B57 | Atari | |
| Clipped PPO | #2E8B57 | Mujoco | |
| DDPG | #2E8B57 | Mujoco | |
| SAC | #2E8B57 | Mujoco | |
| NEC | #2E8B57 | Atari | |
| HER | #2E8B57 | Fetch | |
| DFP | #ceffad | Doom | Doom Battle was not verified |

Click on an algorithm's name to see its detailed benchmarking results.