gryf/coach

mirror of https://github.com/gryf/coach.git synced 2026-03-18 15:53:35 +01:00

Itai Caspi 72a1d9d426 Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file

2018-09-04 15:07:54 +03:00

a3c

Running trace tests in parallel + other small fixes

2018-08-30 19:35:10 +03:00

bootstrapped_dqn

benchmarks and pip package updates

2018-08-19 14:23:20 +03:00

clipped_ppo

benchmarks and pip package updates

2018-08-19 14:23:20 +03:00

ddpg

benchmarks and pip package updates

2018-08-19 14:23:20 +03:00

ddpg_her

benchmarks and pip package updates

2018-08-19 14:23:20 +03:00

dfp

benchmarks and pip package updates

2018-08-19 14:23:20 +03:00

dqn

benchmarks and pip package updates

2018-08-19 14:23:20 +03:00

dueling_ddqn

Itaicaspi/episode reset refactoring (#105)

2018-09-04 15:07:54 +03:00

dueling_ddqn_with_per

benchmarks and pip package updates

2018-08-19 14:23:20 +03:00

qr_dqn

benchmarks and pip package updates

2018-08-19 14:23:20 +03:00

README.md

updated gifs in README + fix for multiworker crashes + improved Atari DQN and Dueling DDQN presets

2018-08-16 18:23:32 +03:00

README.md

Coach Benchmarks

The following table represents the current status of algorithms implemented in Coach relative to the results reported in the original papers. The detailed results for each algorithm can be seen by clicking on its name.

In all the figures, the X axis is the total number of steps (for multi-threaded runs, this is the number of steps per worker), and the Y axis is the average episode reward, computed with an averaging window of 100 timesteps.
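The averaging used for the Y axis can be illustrated with a short sketch. This is only an assumption about how such a curve might be smoothed, not the code Coach uses to produce its graphs: the function name `windowed_average` is hypothetical, and a trailing window over a plain list of reward values is assumed.

```python
from collections import deque


def windowed_average(rewards, window=100):
    """Return the trailing mean of `rewards` over at most `window` entries.

    Hypothetical helper for illustration; not part of Coach.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # keeps only the last `window` rewards
    running_sum = 0.0
    averages = []
    for reward in rewards:
        if len(recent) == window:
            running_sum -= recent[0]  # this value is evicted by the append below
        recent.append(reward)
        running_sum += reward
        averages.append(running_sum / len(recent))
    return averages


# Small window so the effect is easy to follow by hand.
print(windowed_average([1.0, 2.0, 3.0, 4.0], window=2))  # [1.0, 1.5, 2.5, 3.5]
```

Until the window fills, the mean is taken over the values seen so far, so the first points of a curve smoothed this way are noisier than the later ones.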

For each algorithm, there is a command line for reproducing the results of each graph. These are the results you can expect to get when running the pre-defined presets in Coach.

The environments that were used for testing include:

* Atari - Breakout, Pong and Space Invaders
* Mujoco - Inverted Pendulum, Inverted Double Pendulum, Reacher, Hopper, Half Cheetah, Walker 2D, Ant, Swimmer and Humanoid
* Doom - Basic, Health Gathering (D1: Basic), Health Gathering Supreme (D2: Navigation), Battle (D3: Battle)
* Fetch - Reach, Slide, Push, Pick-and-Place

Summary

* Reproducing paper's results
* Reproducing paper's results for some of the environments
* Training but not reproducing paper's results
* Not training

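| Algorithm | Environments | Comments |
|---|---|---|
| DQN | Atari | |
| Dueling DDQN | Atari | |
| Dueling DDQN with PER | Atari | |
| Bootstrapped DQN | Atari | |
| QR-DQN | Atari | |
| A3C | Atari, Mujoco | |
| Clipped PPO | Mujoco | |
| DDPG | Mujoco | |
| NEC | Atari | |
| HER | Fetch | |
| HAC | Pendulum | |
| DFP | Doom | Doom Battle was not verified |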

Click on each algorithm to see detailed benchmarking results