mirror of https://github.com/gryf/coach.git synced 2025-12-17 19:20:19 +01:00
Itai Caspi a7206ed702 Multiple improvements and bug fixes (#66)
* Multiple improvements and bug fixes:

    * Using lazy stacking to save on memory when using a replay buffer
    * Remove step counting for evaluation episodes
    * Reset game between heatup and training
    * Major bug fixes in NEC (is reproducing the paper results for pong now)
    * Image input rescaling to 0-1 is now optional
    * Change the terminal title to be the experiment name
    * Observation cropping for atari is now optional
    * Added random number of noop actions for gym to match the dqn paper
    * Fixed a bug where the evaluation episodes won't start with the max possible ale lives
    * Added a script for plotting the results of an experiment over all the atari games
2018-02-26 12:29:07 +02:00

Coach Benchmarks

The following figures show training curves for some of the presets available through Coach. In every figure, the X axis is the total number of steps (for multi-threaded runs, the number of steps accumulated over all the workers), and the Y axis is the average episode reward, computed with an averaging window of 11 episodes. These are the results you can expect when running the predefined presets in Coach.
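To make the axis definitions concrete, here is a minimal Python sketch of how such values could be computed from raw logs. It is not taken from Coach itself: the function names are hypothetical, and the window is assumed to be a trailing one (the text above does not say whether Coach uses a trailing or a centered window).

```python
from collections import deque


def windowed_average(episode_rewards, window=11):
    """Return the trailing average of the last `window` episode rewards.

    Assumption: a trailing window; early episodes are averaged over
    however many rewards are available so far.
    """
    recent = deque(maxlen=window)  # keeps only the latest `window` rewards
    averaged = []
    for reward in episode_rewards:
        recent.append(reward)
        averaged.append(sum(recent) / len(recent))
    return averaged


def accumulated_steps(per_worker_steps):
    """Total steps over all workers (X-axis value for multi-threaded runs)."""
    return sum(per_worker_steps)
```

For example, `accumulated_steps([100, 200, 300])` gives the single X-axis value for a three-worker run, and `windowed_average(rewards)` gives the corresponding Y-axis series for a list of per-episode rewards.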

A3C

Breakout_A3C with 16 workers

python3 coach.py -p Breakout_A3C -n 16 -r
Breakout_A3C_16_workers

InvertedPendulum_A3C with 16 workers

python3 coach.py -p InvertedPendulum_A3C -n 16 -r
Inverted_Pendulum_A3C_16_workers

Hopper_A3C with 16 workers

python3 coach.py -p Hopper_A3C -n 16 -r
Hopper_A3C_16_workers

Ant_A3C with 16 workers

python3 coach.py -p Ant_A3C -n 16 -r
Ant_A3C_16_workers

Clipped PPO

InvertedPendulum_ClippedPPO with 16 workers

python3 coach.py -p InvertedPendulum_ClippedPPO -n 16 -r
InvertedPendulum_ClippedPPO_16_workers

Hopper_ClippedPPO with 16 workers

python3 coach.py -p Hopper_ClippedPPO -n 16 -r
Hopper_Clipped_PPO_16_workers

Humanoid_ClippedPPO with 16 workers

python3 coach.py -p Humanoid_ClippedPPO -n 16 -r
Humanoid_ClippedPPO_16_workers

DQN

Pong_DQN

python3 coach.py -p Pong_DQN -r
Pong_DQN

Doom_Basic_DQN

python3 coach.py -p Doom_Basic_DQN -r
Doom_Basic_DQN

Dueling DDQN

Doom_Basic_Dueling_DDQN

python3 coach.py -p Doom_Basic_Dueling_DDQN -r
Doom_Basic_Dueling_DDQN

DFP

Doom_Health_DFP

python3 coach.py -p Doom_Health_DFP -r
Doom_Health_DFP

MMC

Doom_Health_MMC

python3 coach.py -p Doom_Health_MMC -r
Doom_Health_MMC

NEC

Pong_NEC

python3 coach.py -p Pong_NEC -r
Pong_NEC

Doom_Basic_NEC

python3 coach.py -p Doom_Basic_NEC -r
Doom_Basic_NEC

PG

CartPole_PG

python3 coach.py -p CartPole_PG -r
CartPole_PG

DDPG

Pendulum_DDPG

python3 coach.py -p Pendulum_DDPG -r
Pendulum_DDPG

NAF

InvertedPendulum_NAF

python3 coach.py -p InvertedPendulum_NAF -r
InvertedPendulum_NAF

Pendulum_NAF

python3 coach.py -p Pendulum_NAF -r
Pendulum_NAF
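
All the commands above share the same shape: a preset name passed with `-p`, an optional worker count passed with `-n`, and the `-r` flag. As a convenience, the following Python sketch builds that argument list for a given preset. The helper name `coach_command` is hypothetical and not part of Coach; it only reproduces the flags exactly as they appear in the commands listed here.

```python
def coach_command(preset, num_workers=None):
    """Build the argument list for one of the benchmark commands above.

    `num_workers` is only added (as `-n`) when given, matching the
    single-worker commands that omit it.
    """
    cmd = ["python3", "coach.py", "-p", preset]
    if num_workers is not None:
        cmd += ["-n", str(num_workers)]
    cmd.append("-r")  # every benchmark command above ends with -r
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `coach.py`, for example to run several presets one after another.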