pre-release 0.10.0

2026-07-06 09:16:31 +02:00 · 2018-08-13 17:11:34 +03:00
parent d44c329bb8
commit 19ca5c24b1
485 changed files with 33292 additions and 16770 deletions
@@ -1,172 +1,44 @@
 # Coach Benchmarks

-The following figures are training curves of some of the presets available through Coach.
-The X axis in all the figures is the total steps (for multi-threaded runs, this is the accumulated number of steps over all the workers).
-The Y axis in all the figures is the average episode reward with an averaging window of 11 episodes.
+The following table summarizes the current status of the algorithms implemented in Coach relative to the results reported in the original papers. Click an algorithm's name to see its detailed results.
+
+The X axis in all the figures is the total steps (for multi-threaded runs, this is the number of steps per worker).
+The Y axis in all the figures is the average episode reward with an averaging window of 100 timesteps.
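
As an illustration of the averaging described above, the following is a minimal sketch of a trailing moving average over a series of reward samples. It assumes the window is trailing and the samples are evenly spaced; the function name `trailing_average` is hypothetical and is not part of Coach.

```python
from collections import deque
from typing import Iterable, List


def trailing_average(values: Iterable[float], window: int) -> List[float]:
    """Return the trailing mean of `values` over at most `window` samples."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # keeps only the last `window` samples
    running_sum = 0.0
    smoothed = []
    for value in values:
        if len(recent) == window:
            running_sum -= recent[0]  # oldest sample is about to be evicted
        recent.append(value)
        running_sum += value
        # Early points are averaged over fewer than `window` samples.
        smoothed.append(running_sum / len(recent))
    return smoothed
```

With a window of 100, the first 99 points of a curve are averaged over fewer samples, so the start of a smoothed curve is noisier than the rest.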
+
+Each graph is accompanied by the command line that reproduces it.
 These are the results you can expect to get when running the pre-defined presets in Coach.

+The environments that were used for testing include:
+* **Atari** - Breakout, Pong and Space Invaders
+* **Mujoco** - Inverted Pendulum, Inverted Double Pendulum, Reacher, Hopper, Half Cheetah, Walker 2D, Ant, Swimmer and Humanoid
+* **Doom** - Basic, Health Gathering (D1: Basic), Health Gathering Supreme (D2: Navigation), Battle (D3: Battle)
+* **Fetch** - Reach, Slide, Push, Pick-and-Place

-## A3C
+## Summary

-### Breakout_A3C with 16 workers
+![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) *Reproducing paper's results*

-```bash
-python3 coach.py -p Breakout_A3C -n 16 -r
-```
+![#ceffad](https://placehold.it/15/ceffad/000000?text=+) *Reproducing paper's results for some of the environments*

-<img src="img/Breakout_A3C_16_workers.png" alt="Breakout_A3C_16_workers" width="400"/>
+![#FFA500](https://placehold.it/15/FFA500/000000?text=+) *Training but not reproducing paper's results*

-### InvertedPendulum_A3C with 16 workers
+![#FF4040](https://placehold.it/15/FF4040/000000?text=+) *Not training*

-```bash
-python3 coach.py -p InvertedPendulum_A3C -n 16 -r
-```

-<img src="img/Inverted_Pendulum_A3C_16_workers.png" alt="Inverted_Pendulum_A3C_16_workers" width="400"/>
+|                         |**Status**                                                |**Environments**|**Comments**|
+| ----------------------- |:--------------------------------------------------------:|:--------------:|:--------:|
+|**[DQN](dqn)**                  | ![#ceffad](https://placehold.it/15/ceffad/000000?text=+) |Atari           | Pong is not training |
+|**[Dueling DDQN](dueling_ddqn)**| ![#ceffad](https://placehold.it/15/ceffad/000000?text=+) |Atari           | Pong is not training |
+|**[Dueling DDQN with PER](dueling_ddqn_with_per)**| ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Atari           | |
+|**[Bootstrapped DQN](bootstrapped_dqn)**| ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Atari           | |
+|**[QR-DQN](qr_dqn)**            | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Atari           | |
+|**[A3C](a3c)**                  | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Atari, Mujoco   | |
+|**[Clipped PPO](clipped_ppo)**  | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Mujoco          | |
+|**[DDPG](ddpg)**                | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Mujoco          | |
+|**[NEC](nec)**                  | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Atari           | |
+|**[HER](ddpg_her)**                  | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Fetch           | |
+|**[HAC](hac)**                  | ![#969696](https://placehold.it/15/969696/000000?text=+) |Pendulum        | |
+|**[DFP](dfp)**                  | ![#ceffad](https://placehold.it/15/ceffad/000000?text=+) |Doom            | Doom Battle was not verified |

-### Hopper_A3C with 16 workers

-```bash
-python3 coach.py -p Hopper_A3C -n 16 -r
-```
-
-<img src="img/Hopper_A3C_16_workers.png" alt="Hopper_A3C_16_workers" width="400"/>
-
-### Ant_A3C with 16 workers
-
-```bash
-python3 coach.py -p Ant_A3C -n 16 -r
-```
-
-<img src="img/Ant_A3C_16_workers.png" alt="Ant_A3C_16_workers" width="400"/>
-
-## Clipped PPO
-
-### InvertedPendulum_ClippedPPO with 16 workers
-
-```bash
-python3 coach.py -p InvertedPendulum_ClippedPPO -n 16 -r
-```
-
-<img src="img/InvertedPendulum_ClippedPPO_16_workers.png" alt="InvertedPendulum_ClippedPPO_16_workers" width="400"/>
-
-### Hopper_ClippedPPO with 16 workers
-
-```bash
-python3 coach.py -p Hopper_ClippedPPO -n 16 -r
-```
-
-<img src="img/Hopper_ClippedPPO_16_workers.png" alt="Hopper_Clipped_PPO_16_workers" width="400"/>
-
-### Humanoid_ClippedPPO with 16 workers
-
-```bash
-python3 coach.py -p Humanoid_ClippedPPO -n 16 -r
-```
-
-<img src="img/Humanoid_ClippedPPO_16_workers.png" alt="Humanoid_ClippedPPO_16_workers" width="400"/>
-
-## DQN
-
-### Pong_DQN
-
-```bash
-python3 coach.py -p Pong_DQN -r
-```
-
-<img src="img/Pong_DQN.png" alt="Pong_DQN" width="400"/>
-
-### Doom_Basic_DQN
-
-```bash
-python3 coach.py -p Doom_Basic_DQN -r
-```
-
-<img src="img/Doom_Basic_DQN.png" alt="Doom_Basic_DQN" width="400"/>
-
-## Dueling DDQN
-
-### Doom_Basic_Dueling_DDQN
-
-```bash
-python3 coach.py -p Doom_Basic_Dueling_DDQN -r
-```
-
-<img src="img/Doom_Basic_Dueling_DDQN.png" alt="Doom_Basic_Dueling_DDQN" width="400"/>
-
-## DFP
-
-### Doom_Health_DFP
-
-```bash
-python3 coach.py -p Doom_Health_DFP -r
-```
-
-<img src="img/Doom_Health_DFP.png" alt="Doom_Health_DFP" width="400"/>
-
-## MMC
-
-### Doom_Health_MMC
-
-```bash
-python3 coach.py -p Doom_Health_MMC -r
-```
-
-<img src="img/Doom_Health_MMC.png" alt="Doom_Health_MMC" width="400"/>
-
-## NEC
-
-## Pong_NEC
-
-```bash
-python3 coach.py -p Pong_NEC -r
-```
-
-<img src="img/Pong_NEC.png" alt="Pong_NEC" width="400"/>
-
-## Doom_Basic_NEC
-
-```bash
-python3 coach.py -p Doom_Basic_NEC -r
-```
-
-<img src="img/Doom_Basic_NEC.png" alt="Doom_Basic_NEC" width="400"/>
-
-## PG
-
-### CartPole_PG
-
-```bash
-python3 coach.py -p CartPole_PG -r
-```
-
-<img src="img/CartPole_PG.png" alt="CartPole_PG" width="400"/>
-
-## DDPG
-
-### Pendulum_DDPG
-
-```bash
-python3 coach.py -p Pendulum_DDPG -r
-```
-
-<img src="img/Pendulum_DDPG.png" alt="Pendulum_DDPG" width="400"/>
-
-
-## NAF
-
-### InvertedPendulum_NAF
-
-```bash
-python3 coach.py -p InvertedPendulum_NAF -r
-```
-
-<img src="img/InvertedPendulum_NAF.png" alt="InvertedPendulum_NAF" width="400"/>
-
-### Pendulum_NAF
-
-```bash
-python3 coach.py -p Pendulum_NAF -r
-```
-
-<img src="img/Pendulum_NAF.png" alt="Pendulum_NAF" width="400"/>
+**Click on each algorithm to see detailed benchmarking results**
@@ -0,0 +1,43 @@
+# A3C
+
+Each experiment uses 3 seeds.
+The parameters used for A3C are the same parameters as described in the [original paper](https://arxiv.org/abs/1602.01783).
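
Since each experiment uses 3 seeds, the per-seed curves have to be combined in some way before plotting. The source does not say how the figures do this. The sketch below assumes one plausible choice, the mean across seeds with a min/max band; `summarize_seeds` is a hypothetical helper, not a Coach function.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def summarize_seeds(
    curves: Sequence[Sequence[float]],
) -> List[Tuple[float, float, float]]:
    """Return (min, mean, max) across seeds for every point on the X axis."""
    if not curves:
        raise ValueError("need at least one curve")
    length = len(curves[0])
    if any(len(curve) != length for curve in curves):
        raise ValueError("all curves must have the same length")
    # zip(*curves) groups the values that share the same X position.
    return [(min(point), mean(point), max(point)) for point in zip(*curves)]
```

For example, three seeds with rewards `[1.0, 4.0]`, `[2.0, 5.0]` and `[3.0, 6.0]` yield one `(min, mean, max)` triple per step.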
+
+### Inverted Pendulum A3C - 1/2/4/8/16 workers
+
+```bash
+python3 coach.py -p Mujoco_A3C -lvl inverted_pendulum -n 1
+python3 coach.py -p Mujoco_A3C -lvl inverted_pendulum -n 2
+python3 coach.py -p Mujoco_A3C -lvl inverted_pendulum -n 4
+python3 coach.py -p Mujoco_A3C -lvl inverted_pendulum -n 8
+python3 coach.py -p Mujoco_A3C -lvl inverted_pendulum -n 16
+```
+
+<img src="inverted_pendulum_a3c.png" alt="Inverted Pendulum A3C" width="800"/>
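
The worker sweep above repeats one command with a different `-n` value each time. As a convenience, the command lines can be generated rather than typed by hand. This is a minimal sketch that uses only the `-p`, `-lvl` and `-n` flags shown in this document; `coach_command` is a hypothetical helper name.

```python
import shlex
from typing import List, Optional


def coach_command(
    preset: str, level: Optional[str] = None, workers: Optional[int] = None
) -> List[str]:
    """Build the argument list for one benchmark run."""
    argv = ["python3", "coach.py", "-p", preset]
    if level is not None:
        argv += ["-lvl", level]
    if workers is not None:
        if workers < 1:
            raise ValueError("workers must be at least 1")
        argv += ["-n", str(workers)]
    return argv


def sweep_commands(preset: str, level: str, worker_counts: List[int]) -> List[str]:
    """Return one shell-quoted command line per worker count."""
    return [
        shlex.join(coach_command(preset, level, workers))
        for workers in worker_counts
    ]
```

Calling `sweep_commands("Mujoco_A3C", "inverted_pendulum", [1, 2, 4, 8, 16])` returns the five command lines listed above. To launch a run from Python, the list returned by `coach_command` can be passed to `subprocess.run(argv, check=True)` from the directory that contains `coach.py`.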
+
+
+### Hopper A3C - 16 workers
+
+```bash
+python3 coach.py -p Mujoco_A3C -lvl hopper -n 16
+```
+
+<img src="hopper_a3c_16_workers.png" alt="Hopper A3C 16 workers" width="800"/>
+
+
+### Walker2D A3C - 16 workers
+
+```bash
+python3 coach.py -p Mujoco_A3C -lvl walker2d -n 16
+```
+
+<img src="walker2d_a3c_16_workers.png" alt="Walker2D A3C 16 workers" width="800"/>
+
+
+### Space Invaders A3C - 16 workers
+
+```bash
+python3 coach.py -p Atari_A3C -lvl space_invaders -n 16
+```
+
+<img src="space_invaders_a3c_16_workers.png" alt="Space Invaders A3C 16 workers" width="800"/>
@@ -0,0 +1,31 @@
+# Bootstrapped DQN
+
+Each experiment uses 3 seeds.
+The parameters used for Bootstrapped DQN are the same parameters as described in the [original paper](https://arxiv.org/abs/1602.04621).
+
+### Breakout Bootstrapped DQN - single worker
+
+```bash
+python3 coach.py -p Atari_Bootstrapped_DQN -lvl breakout
+```
+
+<img src="breakout_bootstrapped_dqn.png" alt="Breakout Bootstrapped DQN" width="800"/>
+
+
+### Pong Bootstrapped DQN - single worker
+
+```bash
+python3 coach.py -p Atari_Bootstrapped_DQN -lvl pong
+```
+
+<img src="pong_bootstrapped_dqn.png" alt="Pong Bootstrapped DQN" width="800"/>
+
+
+### Space Invaders Bootstrapped DQN - single worker
+
+```bash
+python3 coach.py -p Atari_Bootstrapped_DQN -lvl space_invaders
+```
+
+<img src="space_invaders_bootstrapped_dqn.png" alt="Space Invaders Bootstrapped DQN" width="800"/>
+
@@ -0,0 +1,84 @@
+# Clipped PPO
+
+Each experiment uses 3 seeds and is trained for 10k environment steps.
+The parameters used for Clipped PPO are the same parameters as described in the [original paper](https://arxiv.org/abs/1707.06347).
+
+### Inverted Pendulum Clipped PPO - single worker
+
+```bash
+python3 coach.py -p Mujoco_ClippedPPO -lvl inverted_pendulum
+```
+
+<img src="inverted_pendulum_clipped_ppo.png" alt="Inverted Pendulum Clipped PPO" width="800"/>
+
+
+### Inverted Double Pendulum Clipped PPO - single worker
+
+```bash
+python3 coach.py -p Mujoco_ClippedPPO -lvl inverted_double_pendulum
+```
+
+<img src="inverted_double_pendulum_clipped_ppo.png" alt="Inverted Double Pendulum Clipped PPO" width="800"/>
+
+
+### Reacher Clipped PPO - single worker
+
+```bash
+python3 coach.py -p Mujoco_ClippedPPO -lvl reacher
+```
+
+<img src="reacher_clipped_ppo.png" alt="Reacher Clipped PPO" width="800"/>
+
+
+### Hopper Clipped PPO - single worker
+
+```bash
+python3 coach.py -p Mujoco_ClippedPPO -lvl hopper
+```
+
+<img src="hopper_clipped_ppo.png" alt="Hopper Clipped PPO" width="800"/>
+
+
+### Half Cheetah Clipped PPO - single worker
+
+```bash
+python3 coach.py -p Mujoco_ClippedPPO -lvl half_cheetah
+```
+
+<img src="half_cheetah_clipped_ppo.png" alt="Half Cheetah Clipped PPO" width="800"/>
+
+
+### Walker 2D Clipped PPO - single worker
+
+```bash
+python3 coach.py -p Mujoco_ClippedPPO -lvl walker2d
+```
+
+<img src="walker2d_clipped_ppo.png" alt="Walker 2D Clipped PPO" width="800"/>
+
+
+### Ant Clipped PPO - single worker
+
+```bash
+python3 coach.py -p Mujoco_ClippedPPO -lvl ant
+```
+
+<img src="ant_clipped_ppo.png" alt="Ant Clipped PPO" width="800"/>
+
+
+### Swimmer Clipped PPO - single worker
+
+```bash
+python3 coach.py -p Mujoco_ClippedPPO -lvl swimmer
+```
+
+<img src="swimmer_clipped_ppo.png" alt="Swimmer Clipped PPO" width="800"/>
+
+
+### Humanoid Clipped PPO - single worker
+
+```bash
+python3 coach.py -p Mujoco_ClippedPPO -lvl humanoid
+```
+
+<img src="humanoid_clipped_ppo.png" alt="Humanoid Clipped PPO" width="800"/>
@@ -0,0 +1,84 @@
+# DDPG
+
+Each experiment uses 3 seeds and is trained for 2k environment steps.
+The parameters used for DDPG are the same parameters as described in the [original paper](https://arxiv.org/abs/1509.02971).
+
+### Inverted Pendulum DDPG - single worker
+
+```bash
+python3 coach.py -p Mujoco_DDPG -lvl inverted_pendulum
+```
+
+<img src="inverted_pendulum_ddpg.png" alt="Inverted Pendulum DDPG" width="800"/>
+
+
+### Inverted Double Pendulum DDPG - single worker
+
+```bash
+python3 coach.py -p Mujoco_DDPG -lvl inverted_double_pendulum
+```
+
+<img src="inverted_double_pendulum_ddpg.png" alt="Inverted Double Pendulum DDPG" width="800"/>
+
+
+### Reacher DDPG - single worker
+
+```bash
+python3 coach.py -p Mujoco_DDPG -lvl reacher
+```
+
+<img src="reacher_ddpg.png" alt="Reacher DDPG" width="800"/>
+
+
+### Hopper DDPG - single worker
+
+```bash
+python3 coach.py -p Mujoco_DDPG -lvl hopper
+```
+
+<img src="hopper_ddpg.png" alt="Hopper DDPG" width="800"/>
+
+
+### Half Cheetah DDPG - single worker
+
+```bash
+python3 coach.py -p Mujoco_DDPG -lvl half_cheetah
+```
+
+<img src="half_cheetah_ddpg.png" alt="Half Cheetah DDPG" width="800"/>
+
+
+### Walker 2D DDPG - single worker
+
+```bash
+python3 coach.py -p Mujoco_DDPG -lvl walker2d
+```
+
+<img src="walker2d_ddpg.png" alt="Walker 2D DDPG" width="800"/>
+
+
+### Ant DDPG - single worker
+
+```bash
+python3 coach.py -p Mujoco_DDPG -lvl ant
+```
+
+<img src="ant_ddpg.png" alt="Ant DDPG" width="800"/>
+
+
+### Swimmer DDPG - single worker
+
+```bash
+python3 coach.py -p Mujoco_DDPG -lvl swimmer
+```
+
+<img src="swimmer_ddpg.png" alt="Swimmer DDPG" width="800"/>
+
+
+### Humanoid DDPG - single worker
+
+```bash
+python3 coach.py -p Mujoco_DDPG -lvl humanoid
+```
+
+<img src="humanoid_ddpg.png" alt="Humanoid DDPG" width="800"/>
@@ -0,0 +1,40 @@
+# DDPG with Hindsight Experience Replay
+
+Each experiment uses 3 seeds.
+The parameters used for DDPG HER are the same parameters as described in the [following paper](https://arxiv.org/abs/1802.09464).
+
+### Fetch Reach DDPG HER - single worker
+
+```bash
+python3 coach.py -p Fetch_DDPG_HER_baselines -lvl reach
+```
+
+<img src="fetch_ddpg_her_reach_1_worker.png" alt="Fetch DDPG HER Reach 1 Worker" width="800"/>
+
+
+### Fetch Push DDPG HER - 8 workers
+
+```bash
+python3 coach.py -p Fetch_DDPG_HER_baselines -lvl push -n 8
+```
+
+<img src="fetch_ddpg_her_push_8_workers.png" alt="Fetch DDPG HER Push 8 Workers" width="800"/>
+
+
+### Fetch Slide DDPG HER - 8 workers
+
+```bash
+python3 coach.py -p Fetch_DDPG_HER_baselines -lvl slide -n 8
+```
+
+<img src="fetch_ddpg_her_slide_8_workers.png" alt="Fetch DDPG HER Slide 8 Workers" width="800"/>
+
+
+### Fetch Pick And Place DDPG HER - 8 workers
+
+```bash
+python3 coach.py -p Fetch_DDPG_HER_baselines -lvl pick_and_place -n 8
+```
+
+<img src="fetch_ddpg_her_pick_and_place_8_workers.png" alt="Fetch DDPG HER Pick And Place 8 Workers" width="800"/>
+
@@ -0,0 +1,31 @@
+# DFP
+
+Each experiment uses 3 seeds.
+The parameters used for DFP are the same parameters as described in the [original paper](https://arxiv.org/abs/1611.01779).
+
+### Doom Basic DFP - 8 workers
+
+```bash
+python3 coach.py -p Doom_Basic_DFP -n 8
+```
+
+<img src="doom_basic_dfp_8_workers.png" alt="Doom Basic DFP 8 workers" width="800"/>
+
+
+### Doom Health (D1: Basic) DFP - 8 workers
+
+```bash
+python3 coach.py -p Doom_Health_DFP -n 8
+```
+
+<img src="doom_health_dfp_8_workers.png" alt="Doom Health DFP 8 workers" width="800"/>
+
+
+
+### Doom Health Supreme (D2: Navigation) DFP - 8 workers
+
+```bash
+python3 coach.py -p Doom_Health_Supreme_DFP -n 8
+```
+
+<img src="doom_health_supreme_dfp_8_workers.png" alt="Doom Health Supreme DFP 8 workers" width="800"/>
@@ -0,0 +1,14 @@
+# DQN
+
+Each experiment uses 3 seeds.
+The parameters used for DQN are the same parameters as described in the [original paper](https://arxiv.org/abs/1607.05077).
+
+### Breakout DQN - single worker
+
+```bash
+python3 coach.py -p Atari_DQN -lvl breakout
+```
+
+<img src="breakout_dqn.png" alt="Breakout DQN" width="800"/>
+
+
@@ -0,0 +1,14 @@
+# Dueling DDQN
+
+Each experiment uses 3 seeds and is trained for 10k environment steps.
+The parameters used for Dueling DDQN are the same parameters as described in the [original paper](https://arxiv.org/abs/1706.01502).
+
+### Breakout Dueling DDQN - single worker
+
+```bash
+python3 coach.py -p Atari_Dueling_DDQN -lvl breakout
+```
+
+<img src="breakout_dueling_ddqn.png" alt="Breakout Dueling DDQN" width="800"/>
+
+
@@ -0,0 +1,31 @@
+# Dueling DDQN with Prioritized Experience Replay
+
+Each experiment uses 3 seeds and is trained for 10k environment steps.
+The parameters used for Dueling DDQN with PER are the same parameters as described in the [following paper](https://arxiv.org/abs/1511.05952).
+
+### Breakout Dueling DDQN with PER - single worker
+
+```bash
+python3 coach.py -p Atari_Dueling_DDQN_with_PER_OpenAI -lvl breakout
+```
+
+<img src="breakout_dueling_ddqn_with_per.png" alt="Breakout Dueling DDQN with PER" width="800"/>
+
+
+### Pong Dueling DDQN with PER - single worker
+
+```bash
+python3 coach.py -p Atari_Dueling_DDQN_with_PER_OpenAI -lvl pong
+```
+
+<img src="pong_dueling_ddqn_with_per.png" alt="Pong Dueling DDQN with PER" width="800"/>
+
+
+### Space Invaders Dueling DDQN with PER - single worker
+
+```bash
+python3 coach.py -p Atari_Dueling_DDQN_with_PER_OpenAI -lvl space_invaders
+```
+
+<img src="space_invaders_dueling_ddqn_with_per.png" alt="Space Invaders Dueling DDQN with PER" width="800"/>
+
@@ -0,0 +1,21 @@
+# Quantile Regression DQN
+
+Each experiment uses 3 seeds and is trained for 10k environment steps.
+The parameters used for QR-DQN are the same parameters as described in the [original paper](https://arxiv.org/abs/1710.10044).
+
+### Breakout QR-DQN - single worker
+
+```bash
+python3 coach.py -p Atari_QR_DQN -lvl breakout
+```
+
+<img src="breakout_qr_dqn.png" alt="Breakout QR-DQN" width="800"/>
+
+
+### Pong QR-DQN - single worker
+
+```bash
+python3 coach.py -p Atari_QR_DQN -lvl pong
+```
+
+<img src="pong_qr_dqn.png" alt="Pong QR-DQN" width="800"/>