mirror of https://github.com/gryf/coach.git synced 2025-12-17 11:10:20 +01:00
Gal Leibovich 7eb884c5b2 TD3 (#338)
2019-06-16 11:11:21 +03:00

Twin Delayed DDPG (TD3)

Each experiment uses 5 seeds and is trained for 1M environment steps. The TD3 hyperparameters are the same as those described in the original paper and its accompanying repository.
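For readers unfamiliar with the algorithm, TD3 differs from DDPG through three ideas from the original paper: a clipped double-Q target (the minimum of two target critics), target policy smoothing (clipped Gaussian noise on the target action), and delayed actor updates. The sketch below illustrates all three in pure Python. It is not Coach's implementation. It assumes a scalar action normalized to [-1, 1], and the function names `td3_target` and `should_update_actor` are hypothetical. The constants are the values reported in the TD3 paper.

```python
import random

# Values reported in the TD3 paper (Fujimoto et al., 2018).
GAMMA = 0.99          # discount factor
POLICY_NOISE = 0.2    # std of Gaussian noise added to the target action
NOISE_CLIP = 0.5      # target-action noise is clipped to [-NOISE_CLIP, NOISE_CLIP]
POLICY_DELAY = 2      # actor and target networks update once per 2 critic updates


def clip(x, low, high):
    """Clamp x into the closed interval [low, high]."""
    return max(low, min(high, x))


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               action_low=-1.0, action_high=1.0, rng=random):
    """Clipped double-Q target with target policy smoothing.

    Hypothetical helper; assumes a scalar action normalized to
    [action_low, action_high]. The critics and actor are plain callables.
    """
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = clip(rng.gauss(0.0, POLICY_NOISE), -NOISE_CLIP, NOISE_CLIP)
    next_action = clip(target_actor(next_state) + noise, action_low, action_high)
    # Clipped double Q: use the smaller target critic value to curb overestimation.
    q_next = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    # Terminal transitions do not bootstrap.
    return reward + GAMMA * (1.0 - float(done)) * q_next


def should_update_actor(critic_step):
    """Delayed policy updates: actor and targets step every POLICY_DELAY critic steps."""
    return critic_step % POLICY_DELAY == 0
```

For example, with target critics that return 10.0 and 8.0, a reward of 1.0, and a non-terminal transition, `td3_target` returns 1.0 + 0.99 * 8.0 = 8.92, because the smaller critic value is used.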

Ant TD3 - single worker

coach -p Mujoco_TD3 -lvl ant
[Figure: Ant TD3]
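The Ant command above launches a single experiment, while each benchmark uses 5 seeds. A minimal sketch for generating the full sweep over every level in this benchmark follows; it only builds the argument lists and does not run anything. The `--seed` and `-e` (experiment name) flags, the seed values 0 through 4, and the experiment-naming scheme are assumptions, so confirm them against `coach -h` before use. The function name `benchmark_commands` is hypothetical.

```python
import shlex

# Levels benchmarked in this README.
LEVELS = ["ant", "hopper", "half_cheetah", "reacher", "walker2d"]
# Assumption: seeds 0..4; the actual seed values used are not stated.
SEEDS = range(5)


def benchmark_commands(levels=LEVELS, seeds=SEEDS, preset="Mujoco_TD3"):
    """Return one coach argv list per (level, seed) pair."""
    commands = []
    for level in levels:
        for seed in seeds:
            commands.append([
                "coach", "-p", preset, "-lvl", level,
                # Assumption: seed and experiment-name flags; verify with `coach -h`.
                "--seed", str(seed),
                "-e", f"td3_{level}_seed{seed}",
            ])
    return commands


if __name__ == "__main__":
    # Print shell-quoted commands (shlex.join needs Python 3.8+).
    for argv in benchmark_commands():
        print(shlex.join(argv))
```

Each argument list can be passed to `subprocess.run(argv, check=True)` to launch the runs one after another; keeping command construction separate from execution makes the sweep easy to inspect before committing compute to it.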

Hopper TD3 - single worker

coach -p Mujoco_TD3 -lvl hopper
[Figure: Hopper TD3]

Half Cheetah TD3 - single worker

coach -p Mujoco_TD3 -lvl half_cheetah
[Figure: Half Cheetah TD3]

Reacher TD3 - single worker

coach -p Mujoco_TD3 -lvl reacher
[Figure: Reacher TD3]

Walker2D TD3 - single worker

coach -p Mujoco_TD3 -lvl walker2d
[Figure: Walker2D TD3]
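This README does not state how the 5 seeds are combined in each plot. One common summary, shown here only as an assumption, is the mean and sample standard deviation of the evaluation return across seeds at each evaluation point. The helper name `aggregate_over_seeds` and its input layout (one list of returns per seed, all evaluated at the same points) are hypothetical and are not taken from Coach's output format.

```python
from statistics import mean, stdev


def aggregate_over_seeds(curves):
    """Return per-point (mean, sample standard deviation) across seeds.

    `curves` holds one list of evaluation returns per seed; every list must
    cover the same evaluation points in the same order.
    """
    if len(curves) < 2:
        raise ValueError("need at least two seeds for a standard deviation")
    length = len(curves[0])
    if any(len(curve) != length for curve in curves):
        raise ValueError("all seeds must be evaluated at the same points")
    # zip(*curves) groups the returns of all seeds at each evaluation point.
    return [(mean(point), stdev(point)) for point in zip(*curves)]
```

For example, three seeds with returns `[1, 2]`, `[3, 4]`, and `[5, 6]` yield `[(3, 2.0), (4, 2.0)]`: the mean return at each point with one standard deviation of spread across seeds.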