Twin Delayed DDPG (TD3)
Each experiment is run with 5 seeds and trained for 1M environment steps. The TD3 parameters are the same as those described in the original paper and its accompanying repository.
Ant TD3 - single worker
coach -p Mujoco_TD3 -lvl ant
Hopper TD3 - single worker
coach -p Mujoco_TD3 -lvl hopper
Half Cheetah TD3 - single worker
coach -p Mujoco_TD3 -lvl half_cheetah
Reacher TD3 - single worker
coach -p Mujoco_TD3 -lvl reacher
Walker2D TD3 - single worker
coach -p Mujoco_TD3 -lvl walker2d
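The five commands above differ only in the value passed to -lvl, so they can be generated from a single list of level names. The following is a minimal sketch, not part of Coach itself: the helper functions td3_commands and run_all and the DRY_RUN variable are hypothetical names introduced for this example. By default it only prints the commands; it does not set seeds or repeat runs.

```shell
#!/usr/bin/env bash
# Sketch: print (or run, one after another) the TD3 commands listed above.
# DRY_RUN, td3_commands and run_all are hypothetical names for this example;
# they are not Coach options.
set -eu

# Level names taken from the commands in this README.
LEVELS="ant hopper half_cheetah reacher walker2d"

# Emit one coach command line per level.
td3_commands() {
  for level in $LEVELS; do
    echo "coach -p Mujoco_TD3 -lvl $level"
  done
}

# Print each command when DRY_RUN=1 (the default), otherwise execute it.
run_all() {
  td3_commands | while read -r cmd; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$cmd"
    else
      # stdin is redirected so the command cannot consume the remaining lines.
      $cmd </dev/null
    fi
  done
}

run_all
```

Saved under a name of your choice, the script prints the five commands when run as is. Setting DRY_RUN=0 in the environment runs them sequentially, which assumes coach is installed and on the PATH.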