1
0
mirror of https://github.com/gryf/coach.git synced 2025-12-18 03:30:19 +01:00
Files
coach/benchmarks/td3/README.md
Gal Leibovich 7eb884c5b2 TD3 (#338)
2019-06-16 11:11:21 +03:00

49 lines
952 B
Markdown

# Twin Delayed DDPG
Each experiment uses 5 seeds and is trained for 1M environment steps.
The parameters used for TD3 are the same parameters as described in the [original paper](https://arxiv.org/pdf/1802.09477.pdf), and [repository](https://github.com/sfujim/TD3).
### Ant TD3 - single worker
```bash
coach -p Mujoco_TD3 -lvl ant
```
<img src="ant.png" alt="Ant TD3" width="800"/>
### Hopper TD3 - single worker
```bash
coach -p Mujoco_TD3 -lvl hopper
```
<img src="hopper.png" alt="Hopper TD3" width="800"/>
### Half Cheetah TD3 - single worker
```bash
coach -p Mujoco_TD3 -lvl half_cheetah
```
<img src="half_cheetah.png" alt="Half Cheetah TD3" width="800"/>
### Reacher TD3 - single worker
```bash
coach -p Mujoco_TD3 -lvl reacher
```
<img src="reacher.png" alt="Reacher TD3" width="800"/>
### Walker2D TD3 - single worker
```bash
coach -p Mujoco_TD3 -lvl walker2d
```
<img src="walker2d.png" alt="Walker2D TD3" width="800"/>