Mirror of https://github.com/gryf/coach.git, synced 2026-01-28 02:55:46 +01:00
1.4 KiB
| Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Entropy/Mean | Entropy/Stdev | Entropy/Max | Entropy/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min | Q Values/Mean | Q Values/Stdev | Q Values/Max | Q Values/Min | Value Loss/Mean | Value Loss/Stdev | Value Loss/Max | Value Loss/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 1117.0 | 1.0 | 1117.0 | 1117.0 | 0.5 | 0.0 |
| 2 | 164.0 | 0.0 | 819.0 | 1.0 | 819.0 | 1936.0 | 0.4919737999999965 | -21.0 | -21.0 | 0.0 | 0.08973327 | 0.10308977 | 0.35256127 | -0.28368974 | 0.09622141 | 0.28172967 | 2.938948 | 3.398415e-05 |
| 3 | 348.0 | 0.0 | 920.0 | 1.0 | 920.0 | 2856.0 | 0.4829577999999926 | -21.0 | -21.0 | 0.0 | 0.085431345 | 0.038046483 | 0.26499447 | -0.00268347 | 0.102118276 | 0.23871702 | 1.2644383999999997 | 2.8306908e-05 |
| 4 | 517.0 | 0.0 | 843.0 | 1.0 | 843.0 | 3699.0 | 0.474696399999989 | -21.0 | -21.0 | 0.0 | 0.12869294 | 0.02720576 | 0.2113828 | 0.057547163 | 0.07727617 | 0.19602461 | 1.3975663 | 0.0008099895 |
| 5 | 700.0 | 0.0 | 913.0 | 1.0 | 913.0 | 4612.0 | 0.4657489999999852 | -20.0 | -20.0 | 0.0 | 0.16959693 | 0.053117845 | 0.3296745 | 0.055682946 | 0.04773866400000001 | 0.14551932 | 1.2484678999999999 | 7.813723e-06 |