mirror of https://github.com/gryf/coach.git
synced 2026-01-07 22:34:23 +01:00
2.6 KiB
| 1 | Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Entropy/Mean | Entropy/Stdev | Entropy/Max | Entropy/Min | Advantages/Mean | Advantages/Stdev | Advantages/Max | Advantages/Min | Values/Mean | Values/Stdev | Values/Max | Values/Min | Value Loss/Mean | Value Loss/Stdev | Value Loss/Max | Value Loss/Min | Policy Loss/Mean | Policy Loss/Stdev | Policy Loss/Max | Policy Loss/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min | TD targets/Mean | TD targets/Stdev | TD targets/Max | TD targets/Min | actions/Mean | actions/Stdev | actions/Max | actions/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 0.0 | 1.0 | 1001.0 | 1.0 | 1001.0 | 1001.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||||||||||||||||||||||
| 3 | 2 | 0.0 | 1.0 | 2002.0 | 2.0 | 1001.0 | 2002.0 | 0.0 | 1.0 | |||||||||||||||||||||||||||||||||||||||||||||||||
| 4 | 3 | 1000.0 | 0.0 | 3003.0 | 3.0 | 1001.0 | 3003.0 | -0.1185302492771778 | 8.62704551591294 | 86.2704551591296 | 1.0 | 1.0509011072599606e-05 | 4.393642656353033e-05 | 0.0008535402594134213 | 1.1514939615153708e-06 | 0.00010000000000000003 | 2.7105054312137605e-20 | 0.0001 | 0.0001 | 0.004000389 | 0.00447183 | 0.062234186 | 0.00047969296999999996 | 0.08464705 | 0.16014087 | 0.45386302 | -0.26037258 | 0.01247160570665026 | 0.02153857694844653 | 0.08672064238048882 | -0.04962609781241383 | 0.3359349988514577 | 0.6368093944604776 | 1.3638484370927098 | -1.3839266445045957 | |||||||||||||||||||||||
| 5 | 4 | 2001.0 | 0.0 | 4004.0 | 4.0 | 1001.0 | 4004.0 | -0.2048510260598676 | 17.580070175231974 | 175.80070175231998 | 1.0 | 0.0005509343815205071 | 0.0018491137578482792 | 0.023759014904499054 | 5.607626462733606e-06 | 0.00010000000000000003 | 2.7105054312137605e-20 | 0.0001 | 0.0001 | 0.045537997000000004 | 0.09140324 | 1.2210321000000002 | 0.0010273910000000001 | 0.1922657 | 0.16243528 | 0.44480476 | -0.2532415 | 0.03993582413609073 | 0.11728732960908478 | 0.5736919507147175 | -0.26410636501093465 | 0.6924021347523865 | 0.5892731229023225 | 1.3749280698542792 | -1.507436630113174 | |||||||||||||||||||||||
| 6 | 5 | 3002.0 | 0.0 | 5005.0 | 5.0 | 1001.0 | 5005.0 | -0.02134772535498328 | 13.124325999088368 | 131.24325999088364 | 0.0 | 0.0001703916229396802 | 0.000568676102611858 | 0.004801726434379816 | 2.6488642106414773e-06 | 0.00010000000000000003 | 2.7105054312137605e-20 | 0.0001 | 0.0001 | 0.014244637 | 0.014174069 | 0.10748595 | 0.00069606147 | 0.38734838 | 0.23498419 | 0.6344281 | -0.10678842 | 0.09845966999879296 | 0.17017726714756395 | 0.6471482681083021 | -0.23208531499469515 | 0.8583268163158988 | 0.5493396564055796 | 1.4005169604031336 | -1.084873489999208 |