Mirror of https://github.com/gryf/coach.git, synced 2026-01-09 15:24:13 +01:00
2.6 KiB
| 1 | Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Discounted Return/Mean | Discounted Return/Stdev | Discounted Return/Max | Discounted Return/Min | Entropy/Mean | Entropy/Stdev | Entropy/Max | Entropy/Min | Advantages/Mean | Advantages/Stdev | Advantages/Max | Advantages/Min | Values/Mean | Values/Stdev | Values/Max | Values/Min | Value Loss/Mean | Value Loss/Stdev | Value Loss/Max | Value Loss/Min | Policy Loss/Mean | Policy Loss/Stdev | Policy Loss/Max | Policy Loss/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min | TD targets/Mean | TD targets/Stdev | TD targets/Max | TD targets/Min | actions/Mean | actions/Stdev | actions/Max | actions/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 0.0 | 1.0 | 1000.0 | 1.0 | 1000.0 | 1000.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||||||||||||||||||||||
| 3 | 2 | 0.0 | 1.0 | 2000.0 | 2.0 | 1000.0 | 2000.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||||||||||||||||||||||
| 4 | 3 | 999.0 | 0.0 | 3000.0 | 3.0 | 1000.0 | 3000.0 | -0.017666830179174003 | 0.0 | 0.0 | 1.0 | 0.0035039535888519192 | 0.0036899070860429047 | 0.0448087714612484 | 0.0006857202388346195 | 0.00010000000000000003 | 4.0657581468206416e-20 | 0.0001 | 0.0001 | 0.15666896 | 0.14784017 | 1.7027674 | 0.027932363999999998 | 0.0 | 0.0 | 0.0 | 0.0 | 0.28616783 | 0.1527039 | 0.82195866 | -0.052169282000000004 | 0.12254468997797058 | 0.14238915773279914 | 0.9096015518903732 | -0.459991791844368 | 0.07060378249719693 | 0.9005342625617344 | 1.6849354556578502 | -1.5135477528043664 | |||||||||||||||||||||||
| 5 | 4 | 1999.0 | 0.0 | 4000.0 | 4.0 | 1000.0 | 4000.0 | -0.039999362478752916 | 0.0 | 0.0 | 1.0 | 0.0023975625592941143 | 0.0020512967193832047 | 0.03644856810569763 | 0.0006296735955402255 | 0.00010000000000000003 | 4.0657581468206416e-20 | 0.0001 | 0.0001 | 0.10159315 | 0.08087424 | 1.1576957 | 0.023222 | 0.0 | 0.0 | 0.0 | 0.0 | 0.24290629 | 0.11069059 | 0.7463965 | -0.0065027475 | 0.2942720267621687 | 0.14379040300676074 | 1.032085758447647 | -0.2381104633212089 | 0.4210885653083537 | 0.7995310847152353 | 1.7607054373862634 | -1.3770179740736286 | |||||||||||||||||||||||
| 6 | 5 | 2999.0 | 0.0 | 5000.0 | 5.0 | 1000.0 | 5000.0 | 0.17145601483403705 | 0.0 | 0.0 | 0.0 | 0.0013986770755837895 | 0.0011338123535962153 | 0.018688598647713658 | 0.00039429025491699576 | 0.00010000000000000003 | 4.0657581468206416e-20 | 0.0001 | 0.0001 | 0.057997093 | 0.035725467000000004 | 0.5853672 | 0.015558452 | 0.0 | 0.0 | 0.0 | 0.0 | 0.19842595 | 0.07881614 | 0.6942527 | -0.16243972 | 0.367736402077882 | 0.1450020289455392 | 1.0550748002529144 | -0.04278800502419472 | -0.009665988609877104 | 0.8992571498573005 | 1.8188017021946057 | -1.4079581912324373 |