Mirror of https://github.com/gryf/coach.git (synced 2025-12-19 04:00:18 +01:00)
2.7 KiB
| Line | Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Discounted Return/Mean | Discounted Return/Stdev | Discounted Return/Max | Discounted Return/Min | Entropy/Mean | Entropy/Stdev | Entropy/Max | Entropy/Min | Advantages/Mean | Advantages/Stdev | Advantages/Max | Advantages/Min | Values/Mean | Values/Stdev | Values/Max | Values/Min | Value Loss/Mean | Value Loss/Stdev | Value Loss/Max | Value Loss/Min | Policy Loss/Mean | Policy Loss/Stdev | Policy Loss/Max | Policy Loss/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min | TD targets/Mean | TD targets/Stdev | TD targets/Max | TD targets/Min | actions/Mean | actions/Stdev | actions/Max | actions/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 0.0 | 1.0 | 1000.0 | 1.0 | 1000.0 | 1000.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||||||||||||||||||||||
| 3 | 2 | 0.0 | 1.0 | 2000.0 | 2.0 | 1000.0 | 2000.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||||||||||||||||||||||
| 4 | 3 | 999.0 | 0.0 | 3000.0 | 3.0 | 1000.0 | 3000.0 | -0.017666830179174003 | 0.0 | 0.0 | 1.0 | 0.0029464550291862087 | 0.0025701377750570642 | 0.02788718044757843 | 0.0006394493393599987 | 0.00010000000000000003 | 4.0657581468206416e-20 | 0.0001 | 0.0001 | 0.13357769 | 0.117093444 | 1.2991068000000001 | 0.026759505 | 0.0 | 0.0 | 0.0 | 0.0 | 0.15686034 | 0.0627305 | 0.39373034 | -0.3585922 | -0.0033900204140713077 | 0.15875771875068714 | 0.5218342781066895 | -0.5829120719432831 | 0.018559266145446292 | 0.8379639652873171 | 1.3133153498255412 | -1.2431993702510542 | |||||||||||||||||||||||
| 5 | 4 | 1999.0 | 0.0 | 4000.0 | 4.0 | 1000.0 | 4000.0 | -0.039999362478752916 | 0.0021780076323496323 | 0.02178007632349632 | 1.0 | 0.0006978411426705012 | 0.00034956689330895783 | 0.002974547212943435 | 0.0001349833473796025 | 0.00010000000000000003 | 4.0657581468206416e-20 | 0.0001 | 0.0001 | 0.041489832000000004 | 0.02103676 | 0.1902087 | 0.010260164 | 2.3785131604782346e-05 | 0.00021786301549431403 | 0.002166953447005496 | 0.0 | 0.10348537 | 0.037828527 | 0.58358335 | -0.20245944 | 0.1538198277351862 | 0.12114966051922052 | 0.6158200460672378 | -0.3744521605968476 | 0.08143989407802325 | 0.8094344175263435 | 1.2337204414679308 | -1.3201327969582874 | |||||||||||||||||||||||
| 6 | 5 | 2999.0 | 0.0 | 5000.0 | 5.0 | 1000.0 | 5000.0 | 0.17145601483403705 | 0.0 | 0.0 | 0.0 | 0.0004858621195543909 | 0.0005712790351431513 | 0.00709826499223709 | 9.616250463295728e-05 | 0.00010000000000000003 | 4.0657581468206416e-20 | 0.0001 | 0.0001 | 0.028758908 | 0.02100075 | 0.23695771 | 0.007761135 | 0.0 | 0.0 | 0.0 | 0.0 | -0.082560025 | 0.042022076 | 0.32945228 | -0.19682288 | 0.20584864377714734 | 0.1282641644604344 | 0.6959143048524856 | -0.15129065528511998 | -0.22636502595053595 | 0.7716659678121603 | 1.6171153782369072 | -1.244061515013705 |