Mirror of https://github.com/gryf/coach.git (synced 2026-02-16 22:25:47 +01:00)
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
8.7 KiB
| Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Entropy/Mean | Entropy/Stdev | Entropy/Max | Entropy/Min | Advantages/Mean | Advantages/Stdev | Advantages/Max | Advantages/Min | Values/Mean | Values/Stdev | Values/Max | Values/Min | Value Loss/Mean | Value Loss/Stdev | Value Loss/Max | Value Loss/Min | Policy Loss/Mean | Policy Loss/Stdev | Policy Loss/Max | Policy Loss/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 248.0 | 1.0 | 248.0 | 248.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||||||||||
| 2 | 0.0 | 1.0 | 123.0 | 1.0 | 123.0 | 371.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||||||||||
| 3 | 0.0 | 1.0 | 88.0 | 1.0 | 88.0 | 459.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||||||||||
| 4 | 0.0 | 1.0 | 187.0 | 1.0 | 187.0 | 646.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||||||||||
| 5 | 0.0 | 1.0 | 86.0 | 1.0 | 86.0 | 732.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||||||||||
| 6 | 0.0 | 1.0 | 331.0 | 1.0 | 331.0 | 1063.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||||||||||
| 7 | 37.0 | 0.0 | 753.0 | 1.0 | 753.0 | 1816.0 | 0.0 | 18.0 | 275.0 | 0.0 | 0.19276376 | 0.24904153 | 0.8257747 | 0.00013455142 | 1.7914727 | 0.00030057304 | 1.7917566000000003 | 1.7904335 | 0.2119369695934143 | 0.4029896256601249 | 1.8961015939712524 | -0.038109242916107185 | 0.059407155999999996 | 0.06025871 | 0.20562454 | -0.0059952493999999995 | 0.10879397 | 0.13543734 | 0.42884704 | 1.0928144400000001e-07 | 0.37946862 | 0.4819447 | 1.3135536 | -0.039071497000000004 | |||||||||||
| 8 | 43.0 | 0.0 | 107.0 | 1.0 | 107.0 | 1923.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.015239561 | 0.0024996148 | 0.019516254 | 0.011843238999999998 | 1.7906417000000001 | 0.0002779353 | 1.7917081999999998 | 1.790199 | -0.027609226256608964 | 0.013555497071717059 | -0.002493098378181457 | -0.054812923073768616 | 0.23526458 | 0.013103152 | 0.25880286 | 0.2149402 | 0.0010658596000000002 | 4.0590476e-05 | 0.0011093322 | 0.0010099161 | -0.04934317 | 0.011472803 | -0.039747406 | -0.06998761 | |||||||||||
| 9 | 47.0 | 0.0 | 73.0 | 1.0 | 73.0 | 1996.0 | 0.0 | 1.0 | 25.0 | 0.0 | 0.36924744 | 0.48345366 | 1.0529541999999998 | 0.027382427999999997 | 1.7904778000000001 | 0.00029410556 | 1.7917029999999998 | 1.7902383999999998 | 0.2182698796192805 | 0.4074832854303497 | 0.9812830686569214 | -0.05661928653717041 | 0.29025623 | 0.01401644 | 0.3092764 | 0.27106556 | 0.12317172 | 0.1714658 | 0.36566097 | 0.0019109361999999999 | 0.39213285 | 0.6270856 | 1.2789414 | -0.0569774 | |||||||||||
| 10 | 60.0 | 0.0 | 251.0 | 1.0 | 251.0 | 2247.0 | 0.0 | 4.0 | 30.0 | 0.0 | 0.43863640000000004 | 0.85959023 | 2.5217259999999997 | 0.035312783 | 1.7856493000000002 | 0.0015894786999999999 | 1.7916044999999998 | 1.7811941999999998 | 0.11047004585464797 | 0.4511292058995869 | 1.8077605962753296 | -0.14487385749816895 | 0.61056226 | 0.07753438 | 0.79911625 | 0.4961661 | 0.1471402 | 0.31318584 | 0.8826119 | 0.0050534373 | 0.1947745 | 0.6780483 | 1.7730703 | -0.1461431 | |||||||||||
| 11 | 66.0 | 0.0 | 121.0 | 1.0 | 121.0 | 2368.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.14871785 | 0.02547766 | 0.19108213 | 0.12211705 | 1.7780668000000002 | 0.0027863213 | 1.7913043000000002 | 1.7759546000000002 | -0.10607111632823944 | 0.05572927975960492 | -0.009624600410461426 | -0.19753050804138186 | 1.0220673 | 0.034096994 | 1.0672632 | 0.95422196 | 0.021255326 | 0.0012313497 | 0.023416747999999998 | 0.019953651 | -0.18968049 | 0.016377756 | -0.16716708 | -0.21171494 | |||||||||||
| 12 | 71.0 | 0.0 | 99.0 | 1.0 | 99.0 | 2467.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.15555012 | 0.04610389 | 0.23511228 | 0.12260077 | 1.7795043000000001 | 0.0023333197 | 1.7912293999999997 | 1.7773554 | -0.09951297342777253 | 0.055938959522401716 | -0.003077983856201172 | -0.1911371946334839 | 1.0086769 | 0.036691166000000004 | 1.0868968 | 0.96416116 | 0.02108366 | 0.0023025707 | 0.02351839 | 0.018744798 | -0.17643596 | 0.02231863 | -0.14677188 | -0.20941082 | |||||||||||
| 13 | 98.0 | 0.0 | 534.0 | 1.0 | 534.0 | 3001.0 | 0.0 | 13.0 | 340.0 | 0.0 | 0.7437658 | 1.1363528999999999 | 3.4128056000000004 | 0.025902914 | 1.7773186 | 0.003927723 | 1.7911369 | 1.7642651000000003 | 0.1266731635882304 | 0.41158095902457936 | 1.7730363607406616 | -0.30260801315307617 | 0.9577929000000001 | 0.23765793 | 1.6504846999999998 | 0.7122954 | 0.15221074 | 0.20607093 | 0.757879 | 0.008616385 | 0.22623166 | 0.6171821 | 1.5555726 | -0.30132666 | |||||||||||
| 14 | 102.0 | 0.0 | 73.0 | 1.0 | 73.0 | 3074.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.21133159 | 0.027693717000000003 | 0.24578243 | 0.17797336 | 1.7719979 | 0.004124344 | 1.7904103 | 1.7675488999999998 | -0.18784045577049252 | 0.09607193868630864 | -0.01820123195648193 | -0.3348844051361084 | 1.8157914000000002 | 0.008718997 | 1.8383793000000002 | 1.7924569 | 0.056422543 | 0.0016911370000000001 | 0.05787156 | 0.054050256 | -0.33337316 | 0.013935573 | -0.3163426 | -0.35047740000000005 | |||||||||||
| 15 | 110.0 | 0.0 | 156.0 | 1.0 | 156.0 | 3230.0 | 0.0 | 2.0 | 35.0 | 0.0 | 0.62405765 | 0.9592649999999999 | 2.9572167 | 0.12969549 | 1.7828823 | 0.0024880429999999997 | 1.7908226999999999 | 1.7776691000000002 | -0.06490173169544765 | 0.3036005258432666 | 1.5683243274688718 | -0.3259851932525635 | 1.7211465 | 0.029713133 | 1.7807939999999998 | 1.678612 | 0.12291474 | 0.17745444 | 0.5574187 | 0.04589054 | -0.112290025 | 0.43601707 | 0.9523574 | -0.32423943 | |||||||||||
| 16 | 122.0 | 0.0 | 238.0 | 1.0 | 238.0 | 3468.0 | 0.0 | 1.0 | 10.0 | 0.0 | 0.15554756 | 0.054277197 | 0.29741687 | 0.08846552 | 1.7868587 | 0.0007967119500000001 | 1.7910342000000001 | 1.7857143 | -0.1573395555669611 | 0.12615047772660626 | 0.7546746730804443 | -0.3632258176803589 | 1.6729587 | 0.17595315 | 1.9944772000000002 | 1.502953 | 0.05232538 | 0.035234198 | 0.15977861 | 0.03124919 | -0.28081256 | 0.079151474 | -0.07227526599999999 | -0.35482806 | |||||||||||
| 17 | 132.0 | 0.0 | 185.0 | 1.0 | 185.0 | 3653.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.08596161 | 0.024157293 | 0.13088742 | 0.04359092 | 1.7865810000000002 | 0.0008728278999999999 | 1.7914173999999998 | 1.7849907999999999 | -0.1145426501830419 | 0.06364297653654805 | -0.008004307746887207 | -0.26171672344207764 | 1.1001204 | 0.17392394 | 1.4450287 | 0.87743527 | 0.016055183 | 0.005272877 | 0.027343987 | 0.009870462 | -0.20479403 | 0.033908524 | -0.15814927 | -0.26577490000000004 | |||||||||||
| 18 | 156.0 | 0.0 | 469.0 | 1.0 | 469.0 | 4122.0 | 0.0 | 7.0 | 300.0 | 0.0 | 0.44079409999999997 | 0.6836350999999999 | 2.4870617000000004 | 0.025862668 | 1.7806060000000001 | 0.0046418053 | 1.7915317 | 1.7732808999999998 | 0.062453726452329876 | 0.3429879814155853 | 0.985063374042511 | -0.20165133476257324 | 0.92002445 | 0.10120275599999999 | 1.1328447 | 0.7651458999999999 | 0.091498 | 0.13105219999999998 | 0.42509866 | 0.007523017 | 0.11129719 | 0.47185699999999997 | 1.4228208 | -0.21573834 | |||||||||||
| 19 | 170.0 | 0.0 | 276.0 | 1.0 | 276.0 | 4398.0 | 0.0 | 3.0 | 20.0 | 0.0 | 0.40178663 | 0.6728098000000001 | 2.4094729999999998 | 0.039225254 | 1.7645447 | 0.0042648454999999995 | 1.790634 | 1.7581539000000002 | -0.00800227591624627 | 0.3189828889920542 | 1.685505986213684 | -0.2216334342956543 | 1.1304478999999998 | 0.038905688 | 1.2024312 | 1.0620688999999999 | 0.0786909 | 0.1614253 | 0.58035713 | 0.012360237 | -0.011329556999999999 | 0.4500582 | 1.2747773999999998 | -0.24353555 | |||||||||||
| 20 | 191.0 | 0.0 | 420.0 | 1.0 | 420.0 | 4818.0 | 0.0 | 7.0 | 315.0 | 0.0 | 0.36499518 | 0.3705313 | 1.1575621 | 0.04431529 | 1.7636738 | 0.004579304 | 1.7901049 | 1.7550941999999998 | 0.021820819079875944 | 0.3071232093063115 | 0.947787582874298 | -0.2100141644477844 | 0.9154280999999999 | 0.09096202 | 1.1273426000000002 | 0.75912726 | 0.07179129 | 0.09326242 | 0.24904189999999998 | 0.0062419563 | 0.040330485 | 0.34031707 | 0.7956849 | -0.22417025 | |||||||||||
| 21 | 197.0 | 0.0 | 104.0 | 1.0 | 104.0 | 4922.0 | 0.0 | 1.0 | 30.0 | 0.0 | 0.5144350000000001 | 0.718648 | 1.9511981 | 0.12453204400000001 | 1.761423 | 0.0056095775 | 1.7902006999999998 | 1.7571856 | 0.05070284008979797 | 0.3168600697896934 | 0.925861954689026 | -0.16295456886291504 | 0.88745534 | 0.012690918 | 0.93781537 | 0.8751633 | 0.06495949599999999 | 0.11300202 | 0.2909634 | 0.008293057 | 0.078482285 | 0.45015374 | 0.97864294 | -0.15680452 | |||||||||||
| 22 | 203.0 | 0.0 | 113.0 | 1.0 | 113.0 | 5035.0 | 0.0 | 2.0 | 15.0 | 0.0 | 0.64118826 | 0.86793905 | 2.3767917 | 0.18172713 | 1.7537029999999998 | 0.006172906 | 1.7896771 | 1.749237 | 0.08747740507125855 | 0.4437113729234238 | 1.7083609104156494 | -0.18463540077209475 | 0.9569305 | 0.024823021 | 1.0054495 | 0.92341274 | 0.12694135 | 0.23306239999999998 | 0.5930651 | 0.009591491 | 0.15546985 | 0.65049607 | 1.4562124 | -0.19140014 | |||||||||||
| 23 | 212.0 | 0.0 | 162.0 | 1.0 | 162.0 | 5197.0 | 0.0 | 1.0 | 5.0 | 0.0 | 0.16368598 | 0.041053284 | 0.22597034 | 0.10290782 | 1.7606709999999999 | 0.004070151 | 1.7898455000000002 | 1.7559046999999999 | -0.06407594718039036 | 0.13353017492122013 | 0.855490505695343 | -0.17702490091323853 | 0.83308256 | 0.035228863 | 0.9194645 | 0.7807996 | 0.020336542 | 0.032399733 | 0.10603105 | 0.007155789 | -0.11351252 | 0.08716212 | 0.11233470599999999 | -0.18262672 | |||||||||||
| 24 | 216.0 | 0.0 | 69.0 | 1.0 | 69.0 | 5266.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.18022996 | 0.014585165 | 0.19933702 | 0.16394733 | 1.7644328999999999 | 0.005773647 | 1.7902433999999998 | 1.7603636000000003 | -0.07377323607603709 | 0.04112314147691277 | -0.0068422555923461905 | -0.1422249674797058 | 0.7145154 | 0.026194045 | 0.7514143 | 0.6838747 | 0.006449179300000001 | 0.00046015754999999996 | 0.006775117 | 0.0057984180000000005 | -0.12904127 | 0.008062066 | -0.118956625 | -0.13869014 | |||||||||||
| 25 | 238.0 | 0.0 | 432.0 | 1.0 | 432.0 | 5698.0 | 0.0 | 6.0 | 85.0 | 0.0 | 0.33006436 | 0.35617912 | 1.3067303999999997 | 0.06890332 | 1.7717101999999998 | 0.004106822 | 1.7902195 | 1.7627416 | 0.07911585221687953 | 0.3249530771268533 | 0.9620689153671264 | -0.13253813982009888 | 0.5261719 | 0.06930914 | 0.7088074000000001 | 0.44418199999999997 | 0.06108107 | 0.09490778 | 0.29277053 | 0.0023036576 | 0.140708 | 0.41082802 | 1.2361767 | -0.12985662 |
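One property of the trace that is easy to check by hand is that `Total steps` is the running sum of `Episode Length`. A minimal sketch of that check follows, using the first seven rows of the table above. The function name `total_steps_consistent` and the `ROWS` constant are hypothetical helpers for illustration and are not part of Coach.

```python
# First seven rows of the trace, as (Episode Length, Total steps) pairs
# copied from the table above.
ROWS = [
    (248, 248),
    (123, 371),
    (88, 459),
    (187, 646),
    (86, 732),
    (331, 1063),
    (753, 1816),
]


def total_steps_consistent(rows):
    """Return True if each total equals the running sum of episode lengths."""
    running = 0
    for episode_length, total_steps in rows:
        running += episode_length
        if running != total_steps:
            return False
    return True


if __name__ == "__main__":
    print(total_steps_consistent(ROWS))
```

A check like this could be run on a regenerated trace before comparing it against a stored one, since a mismatch here would suggest a row was dropped or reordered.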