mirror of https://github.com/gryf/coach.git synced 2026-04-17 21:03:32 +02:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing episodes to be stored only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
Itai Caspi
2018-09-04 15:07:54 +03:00
committed by GitHub
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions
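The first bullet above describes storing episodes in the replay buffer only once they terminate. As a hypothetical sketch of that idea (the `Episode` and `ReplayBuffer` names below are illustrative, not Coach's actual API):

```python
# Hypothetical sketch of "store episodes only when they are terminated".
# Class and parameter names are illustrative, not Coach's actual API.


class Episode:
    """Accumulates transitions until the environment signals done."""

    def __init__(self):
        self.transitions = []
        self.done = False

    def insert(self, transition, done):
        self.transitions.append(transition)
        self.done = done


class ReplayBuffer:
    def __init__(self, store_episodes_only_when_terminated=True):
        self.only_terminated = store_episodes_only_when_terminated
        self.episodes = []

    def store_episode(self, episode):
        # Reject partial episodes when the flag is set, so the buffer
        # never holds an episode whose returns cannot yet be computed.
        if self.only_terminated and not episode.done:
            return False
        self.episodes.append(episode)
        return True


buf = ReplayBuffer()
ep = Episode()
ep.insert(("s0", "a0", 1.0, "s1"), done=False)
print(buf.store_episode(ep))  # False: episode not terminated yet
ep.insert(("s1", "a1", 0.0, "s2"), done=True)
print(buf.store_episode(ep))  # True: stored after termination
```

Deferring storage until termination also explains why the traces below changed: the episode reset reordering shifts when transitions enter the buffer, so the logged step counts differ.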


@@ -1,6 +1,6 @@
 Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
 1,0.0,1.0,1117.0,1117.0,1117.0,1117.0,1.0,,,0.0,,,,,,,,,,,,,,,,,,,
-2,210.0,0.0,1958.0,1958.0,841.0,1958.0,0.999167410000018,-20.0,-20.0,0.0,,,,0.011756908099604993,0.01245646310720048,0.05387234315276146,0.00010689756891224532,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.057962038,0.04616896400000001,0.26208854,0.0071766186,,,,
-3,402.0,0.0,2726.0,2726.0,768.0,2726.0,0.9984070900000346,-21.0,-21.0,0.0,,,,0.012809355009267165,0.013771132011321113,0.07975033670663834,5.99101695115678e-05,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.052051324,0.028359309,0.17658195,0.008862591,-0.017426128,0.0060299635,-0.008042792,-0.026319288
-4,601.0,0.0,3519.0,3519.0,793.0,3519.0,0.9976220200000516,-21.0,-21.0,0.0,,,,0.015272312543569037,0.013672084153799915,0.05628284066915512,0.00023415754549205303,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.052314125,0.023336997,0.1473458,0.012913031,-0.031559315,0.0042713494,-0.023393027,-0.036418874
-5,809.0,0.0,4352.0,4352.0,833.0,4352.0,0.9967973500000696,-21.0,-21.0,0.0,,,,0.013082799735107424,0.01255374334846098,0.06567259877920151,0.0004701522993855178,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.043265857000000005,0.014534917,0.089655906,0.017195849,-0.0053307074,0.0027605025,-0.0019208845999999999,-0.00974094
+2,205.0,0.0,1937.0,1937.0,820.0,1937.0,0.9991882000000176,-21.0,-21.0,0.0,,,,0.013271789207150194,0.014381215654183937,0.08661144971847534,7.284892490133643e-05,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.09793413,0.109029554,1.2459028,0.010081228000000001,,,,
+3,413.0,0.0,2768.0,2768.0,831.0,2768.0,0.9983655100000356,-21.0,-21.0,0.0,,,,0.013095782662258687,0.014563835652836424,0.09017306566238403,4.85398450109642e-05,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.06699568,0.10204898,0.9738844000000001,0.005621953000000001,-0.06337769,0.006071376999999999,-0.05691424,-0.07540042
+4,667.0,0.0,3783.0,3783.0,1015.0,3783.0,0.9973606600000572,-20.0,-20.0,0.0,,,,0.014243900448040163,0.012460161619208224,0.05600857362151146,8.375291145057417e-06,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.08014218,0.05026457,0.24418142,0.0018464670999999999,-0.08484802400000001,0.007937772,-0.07532068,-0.09821871
+5,867.0,0.0,4585.0,4585.0,802.0,4585.0,0.9965666800000744,-21.0,-21.0,0.0,,,,0.0149451127843804,0.012661744241431476,0.057885006070137024,2.08603323699208e-05,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.084665276,0.07432766,0.39534,0.0034519034000000002,-0.09767585,0.029707237999999997,-0.061746947,-0.13731477
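The trace file above is plain CSV, so updated traces like this one can be inspected with the standard library alone. A minimal sketch, using a truncated excerpt (the first 11 columns) of the new trace from the diff:

```python
import csv
import io

# Excerpt of the updated trace CSV: first 11 columns and first 3 rows,
# taken from the diff above. Empty fields are metrics not logged yet.
TRACE_CSV = """\
Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network
1,0.0,1.0,1117.0,1117.0,1117.0,1117.0,1.0,,,0.0
2,205.0,0.0,1937.0,1937.0,820.0,1937.0,0.9991882000000176,-21.0,-21.0,0.0
3,413.0,0.0,2768.0,2768.0,831.0,2768.0,0.9983655100000356,-21.0,-21.0,0.0
"""


def episode_lengths(text):
    """Map Episode # -> Episode Length, skipping rows with a blank field."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(float(row["Episode #"])): float(row["Episode Length"])
            for row in reader if row["Episode Length"]}


print(episode_lengths(TRACE_CSV))  # {1: 1117.0, 2: 820.0, 3: 831.0}
```

The same approach (load both trace files, compare column by column) is how a trace test can flag regressions like the step-count shifts visible in this diff.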