mirror of https://github.com/gryf/coach.git synced 2026-04-10 07:03:40 +02:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
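The first bullet above describes deferring storage so that episodes reach the replay store only once they terminate. A minimal sketch of that pattern (class and method names are hypothetical illustrations, not Coach's actual API):

```python
class EpisodeBuffer:
    """Sketch of store-on-termination: transitions accumulate in a working
    episode and are flushed to the store only when the episode ends.
    All names here are hypothetical, not Coach's actual memory API."""

    def __init__(self):
        self.current_episode = []   # in-progress transitions
        self.stored_episodes = []   # only completed episodes land here

    def add_transition(self, transition, done):
        self.current_episode.append(transition)
        if done:
            # Episode terminated: commit it to the store and start fresh.
            self.stored_episodes.append(self.current_episode)
            self.current_episode = []
```

This keeps partially collected episodes out of the store, so training code that samples whole episodes never sees an unfinished one.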
This commit is contained in:
Itai Caspi
2018-09-04 15:07:54 +03:00
committed by GitHub
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions

@@ -1,6 +1,6 @@
 Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min
 1,0.0,1.0,881.0,1.0,881.0,881.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
 2,0.0,1.0,1043.0,1.0,1043.0,1924.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
-3,40.0,0.0,800.0,1.0,800.0,2724.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,1.9787095000000001,1.6848618999999998,7.008404,0.010777535,1.6982583999999998,0.09253728,1.7866282,1.4752818,-1.1255737884236818,1.011838146554653,0.5801041126251221,-3.300693988800049,-0.599011,0.9902591000000001,0.05652909,-3.2763072999999996,0.15230414,0.30587232,1.9219400000000002,2.232377e-05,-0.35678893,0.77703935,0.5012678,-3.3208616
-4,84.0,0.0,874.0,1.0,874.0,3598.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,2.7599952,4.4335379999999995,30.457005,0.31769055,1.6350366000000003,0.04428426,1.6724039000000002,1.4955306000000002,0.012454061887480996,0.5316342837334819,2.479757070541382,-0.97082257270813,-1.8976341,0.45726654,-1.5126665000000001,-3.3147082,0.14139508,0.42458066,2.8789976000000004,0.01374856,0.022890297999999996,0.7258614999999999,3.5482929,-1.1909457
-5,133.0,0.0,962.0,1.0,962.0,4560.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,2.2067685,1.6116456000000001,8.973311,0.21847898,1.671777,0.014726803,1.6910971000000001,1.6381558,-0.005908215376385918,0.3910391646463375,1.0921021699905396,-0.9658074378967284,-1.8732144,0.11304277,-1.7097299,-2.2033259999999997,0.07926889,0.09812635,0.5330912,0.017862901,0.0026857766999999998,0.55807036,1.739695,-1.2937489
+3,38.0,0.0,763.0,1.0,763.0,2687.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,1.4867063,1.4938432,5.887912,0.0015561687,1.7584827,0.03291289,1.788348,1.6738943,-0.15541847978173265,0.4162753016651965,0.5290131568908691,-1.0030009746551514,-1.0561148,0.93491656,0.021942224,-2.8995342,0.09872001400000001,0.11163085,0.38036227,1.426808e-06,-0.27536926,0.5678659,0.53289455,-1.4932774
+4,75.0,0.0,740.0,1.0,740.0,3427.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,3.4039152,1.9638362,7.991121000000001,0.21933316,1.4528251,0.16249819,1.6673243999999998,1.1318555000000001,-0.06318947109911177,0.4280228160756264,0.6142082214355469,-0.9774222373962402,-2.6464324,0.35272834,-2.2407157000000004,-3.3780959999999998,0.09359822,0.06729852,0.24981685,0.03076201,-0.096310705,0.51300126,0.43143788,-1.0997788999999998
+5,113.0,0.0,755.0,1.0,755.0,4182.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,3.233333,1.986448,7.6551165999999995,0.19970839,1.2935143999999998,0.11847049,1.439909,1.0601448,-0.08421046411668932,0.4278507532658671,0.5715954303741455,-0.980087161064148,-2.3434882000000004,0.31644145,-1.9696671000000001,-3.1619172000000004,0.09507382,0.07568375,0.26689446,0.024116684,-0.1209513,0.48729447,0.5198732,-1.1845143