mirror of https://github.com/gryf/coach.git synced 2026-04-20 06:33:31 +02:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
Itai Caspi
2018-09-04 15:07:54 +03:00
committed by GitHub
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions
@@ -1,6 +1,6 @@
 Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
 1,0.0,1.0,1117.0,1117.0,1117.0,1117.0,1.0,,,0.0,,,,,,,,,,,,,,,,,,,
-2,210.0,0.0,1958.0,1958.0,841.0,1958.0,0.999167410000018,-20.0,-20.0,0.0,,,,0.011338375554208012,0.012271934396749055,0.04895064979791641,4.0612991142552346e-05,0.00010000000000000002,1.3552527156068802e-20,0.0001,0.0001,0.089302726,0.052687183,0.26947329999999997,0.008767666,,,,
-3,402.0,0.0,2726.0,2726.0,768.0,2726.0,0.9984070900000346,-21.0,-21.0,0.0,,,,0.012148191395510821,0.0140916556236684,0.08563371002674103,4.236549284541979e-05,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.07570279,0.04711025,0.3658558,0.004744183,0.0034259886,0.0050672004,0.010562051000000001,-0.004941341
-4,601.0,0.0,3519.0,3519.0,793.0,3519.0,0.9976220200000516,-21.0,-21.0,0.0,,,,0.013526306782753192,0.013285856236359452,0.048545010387897485,6.407919136108829e-05,0.0001,0.0,0.0001,0.0001,0.061247554,0.032466255,0.1804012,0.009472755,-0.0073855095999999995,0.0022593734,-0.0044954764,-0.009999375999999999
-5,809.0,0.0,4352.0,4352.0,833.0,4352.0,0.9967973500000696,-21.0,-21.0,0.0,,,,0.011593266384177415,0.0126054028157575,0.06050398200750351,2.748135375441052e-05,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.058888417,0.03245456,0.21228382,0.00490136,0.050208718,0.0025627778,0.05198091,0.04461886
+2,205.0,0.0,1937.0,1937.0,820.0,1937.0,0.9991882000000176,-21.0,-21.0,0.0,,,,0.014763841533212831,0.013646937264058223,0.06725655496120453,2.0758947357535362e-05,0.00010000000000000002,1.3552527156068802e-20,0.0001,0.0001,0.17952479999999998,0.13626544,0.9860897,0.0053134440000000005,,,,
+3,413.0,0.0,2768.0,2768.0,831.0,2768.0,0.9983655100000356,-21.0,-21.0,0.0,,,,0.012111850191891229,0.013912744765592264,0.08914861083030699,1.7985148588195443e-05,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.057201855,0.04205291,0.26596984,0.0031672046,-0.04456665,0.009031756,-0.031443898,-0.059377108
+4,667.0,0.0,3783.0,3783.0,1015.0,3783.0,0.9973606600000572,-20.0,-20.0,0.0,,,,0.013269104183936587,0.013449185914245043,0.07771021127700806,1.3188657248974778e-05,0.00010000000000000002,1.3552527156068802e-20,0.0001,0.0001,0.098453455,0.109315164,0.9814589,0.0024465397,-0.008853295,0.009689603,0.0003574537,-0.028319128
+5,867.0,0.0,4585.0,4585.0,802.0,4585.0,0.9965666800000744,-21.0,-21.0,0.0,,,,0.01383970570535894,0.013677503957050816,0.0817062109708786,5.106279422761872e-05,0.00010000000000000002,1.3552527156068802e-20,0.0001,0.0001,0.108334474,0.0749226,0.40531653,0.006287096,-0.018026425,0.047121227,0.035217006,-0.070681214
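
The hunk above changes the recorded rows for episodes 2 through 5 of a trace file, while the header and the heatup episode stay the same. As a rough illustration of how two such traces could be compared cell by cell, here is a minimal Python sketch. It assumes both traces share the same header; the helper names (`read_trace`, `cells_differ`, `diff_traces`) and the tolerance are hypothetical and are not taken from Coach's trace tests. The sample rows use a subset of the columns and values shown in the diff.

```python
import csv
import io
import math


def read_trace(text):
    """Parse trace CSV text into {episode number: {column: raw string}}."""
    rows = csv.DictReader(io.StringIO(text))
    return {row["Episode #"]: row for row in rows}


def cells_differ(old, new, rel_tol):
    """Compare two raw cells; an empty cell only matches another empty cell."""
    if old == "" or new == "":
        return old != new
    return not math.isclose(float(old), float(new), rel_tol=rel_tol)


def diff_traces(old_text, new_text, rel_tol=1e-9):
    """Return (episode, column, old, new) for every cell that changed.

    Assumes both traces have identical headers (hypothetical simplification).
    """
    old, new = read_trace(old_text), read_trace(new_text)
    changes = []
    # Only episodes present in both traces are compared.
    for episode in sorted(old.keys() & new.keys(), key=int):
        for column, old_cell in old[episode].items():
            new_cell = new[episode][column]
            if cells_differ(old_cell, new_cell, rel_tol):
                changes.append((episode, column, old_cell, new_cell))
    return changes


# Subset of the columns and values from the hunk above.
OLD = """Episode #,Training Iter,Episode Length,Total steps,Training Reward
1,0.0,1117.0,1117.0,
2,210.0,841.0,1958.0,-20.0
"""
NEW = """Episode #,Training Iter,Episode Length,Total steps,Training Reward
1,0.0,1117.0,1117.0,
2,205.0,820.0,1937.0,-21.0
"""

for change in diff_traces(OLD, NEW):
    print(change)
```

With these sample rows, only episode 2 is reported, once for each of the four columns whose value changed; the heatup episode is identical in both traces.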