mirror of https://github.com/gryf/coach.git synced 2026-04-07 21:53:33 +02:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing episodes to be stored only when they terminate

* reordering of the episode reset operation and allowing episodes to be stored only when they terminate

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
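
The episode-storage change in the first commit above can be sketched as follows. This is a minimal, hypothetical memory that buffers transitions and commits the episode only once it has terminated; the names `Transition` and `EpisodicMemory` are illustrative and are not Coach's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Transition:
    state: object
    action: int
    reward: float
    done: bool

@dataclass
class EpisodicMemory:
    """Stores only complete (terminated) episodes."""
    episodes: list = field(default_factory=list)
    _current: list = field(default_factory=list)

    def store(self, transition: Transition) -> None:
        self._current.append(transition)
        if transition.done:
            # Commit the buffered episode only once it has terminated;
            # partial episodes never reach long-term memory.
            self.episodes.append(self._current)
            self._current = []

memory = EpisodicMemory()
for t in range(5):
    memory.store(Transition(state=None, action=0, reward=1.0, done=(t == 4)))
print(len(memory.episodes))  # → 1 (committed only at termination)
```

Deferring storage until termination keeps the replay memory free of half-finished episodes, which matters when returns are computed per-episode (as in `episode.update_returns`).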
Itai Caspi
2018-09-04 15:07:54 +03:00
committed by GitHub
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions


@@ -1,6 +1,6 @@
 Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
 1,0.0,1.0,986.0,986.0,986.0,986.0,7.0,,,0.0,,,,,,,,,,,,,,,,,,,
 2,0.0,1.0,1806.0,1806.0,820.0,1806.0,4.0,,,0.0,,,,,,,,,,,,,,,,,,,
-3,207.0,0.0,2634.0,2634.0,828.0,2634.0,1.0,-21.0,-21.0,0.0,,,,0.013430694482291505,0.012774117514024573,0.06467919796705246,0.0005054873763583599,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.013462509,0.005010004,0.032169305,0.0046610474,,,,
-4,433.0,0.0,3538.0,3538.0,904.0,3538.0,1.0,-21.0,-21.0,0.0,,,,0.013214294455993912,0.012243776759493771,0.048550304025411606,0.00030727600096724933,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.012283348500000001,0.004644497,0.032848116000000004,0.0047284905,,,,
-5,664.0,0.0,4462.0,4462.0,924.0,4462.0,2.0,-20.0,-20.0,0.0,,,,0.013385360111538885,0.013904787720461907,0.06079941987991332,0.0005098563269712031,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.010943641,0.0043348954,0.03260831,0.0045090048,0.00066530565,0.0129122045,0.024260167000000003,-0.034502137
+3,206.0,0.0,2629.0,2629.0,823.0,2629.0,5.0,-21.0,-21.0,0.0,,,,0.01375627432677452,0.013505330839893808,0.06677445024251938,0.0005553220980800688,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.013602738,0.0048916726,0.034245104,0.0056978124,,,,
+4,398.0,0.0,3397.0,3397.0,768.0,3397.0,3.0,-21.0,-21.0,0.0,,,,0.014156610367839068,0.013173363350960334,0.059119727462530136,0.0007080046343617141,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.012839798999999999,0.0038416919,0.024480136,0.005681609000000001,,,,
+5,617.0,0.0,4274.0,4274.0,877.0,4274.0,6.0,-21.0,-21.0,0.0,,,,0.015369139484674181,0.01463229484329247,0.08113615959882736,0.0005487628513947129,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.014249632,0.005901839599999999,0.04092761,0.004881437,0.004008428,0.016476048,0.028364737,-0.026583625
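
The trace files being updated are CSVs keyed by episode number, with the column names shown in the header row of the diff above. A trace comparison of the kind the commit list mentions could be sketched roughly like this; it is an illustration, not the repository's actual test harness, and the inline `OLD`/`NEW` samples use only a subset of the real columns:

```python
import csv
import io

# Abbreviated sample traces; real files carry the full column set.
OLD = """Episode #,Episode Length,Total steps
1,986.0,986.0
3,828.0,2634.0"""

NEW = """Episode #,Episode Length,Total steps
1,986.0,986.0
3,823.0,2629.0"""

def read_trace(text):
    # Map "Episode #" -> row dict so episodes can be compared directly.
    return {row["Episode #"]: row for row in csv.DictReader(io.StringIO(text))}

def diff_traces(old_text, new_text):
    old, new = read_trace(old_text), read_trace(new_text)
    changed = {}
    for ep in old.keys() & new.keys():
        cols = {c for c in old[ep] if old[ep][c] != new[ep][c]}
        if cols:
            changed[ep] = sorted(cols)
    return changed

print(diff_traces(OLD, NEW))  # → {'3': ['Episode Length', 'Total steps']}
```

Comparing per-episode rather than line-by-line makes it easy to see which statistics shifted after the reset reordering (here, episode 3's length and step count), which is exactly the kind of drift the updated traces capture.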