mirror of https://github.com/gryf/coach.git synced 2026-05-02 22:30:55 +02:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
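The bullet above mentions a bug fix in `episode.update_returns`, which computes per-transition discounted returns. As a point of reference, a minimal sketch of the standard backward-pass computation is shown below; this is not Coach's actual implementation, and the `Transition` class, field names, and `discount` default are assumptions for illustration only.

```python
# Minimal sketch of filling in discounted returns for an episode:
# G_t = r_t + discount * G_{t+1}, computed backwards from the last step.
# Hypothetical names; not taken from the coach codebase.

class Transition:
    def __init__(self, reward):
        self.reward = reward
        self.total_return = None  # filled in by update_returns

def update_returns(transitions, discount=0.99):
    """Walk the episode backwards, accumulating the discounted return."""
    running_return = 0.0
    for t in reversed(transitions):
        running_return = t.reward + discount * running_return
        t.total_return = running_return
    return transitions

episode = [Transition(r) for r in (1.0, 0.0, 2.0)]
update_returns(episode, discount=0.5)
# total_return per step: [1.5, 1.0, 2.0]
```

The backward iteration is the key detail: computing returns forward would require a second pass, while a single reversed loop accumulates each suffix sum in O(n).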
Itai Caspi committed 2018-09-04 15:07:54 +03:00 (committed by GitHub)
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions
@@ -1,6 +1,6 @@
 Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
 1,0.0,1.0,909.0,909.0,909.0,909.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,
 2,0.0,1.0,1849.0,1849.0,940.0,1849.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,
-3,185.0,0.0,2589.0,2589.0,740.0,2589.0,0.0,-21.0,-21.0,0.0,,,,0.01237654753620862,0.012107382206937413,0.047442745417356484,0.0005760944331996143,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.012748566999999999,0.0052782227,0.03766812,0.005078032,,,,
-4,442.0,0.0,3618.0,3618.0,1029.0,3618.0,0.036310166120529175,-20.0,-20.0,0.0,,,,0.012453011741702252,0.013498447785635284,0.0759262964129448,0.000425029982579872,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.011709151000000001,0.0046144870000000004,0.027468227,0.0041608843,-0.0377305,0.005958211,-0.028756577999999998,-0.046951957
-5,708.0,0.0,4679.0,4679.0,1061.0,4679.0,0.018979299813508987,-20.0,-20.0,0.0,,,,0.012743571282345172,0.013893963491457096,0.08593739569187164,0.0005298212054185568,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.010713641999999999,0.0042998786,0.04166126,0.005018155,-0.01665635,0.0030537121999999997,-0.0120764775,-0.02167279
+3,191.0,0.0,2612.0,2612.0,763.0,2612.0,0.0,-21.0,-21.0,0.0,,,,0.0130545203534384,0.01309133522881885,0.050950419157743454,0.0006263511604629457,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.013740225,0.005418419,0.03539303,0.0058373637,,,,
+4,460.0,0.0,3688.0,3688.0,1076.0,3688.0,0.04093054309487343,-19.0,-19.0,0.0,,,,0.01353417751470193,0.013991631199855073,0.060094203799963,0.0007772938697598875,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.012196413,0.0043717,0.027787974,0.0044747777,-0.028111902999999997,0.007774155,-0.014797297,-0.03953904
+5,679.0,0.0,4563.0,4563.0,875.0,4563.0,0.03276534005999565,-21.0,-21.0,0.0,,,,0.013156805880639923,0.01289872044109124,0.05886143445968627,0.0005613723187707366,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.012954918999999999,0.00410382,0.028127436000000002,0.0051577645,-0.006553585,0.009180349,0.011637015,-0.020697306999999998
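The file being diffed is one of the trace-test CSVs: one row per episode, with the column names given in the header line. A small sketch of parsing such a trace with the standard library follows; the column subset is trimmed for brevity, and the two data rows are the unchanged episodes 1 and 2 from the hunk above.

```python
import csv
import io

# Trimmed copy of the trace CSV from the diff (first 8 columns,
# episodes 1 and 2 only, which are identical on both sides).
trace_csv = """Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon
1,0.0,1.0,909.0,909.0,909.0,909.0,0.0
2,0.0,1.0,1849.0,1849.0,940.0,1849.0,0.0
"""

# DictReader keys each value by the header name, so columns can be
# pulled out without hard-coding positions.
rows = list(csv.DictReader(io.StringIO(trace_csv)))
lengths = [float(r["Episode Length"]) for r in rows]
print(lengths)  # [909.0, 940.0]
```

Comparing two such files column-by-column (rather than textually) is what makes trace tests tolerant of float-formatting noise while still catching the value drift visible in the changed rows above.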