mirror of https://github.com/gryf/coach.git synced 2026-05-02 22:30:55 +02:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
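The bullet above mentions a bug fix in `episode.update_returns`, which computes per-transition discounted returns. As a point of reference, a minimal sketch of the standard backward-pass computation is shown below; this is not Coach's actual implementation, and the `Transition` class, field names, and `discount` default are assumptions for illustration only.

```python
# Minimal sketch of filling in discounted returns for an episode:
# G_t = r_t + discount * G_{t+1}, computed backwards from the last step.
# Hypothetical names; not taken from the coach codebase.

class Transition:
    def __init__(self, reward):
        self.reward = reward
        self.total_return = None  # filled in by update_returns

def update_returns(transitions, discount=0.99):
    """Walk the episode backwards, accumulating the discounted return."""
    running_return = 0.0
    for t in reversed(transitions):
        running_return = t.reward + discount * running_return
        t.total_return = running_return
    return transitions

episode = [Transition(r) for r in (1.0, 0.0, 2.0)]
update_returns(episode, discount=0.5)
# total_return per step: [1.5, 1.0, 2.0]
```

The backward iteration is the key detail: computing returns forward would require a second pass, while a single reversed loop accumulates each suffix sum in O(n).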
Itai Caspi committed 2018-09-04 15:07:54 +03:00 (committed by GitHub)
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions
@@ -1,6 +1,6 @@
 Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
 1,0.0,1.0,909.0,909.0,909.0,909.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,
 2,0.0,1.0,1849.0,1849.0,940.0,1849.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,
-3,185.0,0.0,2589.0,2589.0,740.0,2589.0,0.0,-21.0,-21.0,0.0,,,,0.01237654753620862,0.012107382206937413,0.047442745417356484,0.0005760944331996143,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.012748566999999999,0.0052782227,0.03766812,0.005078032,,,,
-4,442.0,0.0,3618.0,3618.0,1029.0,3618.0,0.036310166120529175,-20.0,-20.0,0.0,,,,0.012453011741702252,0.013498447785635284,0.0759262964129448,0.000425029982579872,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.011709151000000001,0.0046144870000000004,0.027468227,0.0041608843,-0.0377305,0.005958211,-0.028756577999999998,-0.046951957
-5,708.0,0.0,4679.0,4679.0,1061.0,4679.0,0.018979299813508987,-20.0,-20.0,0.0,,,,0.012743571282345172,0.013893963491457096,0.08593739569187164,0.0005298212054185568,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.010713641999999999,0.0042998786,0.04166126,0.005018155,-0.01665635,0.0030537121999999997,-0.0120764775,-0.02167279
+3,191.0,0.0,2612.0,2612.0,763.0,2612.0,0.0,-21.0,-21.0,0.0,,,,0.0130545203534384,0.01309133522881885,0.050950419157743454,0.0006263511604629457,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.013740225,0.005418419,0.03539303,0.0058373637,,,,
+4,460.0,0.0,3688.0,3688.0,1076.0,3688.0,0.04093054309487343,-19.0,-19.0,0.0,,,,0.01353417751470193,0.013991631199855073,0.060094203799963,0.0007772938697598875,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.012196413,0.0043717,0.027787974,0.0044747777,-0.028111902999999997,0.007774155,-0.014797297,-0.03953904
+5,679.0,0.0,4563.0,4563.0,875.0,4563.0,0.03276534005999565,-21.0,-21.0,0.0,,,,0.013156805880639923,0.01289872044109124,0.05886143445968627,0.0005613723187707366,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.012954918999999999,0.00410382,0.028127436000000002,0.0051577645,-0.006553585,0.009180349,0.011637015,-0.020697306999999998
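The file being diffed is one of the trace-test CSVs: one row per episode, with the column names given in the header line. A small sketch of parsing such a trace with the standard library follows; the column subset is trimmed for brevity, and the two data rows are the unchanged episodes 1 and 2 from the hunk above.

```python
import csv
import io

# Trimmed copy of the trace CSV from the diff (first 8 columns,
# episodes 1 and 2 only, which are identical on both sides).
trace_csv = """Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon
1,0.0,1.0,909.0,909.0,909.0,909.0,0.0
2,0.0,1.0,1849.0,1849.0,940.0,1849.0,0.0
"""

# DictReader keys each value by the header name, so columns can be
# pulled out without hard-coding positions.
rows = list(csv.DictReader(io.StringIO(trace_csv)))
lengths = [float(r["Episode Length"]) for r in rows]
print(lengths)  # [909.0, 940.0]
```

Comparing two such files column-by-column (rather than textually) is what makes trace tests tolerant of float-formatting noise while still catching the value drift visible in the changed rows above.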