mirror of https://github.com/gryf/coach.git synced 2026-04-20 15:11:24 +02:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
Itai Caspi
2018-09-04 15:07:54 +03:00
committed by GitHub
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions
@@ -1,6 +1,6 @@
Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
1,0.0,1.0,1117.0,1117.0,1117.0,1117.0,1.0,,,0.0,,,,,,,,,,,,,,,,,,,
-2,221.0,0.0,2002.0,2002.0,885.0,2002.0,0.999123850000019,-21.0,-21.0,0.0,,,,0.006624795104714366,0.00394576811971849,0.01863841339945793,6.383289291989058e-05,6.250000000000003e-05,2.7105054312137605e-20,6.25e-05,6.25e-05,0.032127135,0.014603343000000001,0.12838697,0.005512589,,,,
-3,455.0,0.0,2938.0,2938.0,936.0,2938.0,0.9981972100000392,-20.0,-20.0,0.0,,,,0.006993958523544746,0.0031627418936934102,0.01826494000852108,0.000633664894849062,6.250000000000003e-05,2.7105054312137605e-20,6.25e-05,6.25e-05,0.026382675,0.010049541,0.06018944,0.009578557,-0.08102258,0.054663535,-0.0028564844,-0.15667786
-4,659.0,0.0,3754.0,3754.0,816.0,3754.0,0.9973893700000568,-21.0,-21.0,0.0,,,,0.00653242061713065,0.0030014368076197325,0.014597361907362938,3.4910688555100926e-05,6.250000000000001e-05,1.3552527156068802e-20,6.25e-05,6.25e-05,0.019908648,0.0060336159999999995,0.03786578,0.003926692,,,,
-5,906.0,0.0,4739.0,4739.0,985.0,4739.0,0.9964142200000778,-20.0,-20.0,0.0,,,,0.005325366398493989,0.00258031872854336,0.01823988556861877,6.391682836692779e-05,6.250000000000003e-05,2.7105054312137605e-20,6.25e-05,6.25e-05,0.016708475,0.006444646,0.051227405999999996,0.0036940586,-0.042256642000000004,0.010646114,-0.030611286,-0.06712968
+2,197.0,0.0,1905.0,1905.0,788.0,1905.0,0.9992198800000168,-21.0,-21.0,0.0,,,,0.0065035605150511894,0.004365216942868011,0.04185768589377403,1.6300582501571625e-05,6.250000000000001e-05,1.3552527156068802e-20,6.25e-05,6.25e-05,0.04899958,0.035690054,0.465425,0.0031771401,,,,
+3,436.0,0.0,2862.0,2862.0,957.0,2862.0,0.9982724500000376,-20.0,-20.0,0.0,,,,0.006882304690776307,0.0032755384482328074,0.018768906593322757,0.00028316525276750326,6.250000000000003e-05,2.7105054312137605e-20,6.25e-05,6.25e-05,0.037334877999999995,0.016123397,0.11000781,0.007852386,-0.25035575,0.057181817,-0.1695276,-0.34914327
+4,627.0,0.0,3623.0,3623.0,761.0,3623.0,0.997519060000054,-21.0,-21.0,0.0,,,,0.004881470595769075,0.0024654802506201947,0.01351994462311268,3.340750481584109e-05,6.250000000000001e-05,1.3552527156068802e-20,6.25e-05,6.25e-05,0.028977735,0.016445445,0.09510474,0.0037849140000000003,,,,
+5,855.0,0.0,4535.0,4535.0,912.0,4535.0,0.9966161800000736,-20.0,-20.0,0.0,,,,0.004249975731765612,0.0017149519969122415,0.01000758446753025,5.5568867537658655e-05,6.250000000000003e-05,2.7105054312137605e-20,6.25e-05,6.25e-05,0.020409843,0.013720203,0.084716946,0.005521884,-0.11609744,0.011784006000000001,-0.10053374,-0.13682899
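The old and new trace rows in this hunk differ mainly in episode lengths and the quantities that depend on them. The minimal Python sketch below checks a subset of the columns (Episode #, In Heatup, Episode Length, Total steps, Epsilon) copied from those rows. It rests on an assumption inferred only from the numbers, not read from the preset: epsilon starts at 1.0 and decays linearly by 0.99 / 1,000,000 per step once the 1117-step heatup episode ends. The names OLD_TRACE, NEW_TRACE, load and check_trace are hypothetical and used only for illustration.

```python
import csv
import io

# Subset of columns copied from the old (-) trace rows in the hunk above.
OLD_TRACE = """Episode #,In Heatup,Episode Length,Total steps,Epsilon
1,1.0,1117.0,1117.0,1.0
2,0.0,885.0,2002.0,0.999123850000019
3,0.0,936.0,2938.0,0.9981972100000392
4,0.0,816.0,3754.0,0.9973893700000568
5,0.0,985.0,4739.0,0.9964142200000778
"""

# Same columns copied from the new (+) trace rows.
NEW_TRACE = """Episode #,In Heatup,Episode Length,Total steps,Epsilon
1,1.0,1117.0,1117.0,1.0
2,0.0,788.0,1905.0,0.9992198800000168
3,0.0,957.0,2862.0,0.9982724500000376
4,0.0,761.0,3623.0,0.997519060000054
5,0.0,912.0,4535.0,0.9966161800000736
"""

# ASSUMPTION (inferred from the trace values, not from the preset):
# linear decay from 1.0 by 0.99 / 1,000,000 per post-heatup step.
EPSILON_START = 1.0
DECAY_PER_STEP = 0.99 / 1_000_000


def load(text):
    """Parse a CSV trace string into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def check_trace(rows, tol=1e-9):
    """Check step bookkeeping and the assumed epsilon schedule; return heatup steps."""
    heatup_steps = sum(
        float(r["Episode Length"]) for r in rows if float(r["In Heatup"]) == 1.0
    )
    prev_total = 0.0
    for r in rows:
        total = float(r["Total steps"])
        # Each episode's length should equal the growth of the total step counter.
        assert abs(total - prev_total - float(r["Episode Length"])) < tol
        prev_total = total
        if float(r["In Heatup"]) == 0.0:
            # Under the assumed schedule, epsilon depends only on post-heatup steps.
            expected = EPSILON_START - DECAY_PER_STEP * (total - heatup_steps)
            assert abs(float(r["Epsilon"]) - expected) < tol, (r["Episode #"], expected)
    return heatup_steps


if __name__ == "__main__":
    print(check_trace(load(OLD_TRACE)), check_trace(load(NEW_TRACE)))
```

Both traces pass the same checks under this assumption. In other words, the regenerated rows keep the same step bookkeeping and an epsilon schedule consistent with a single linear decay, while the individual episode lengths change. That fits the commit note about updating the traces after reordering the environment reset.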