mirror of https://github.com/gryf/coach.git synced 2026-03-16 22:53:37 +01:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing episodes to be stored only when they are terminated
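The episode-storage change above can be sketched roughly as follows; `Episode` and `EpisodeMemory` are illustrative stand-ins here, not Coach's actual classes:

```python
class Episode:
    """Minimal container for one episode's transitions (illustrative)."""
    def __init__(self):
        self.transitions = []
        self.is_terminated = False

    def insert(self, transition):
        self.transitions.append(transition)


class EpisodeMemory:
    """Stores only completed episodes, so later return/advantage
    computations never see a partially played episode."""
    def __init__(self):
        self.episodes = []

    def store_episode(self, episode):
        # Gate storage on termination: reject episodes still in progress.
        if episode.is_terminated:
            self.episodes.append(episode)
            return True
        return False
```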

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
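A hedged sketch of the kind of gating condition a `should_train()` check typically implements; the parameter names are assumptions, not Coach's actual signature:

```python
def should_train(num_steps_since_train, train_interval,
                 num_complete_episodes, min_episodes=1):
    """Illustrative training trigger: train only once enough complete
    episodes are stored AND the step interval has elapsed."""
    return (num_complete_episodes >= min_episodes
            and num_steps_since_train >= train_interval)
```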

* tests README file and refactoring of the policy optimization agent train function

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining the presets ignored by the trace tests + adding a configurable beta entropy value
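The configurable beta value mentioned above weights an entropy bonus in the policy loss; a minimal sketch of the standard formulation (function and argument names are illustrative, not Coach's internals):

```python
import numpy as np

def policy_loss_with_entropy(log_probs, advantages, probs, beta=0.01):
    """Policy-gradient surrogate loss with a configurable entropy
    coefficient `beta` (illustrative, not Coach's exact code)."""
    # Standard policy-gradient term: -E[log pi(a|s) * A(s, a)].
    pg_loss = -np.mean(log_probs * advantages)
    # Mean entropy of the per-state action distributions; subtracting
    # beta * entropy rewards higher-entropy (more exploratory) policies.
    entropy = -np.mean(np.sum(probs * np.log(probs + 1e-12), axis=-1))
    return pg_loss - beta * entropy
```

Raising `beta` lowers the loss for high-entropy policies, discouraging premature convergence to a deterministic policy.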

* switched the order of trace and golden tests in jenkins + fixed the issue of golden test processes not being killed

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
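For context on `episode.update_returns`, the usual backward recursion for discounted Monte-Carlo returns looks like this (a generic sketch, not Coach's exact implementation):

```python
def update_returns(rewards, discount=0.99):
    """Compute discounted returns R_t = r_t + discount * R_{t+1}
    with a single backward pass over the episode's rewards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns
```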

* remove docker and jenkins file
Author: Itai Caspi
Date: 2018-09-04 15:07:54 +03:00
Committed by: GitHub
Parent: 7086492127
Commit: 72a1d9d426
92 changed files with 9803 additions and 9740 deletions


@@ -1,6 +1,6 @@
 Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min,Q Values/Mean,Q Values/Stdev,Q Values/Max,Q Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min
 1,0.0,1.0,1117.0,1.0,1117.0,1117.0,0.5,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
-2,164.0,0.0,819.0,1.0,819.0,1936.0,0.4919737999999965,-21.0,-21.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,0.08973327,0.10308977,0.35256127,-0.28368974,0.09622141,0.28172967,2.938948,3.398415e-05
-3,348.0,0.0,920.0,1.0,920.0,2856.0,0.4829577999999926,-21.0,-21.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,0.085431345,0.038046483,0.26499447,-0.00268347,0.102118276,0.23871702,1.2644383999999997,2.8306908e-05
-4,517.0,0.0,843.0,1.0,843.0,3699.0,0.474696399999989,-21.0,-21.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,0.12869294,0.02720576,0.2113828,0.057547163,0.07727617,0.19602461,1.3975663,0.0008099895
-5,700.0,0.0,913.0,1.0,913.0,4612.0,0.4657489999999852,-20.0,-20.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,0.16959693,0.053117845,0.3296745,0.055682946,0.04773866400000001,0.14551932,1.2484678999999999,7.813723e-06
+2,166.0,0.0,834.0,1.0,834.0,1951.0,0.4918267999999965,-20.0,-20.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,-0.049309142,0.05955426,0.11067552,-0.31385273,0.10965226,0.25779134,0.96019816,1.650419e-05
+3,343.0,0.0,883.0,1.0,883.0,2834.0,0.4831733999999927,-20.0,-20.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,0.00039612237,0.022137828,0.047817677,-0.057933766,0.05449706,0.15011412,0.8670572,0.0013089271000000001
+4,495.0,0.0,759.0,1.0,759.0,3593.0,0.4757351999999895,-21.0,-21.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,-0.013107545,0.014792551000000001,0.02346693,-0.051909205,0.09606385,0.22936918,0.84131515,0.00357637
+5,646.0,0.0,755.0,1.0,755.0,4348.0,0.4683361999999863,-21.0,-21.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,-0.056291025,0.024121637999999997,0.011681341000000001,-0.11741245,0.111964785,0.24955077,0.79120165,0.0038064622999999997