mirror of https://github.com/gryf/coach.git synced 2026-03-11 03:55:52 +01:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing episodes to be stored only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the Intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining the presets ignored by the trace tests + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden test processes not being killed

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins files

Author: Itai Caspi (committed by GitHub)
Date: 2018-09-04 15:07:54 +03:00
Parent: 7086492127
Commit: 72a1d9d426
92 changed files with 9803 additions and 9740 deletions

@@ -1,6 +1,6 @@
Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min,TD targets/Mean,TD targets/Stdev,TD targets/Max,TD targets/Min,actions/Mean,actions/Stdev,actions/Max,actions/Min
1,0.0,1.0,1000.0,1.0,1000.0,1000.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,0.0,1.0,2000.0,2.0,1000.0,2000.0,0.0,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
-3,999.0,0.0,3000.0,3.0,1000.0,3000.0,-0.017666830179174003,0.0,0.0,1.0,,,,0.0038191572866389523,0.0037619606969040414,0.026500405743718147,0.0005351771251298487,0.00010000000000000003,4.0657581468206416e-20,0.0001,0.0001,0.14963633,0.12557492,1.0296046,0.022291046000000002,,,,,,,,,,,,,,,,,,,,,-0.06052997,0.07319117,0.09228404,-0.81788486,-0.2878782119320924,0.18290294876848567,0.15647711277008056,-1.5949552059173584,0.001995146476856135,0.7060122989726414,1.2513357156871705,-1.2238670209506697
-4,1999.0,0.0,4000.0,4.0,1000.0,4000.0,-0.039999362478752916,0.0,0.0,1.0,,,,0.0005999824570681085,0.0005200396841794251,0.007392321713268756,0.0001155680656665936,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.035074446,0.023916507000000004,0.27376664,0.007415438000000001,,,,,,,,,,,,,,,,,,,,,-0.055866152,0.03557198,0.048398294,-0.20761846,-0.1228111611005996,0.10064295523824936,0.10938632614910604,-0.9385637390613556,0.3813050626909247,0.8586455988935526,1.9421025144380244,-1.2856749345811207
-5,2999.0,0.0,5000.0,5.0,1000.0,5000.0,0.17145601483403705,0.0,0.0,0.0,,,,0.00016860231386453962,8.468335419663504e-05,0.0008860444650053977,4.115650153835304e-05,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.013316613,0.0064833634999999995,0.055022966,0.0038480079,,,,,,,,,,,,,,,,,,,,,-0.04638309,0.028830105,-0.0011850878,-0.36706513,-0.055944047987441965,0.061761207984293,0.16325088679790498,-0.6333049428462982,0.2629062319462116,0.7653580979608734,1.6618100999356682,-1.1699750612176198
+3,999.0,0.0,3000.0,3.0,1000.0,3000.0,-0.017666830179174003,0.0,0.0,1.0,,,,0.005126546850151572,0.004660130005352106,0.05132860690355301,0.0005627279169857502,0.00010000000000000003,4.0657581468206416e-20,0.0001,0.0001,0.1678458,0.12852536,1.5926425,0.024081124,,,,,,,,,,,,,,,,,,,,,0.24388833,0.11236252,0.42840713,-0.8727883,-0.01800971130044855,0.1942566799457346,0.5385999780893326,-0.9172834092378616,-0.11966089038501045,0.8962365587209448,1.3716363433793126,-1.5680451743766328
+4,1999.0,0.0,4000.0,4.0,1000.0,4000.0,-0.039999362478752916,0.0,0.0,1.0,,,,0.0008180646479820358,0.000529273102626917,0.0054473504424095145,0.00014673141413368285,0.00010000000000000003,4.0657581468206416e-20,0.0001,0.0001,0.0469651,0.025094092000000002,0.22221590000000002,0.010784525,,,,,,,,,,,,,,,,,,,,,0.14337498,0.17592207,0.33719423,-0.28446856,0.20208258858056294,0.13431578391837634,0.5768654608726501,-0.5833876812458039,-0.2705900161928217,0.9272508528236816,1.9572209345620784,-1.4727463554915825
+5,2999.0,0.0,5000.0,5.0,1000.0,5000.0,0.17145601483403705,0.0,0.0,0.0,,,,0.0003958249435308753,0.00031769597300822634,0.0040870513767004004,0.00010442566417623311,0.00010000000000000003,4.0657581468206416e-20,0.0001,0.0001,0.025218817999999997,0.013975793500000002,0.14064097,0.0070197446999999994,,,,,,,,,,,,,,,,,,,,,-0.04435015,0.030164617999999997,0.124313995,-0.2207815,0.26081367032274555,0.12723809202247516,0.5931554335355759,-0.2620022776722908,-0.2959362262223715,0.6939703144112135,1.0669463809202309,-1.416814717430604
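
The hunk above replaces the trace rows for episodes 3 to 5 while leaving the header and the two heatup rows untouched. A minimal sketch of how two versions of such a trace CSV could be compared cell by cell follows. The helper names `load_trace` and `diff_traces` are hypothetical and are not part of Coach; the sample keeps only four of the trace columns for brevity, with values taken from the episode 3 rows, and the tolerances are arbitrary assumptions.

```python
import csv
import io
import math

def load_trace(text):
    """Parse trace CSV text into a list of dicts; empty cells become None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({col: (float(val) if val not in ("", None) else None)
                     for col, val in row.items()})
    return rows

def diff_traces(old_rows, new_rows, rel_tol=1e-6, abs_tol=1e-12):
    """Return (episode, column, old, new) for every cell that differs.

    Rows are paired by position; tolerances are illustrative assumptions.
    """
    changes = []
    for old, new in zip(old_rows, new_rows):
        for col in old:
            a, b = old[col], new[col]
            if a is None and b is None:
                continue  # both cells empty, e.g. during heatup
            if (a is None or b is None
                    or not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)):
                changes.append((old["Episode #"], col, a, b))
    return changes

# Reduced sample: four columns only, values from the episode 1 and 3 rows.
OLD = """Episode #,Training Iter,Epsilon,Loss/Mean
1,0.0,0.0,
3,999.0,-0.017666830179174003,0.0038191572866389523
"""
NEW = """Episode #,Training Iter,Epsilon,Loss/Mean
1,0.0,0.0,
3,999.0,-0.017666830179174003,0.005126546850151572
"""

for episode, column, old, new in diff_traces(load_trace(OLD), load_trace(NEW)):
    print(f"episode {episode:g}: {column} changed from {old} to {new}")
```

Pairing rows by position is enough here because the hunk keeps the row count unchanged; a trace whose episode count differs between versions would need to be keyed on `Episode #` instead.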