mirror of https://github.com/gryf/coach.git synced 2026-04-07 13:43:32 +02:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
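The first bullet mentions storing episodes only once they have terminated. The Python sketch below illustrates that idea in a minimal form. It is not Coach's implementation: `Transition`, `Episode`, `EpisodicMemory` and the `returns` helper are hypothetical names chosen for this example. Transitions accumulate in an in-progress episode, and that episode is committed to memory only when its last transition is terminal, so partially played episodes never reach the buffer. The discounted-return helper is included only because `episode.update_returns` appears in the list above; its logic here is a generic backward recursion, not the patched Coach code.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Transition:
    # Hypothetical minimal transition record (Coach's own class carries more fields).
    state: Any
    action: Any
    reward: float
    game_over: bool


@dataclass
class Episode:
    transitions: List[Transition] = field(default_factory=list)

    def insert(self, transition: Transition) -> None:
        self.transitions.append(transition)

    def is_complete(self) -> bool:
        # The episode is complete once its most recent transition is terminal.
        return bool(self.transitions) and self.transitions[-1].game_over

    def returns(self, discount: float) -> List[float]:
        # Discounted return G_t = r_t + discount * G_{t+1}, accumulated backwards.
        total = 0.0
        out: List[float] = []
        for t in reversed(self.transitions):
            total = t.reward + discount * total
            out.append(total)
        out.reverse()
        return out


class EpisodicMemory:
    """Hypothetical replay memory that only accepts terminated episodes."""

    def __init__(self) -> None:
        self.episodes: List[Episode] = []
        self._current = Episode()

    def store(self, transition: Transition) -> None:
        self._current.insert(transition)
        if self._current.is_complete():
            # Commit the finished episode, then start collecting a new one.
            self.episodes.append(self._current)
            self._current = Episode()

    def num_transitions(self) -> int:
        # Counts only transitions belonging to committed (terminated) episodes.
        return sum(len(e.transitions) for e in self.episodes)
```

With this design, a transition from an unfinished episode is invisible to anything that samples from `episodes`, which is the property the bullet describes.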
Author: Itai Caspi
Date: 2018-09-04 15:07:54 +03:00
Committed by GitHub
Parent: 7086492127
Commit: 72a1d9d426
92 changed files with 9803 additions and 9740 deletions


@@ -1,6 +1,6 @@
Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
1,0.0,1.0,1117.0,1117.0,1117.0,1117.0,1.0,,,0.0,,,,,,,,,,,,,,,,,,,
-2,210.0,0.0,1958.0,1958.0,841.0,1958.0,0.999167410000018,-20.0,-20.0,0.0,,,,3.9302484875633596,0.000980246273321835,3.9315416812896733,3.926891326904297,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.0021740668,0.0023227779999999997,0.014739634,0.0009191924499999999,,,,
-3,402.0,0.0,2726.0,2726.0,768.0,2726.0,0.9984070900000346,-21.0,-21.0,0.0,,,,3.928837850689888,0.0010547064317355432,3.9301910400390634,3.9240779876708975,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.0015834095000000002,0.0025085623000000003,0.016421815,0.0005392361,0.06382764484733404,0.02358938873903284,0.10002454034984172,0.024584384262562403
-4,601.0,0.0,3519.0,3519.0,793.0,3519.0,0.9976220200000516,-21.0,-21.0,0.0,,,,3.927992107880176,0.0010090422055890882,3.929178953170776,3.925344705581665,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.0015888658,0.002871752,0.017285319,0.00037317397000000005,0.06921376281728404,0.007428921215239665,0.08169361427426401,0.05617701336741498
-5,809.0,0.0,4352.0,4352.0,833.0,4352.0,0.9967973500000696,-21.0,-21.0,0.0,,,,3.928017064929009,0.0009430171418315623,3.9289817810058594,3.92388129234314,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.0016348981,0.0031759862,0.016966071,0.0003033526,0.07216475835690954,0.00436285987925874,0.07676253654062795,0.06509595513343866
+2,205.0,0.0,1937.0,1937.0,820.0,1937.0,0.9991882000000176,-21.0,-21.0,0.0,,,,3.9302156448364256,0.0010496846440389027,3.931553840637207,3.9267423152923584,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.0021735325,0.0023975547,0.015546012,0.0008601941499999999,,,,
+3,413.0,0.0,2768.0,2768.0,831.0,2768.0,0.9983655100000356,-21.0,-21.0,0.0,,,,3.9287387797465687,0.0010725536875668584,3.930054426193237,3.92205548286438,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.0014352581,0.0022775119,0.016661283,0.0005455515,0.06143524280438877,0.010833295539136235,0.0730189699679619,0.04586568772792873
+4,667.0,0.0,3783.0,3783.0,1015.0,3783.0,0.9973606600000572,-20.0,-20.0,0.0,,,,3.9281875890071,0.0009267313904696912,3.9292049407958975,3.9252440929412837,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.0012879773,0.0025753588,0.018626466,0.00030493445,0.06362535804510176,0.0053005873567461975,0.06891775093972746,0.053885202482343325
+5,892.0,0.0,4684.0,4684.0,901.0,4684.0,0.9964686700000768,-20.0,-20.0,0.0,,,,3.9280550532870815,0.0009707394231859632,3.9289817810058594,3.9241018295288086,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.00088581746,0.0017567717,0.016409054,0.00022121534999999999,0.06359539761518496,0.005375292811606972,0.07293013073504026,0.05364551693201117
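In this hunk the heatup episode (row 1) is unchanged, while rows 2 to 5 change after the environment-reset reordering. Two relationships hold in both the old and the new rows and can be checked mechanically: `Total steps` equals the running sum of `Episode Length`, and `Epsilon` matches a linear decay from 1.0 to 0.01 over 1,000,000 steps that begins after the 1,117 heatup steps. That schedule is inferred from the logged numbers, not read from the preset, so the constants below are assumptions. The sketch copies only a subset of the trace columns; `expected_epsilon` and `check_trace` are hypothetical helpers, not part of Coach.

```python
import csv
import io

# Subset of the trace columns above: rows 1-5 before and after the change.
OLD_TRACE = """Episode #,Episode Length,Total steps,Epsilon
1,1117.0,1117.0,1.0
2,841.0,1958.0,0.999167410000018
3,768.0,2726.0,0.9984070900000346
4,793.0,3519.0,0.9976220200000516
5,833.0,4352.0,0.9967973500000696
"""

NEW_TRACE = """Episode #,Episode Length,Total steps,Epsilon
1,1117.0,1117.0,1.0
2,820.0,1937.0,0.9991882000000176
3,831.0,2768.0,0.9983655100000356
4,1015.0,3783.0,0.9973606600000572
5,901.0,4684.0,0.9964686700000768
"""

HEATUP_STEPS = 1117            # length of episode 1, which ran entirely in heatup
EPS_START, EPS_END = 1.0, 0.01  # assumed linear schedule endpoints
DECAY_STEPS = 1_000_000         # assumed schedule length, inferred from the data


def expected_epsilon(total_steps: float) -> float:
    """Assumed linear epsilon decay that starts once heatup is over."""
    progress = max(0.0, total_steps - HEATUP_STEPS) / DECAY_STEPS
    return max(EPS_END, EPS_START - (EPS_START - EPS_END) * progress)


def check_trace(text: str) -> list:
    """Return a list of inconsistencies found in a CSV trace (empty if none)."""
    problems = []
    running_total = 0.0
    for row in csv.DictReader(io.StringIO(text)):
        episode = row["Episode #"]
        running_total += float(row["Episode Length"])
        if running_total != float(row["Total steps"]):
            problems.append(f"episode {episode}: step count mismatch")
        logged = float(row["Epsilon"])
        if abs(expected_epsilon(float(row["Total steps"])) - logged) > 1e-9:
            problems.append(f"episode {episode}: epsilon mismatch")
    return problems


print(check_trace(OLD_TRACE), check_trace(NEW_TRACE))  # [] []
```

Both versions pass, which suggests the reordering changed how many steps each episode took, not how steps or epsilon are counted.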