mirror of https://github.com/gryf/coach.git synced 2026-03-23 11:03:32 +01:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
Itai Caspi
2018-09-04 15:07:54 +03:00
committed by GitHub
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions


@@ -1,6 +1,6 @@
Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min,TD targets/Mean,TD targets/Stdev,TD targets/Max,TD targets/Min,actions/Mean,actions/Stdev,actions/Max,actions/Min
1,0.0,1.0,1001.0,1.0,1001.0,1001.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,0.0,1.0,2002.0,2.0,1001.0,2002.0,0.0,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
-3,1000.0,0.0,3003.0,3.0,1001.0,3003.0,-0.1185302492771778,8.62704551591294,86.2704551591296,1.0,,,,1.0509011072599606e-05,4.393642656353033e-05,0.0008535402594134213,1.1514939615153708e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.004000389,0.00447183,0.062234186,0.00047969296999999996,,,,,,,,,,,,,,,,,,,,,0.08464705,0.16014087,0.45386302,-0.26037258,0.01247160570665026,0.02153857694844653,0.08672064238048882,-0.04962609781241383,0.3359349988514577,0.6368093944604776,1.3638484370927098,-1.3839266445045957
-4,2001.0,0.0,4004.0,4.0,1001.0,4004.0,-0.2048510260598676,17.580070175231974,175.80070175231998,1.0,,,,0.0005509343815205071,0.0018491137578482792,0.023759014904499054,5.607626462733606e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.045537997000000004,0.09140324,1.2210321000000002,0.0010273910000000001,,,,,,,,,,,,,,,,,,,,,0.1922657,0.16243528,0.44480476,-0.2532415,0.03993582413609073,0.11728732960908478,0.5736919507147175,-0.26410636501093465,0.6924021347523865,0.5892731229023225,1.3749280698542792,-1.507436630113174
-5,3002.0,0.0,5005.0,5.0,1001.0,5005.0,-0.02134772535498328,13.124325999088368,131.24325999088364,0.0,,,,0.0001703916229396802,0.000568676102611858,0.004801726434379816,2.6488642106414773e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.014244637,0.014174069,0.10748595,0.00069606147,,,,,,,,,,,,,,,,,,,,,0.38734838,0.23498419,0.6344281,-0.10678842,0.09845966999879296,0.17017726714756395,0.6471482681083021,-0.23208531499469515,0.8583268163158988,0.5493396564055796,1.4005169604031336,-1.084873489999208
+3,1000.0,0.0,3003.0,3.0,1001.0,3003.0,-0.1185302492771778,12.73546386992824,127.3546386992823,1.0,,,,1.4965620654038504e-05,3.650858260133972e-05,0.0007415295694954692,1.510996071374393e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.0049991608,0.004320714,0.029555712,0.00052324863,,,,,,,,,,,,,,,,,,,,,-0.02509604,0.12253879,0.19679643,-0.25691667,0.002979598431912616,0.042334642053058036,0.09477341320020807,-0.1277348208010601,0.7574106673312205,0.2820158065549947,1.3628734977284602,-0.13561528749852786
+4,2001.0,0.0,4004.0,4.0,1001.0,4004.0,-0.2048510260598676,7.629510433822026,76.29510433822016,1.0,,,,9.294460378555413e-05,0.00018001446184314637,0.0014042556285858154,1.88643639376096e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.018415965,0.019779565,0.19278607,0.0008359549000000001,,,,,,,,,,,,,,,,,,,,,-0.007871467,0.112213835,0.17972693,-0.23277566,0.002690244749371092,0.08247475995739656,0.20102381350942625,-0.2633941138081878,0.8866818175665385,0.1980599181751808,1.3750565774684147,0.4541525586846937
+5,3002.0,0.0,5005.0,5.0,1001.0,5005.0,-0.02134772535498328,7.612595851248884,76.12595851248874,0.0,,,,4.2167748756014586e-05,0.00010527586086637082,0.001020723837427795,1.4967686183808837e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.009330036,0.0073557219999999994,0.04758695,0.00048721785000000004,,,,,,,,,,,,,,,,,,,,,-0.00076002936,0.12163509,0.17490079,-0.2400235,0.009237181711633144,0.09619469158143916,0.21206437128683006,-0.2783133662129137,1.0669116245649553,0.12577960072670955,1.400521073123623,0.7853534082159903