mirror of https://github.com/gryf/coach.git synced 2026-04-18 21:53:32 +02:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
Itai Caspi
2018-09-04 15:07:54 +03:00
committed by GitHub
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions

@@ -1,6 +1,6 @@
 Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
 1,0.0,1.0,1117.0,1117.0,1117.0,1117.0,1.0,,,0.0,,,,,,,,,,,,,,,,,,,
-2,221.0,0.0,2002.0,2002.0,885.0,2002.0,0.9992035000000262,-21.0,-21.0,0.0,,,,0.0066113567236284546,0.003946234120878863,0.016941886395215988,3.0340672310558148e-05,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.020578874,0.011285608,0.12838697,0.003849274,,,,
-3,455.0,0.0,2938.0,2938.0,936.0,2938.0,0.9983611000000541,-20.0,-20.0,0.0,,,,0.007220610483817191,0.00384386883256313,0.02201320417225361,0.0004259901470504701,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.014196658000000001,0.0053113990000000005,0.040406343,0.005419724599999999,-0.012426703,0.021457887999999998,0.023741005,-0.051037904
-4,659.0,0.0,3754.0,3754.0,816.0,3754.0,0.997626700000078,-21.0,-21.0,0.0,,,,0.007067595686713306,0.00349683739085928,0.016786431893706318,0.0004974190378561616,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.012732236999999999,0.0038257977000000004,0.02420173,0.00600734,,,,
-5,961.0,0.0,4961.0,4961.0,1207.0,4961.0,0.996540400000114,-18.0,-18.0,0.0,,,,0.007034662726550326,0.003637364351878082,0.022078890353441242,0.0004736386181320995,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.012767965,0.0043293815,0.031500462,0.0063609104,-0.01521363,0.011859578,0.006441065,-0.04179345
+2,197.0,0.0,1905.0,1905.0,788.0,1905.0,0.9992908000000232,-21.0,-21.0,0.0,,,,0.0051924175274927565,0.003918679938872439,0.04185768589377403,2.9565440854639746e-05,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.01784605,0.03255357,0.465425,0.0038900522,,,,
+3,436.0,0.0,2862.0,2862.0,957.0,2862.0,0.9984295000000516,-20.0,-20.0,0.0,,,,0.004909432677758631,0.0024521858486776424,0.012306905351579191,0.00032079339143820107,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.0113589475,0.0037933819,0.025680352000000004,0.0035025426,0.030741736000000002,0.025549445,0.07848698,-0.02225282
+4,627.0,0.0,3623.0,3623.0,761.0,3623.0,0.9977446000000744,-21.0,-21.0,0.0,,,,0.0052940571797080345,0.002501595309474277,0.012016894295811651,0.0003992373822256922,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.010990373999999999,0.0038335419,0.027035048,0.005245461,,,,
+5,855.0,0.0,4535.0,4535.0,912.0,4535.0,0.9969238000001012,-20.0,-20.0,0.0,,,,0.004946799854224082,0.0024341152117377785,0.013126095756888391,0.0003701391979120672,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.010130615,0.0032620803,0.022317264,0.0045093056,0.026840469,0.01787639,0.051877695999999994,-0.005629579