mirror of https://github.com/gryf/coach.git synced 2026-04-07 21:53:33 +02:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing episodes to be stored only when they terminate

* reordering of the episode reset operation and allowing episodes to be stored only when they terminate

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
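
The episode-storage change in the first commit above can be sketched as follows. This is a minimal, hypothetical memory that buffers transitions and commits the episode only once it has terminated; the names `Transition` and `EpisodicMemory` are illustrative and are not Coach's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Transition:
    state: object
    action: int
    reward: float
    done: bool

@dataclass
class EpisodicMemory:
    """Stores only complete (terminated) episodes."""
    episodes: list = field(default_factory=list)
    _current: list = field(default_factory=list)

    def store(self, transition: Transition) -> None:
        self._current.append(transition)
        if transition.done:
            # Commit the buffered episode only once it has terminated;
            # partial episodes never reach long-term memory.
            self.episodes.append(self._current)
            self._current = []

memory = EpisodicMemory()
for t in range(5):
    memory.store(Transition(state=None, action=0, reward=1.0, done=(t == 4)))
print(len(memory.episodes))  # → 1 (committed only at termination)
```

Deferring storage until termination keeps the replay memory free of half-finished episodes, which matters when returns are computed per-episode (as in `episode.update_returns`).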
Itai Caspi
2018-09-04 15:07:54 +03:00
committed by GitHub
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions


@@ -1,6 +1,6 @@
 Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
 1,0.0,1.0,986.0,986.0,986.0,986.0,7.0,,,0.0,,,,,,,,,,,,,,,,,,,
 2,0.0,1.0,1806.0,1806.0,820.0,1806.0,4.0,,,0.0,,,,,,,,,,,,,,,,,,,
-3,207.0,0.0,2634.0,2634.0,828.0,2634.0,1.0,-21.0,-21.0,0.0,,,,0.013430694482291505,0.012774117514024573,0.06467919796705246,0.0005054873763583599,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.013462509,0.005010004,0.032169305,0.0046610474,,,,
-4,433.0,0.0,3538.0,3538.0,904.0,3538.0,1.0,-21.0,-21.0,0.0,,,,0.013214294455993912,0.012243776759493771,0.048550304025411606,0.00030727600096724933,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.012283348500000001,0.004644497,0.032848116000000004,0.0047284905,,,,
-5,664.0,0.0,4462.0,4462.0,924.0,4462.0,2.0,-20.0,-20.0,0.0,,,,0.013385360111538885,0.013904787720461907,0.06079941987991332,0.0005098563269712031,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.010943641,0.0043348954,0.03260831,0.0045090048,0.00066530565,0.0129122045,0.024260167000000003,-0.034502137
+3,206.0,0.0,2629.0,2629.0,823.0,2629.0,5.0,-21.0,-21.0,0.0,,,,0.01375627432677452,0.013505330839893808,0.06677445024251938,0.0005553220980800688,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.013602738,0.0048916726,0.034245104,0.0056978124,,,,
+4,398.0,0.0,3397.0,3397.0,768.0,3397.0,3.0,-21.0,-21.0,0.0,,,,0.014156610367839068,0.013173363350960334,0.059119727462530136,0.0007080046343617141,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.012839798999999999,0.0038416919,0.024480136,0.005681609000000001,,,,
+5,617.0,0.0,4274.0,4274.0,877.0,4274.0,6.0,-21.0,-21.0,0.0,,,,0.015369139484674181,0.01463229484329247,0.08113615959882736,0.0005487628513947129,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.014249632,0.005901839599999999,0.04092761,0.004881437,0.004008428,0.016476048,0.028364737,-0.026583625
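
The trace files being updated are CSVs keyed by episode number, with the column names shown in the header row of the diff above. A trace comparison of the kind the commit list mentions could be sketched roughly like this; it is an illustration, not the repository's actual test harness, and the inline `OLD`/`NEW` samples use only a subset of the real columns:

```python
import csv
import io

# Abbreviated sample traces; real files carry the full column set.
OLD = """Episode #,Episode Length,Total steps
1,986.0,986.0
3,828.0,2634.0"""

NEW = """Episode #,Episode Length,Total steps
1,986.0,986.0
3,823.0,2629.0"""

def read_trace(text):
    # Map "Episode #" -> row dict so episodes can be compared directly.
    return {row["Episode #"]: row for row in csv.DictReader(io.StringIO(text))}

def diff_traces(old_text, new_text):
    old, new = read_trace(old_text), read_trace(new_text)
    changed = {}
    for ep in old.keys() & new.keys():
        cols = {c for c in old[ep] if old[ep][c] != new[ep][c]}
        if cols:
            changed[ep] = sorted(cols)
    return changed

print(diff_traces(OLD, NEW))  # → {'3': ['Episode Length', 'Total steps']}
```

Comparing per-episode rather than line-by-line makes it easy to see which statistics shifted after the reset reordering (here, episode 3's length and step count), which is exactly the kind of drift the updated traces capture.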