mirror of https://github.com/gryf/coach.git synced 2026-04-19 06:03:32 +02:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* added a readme file for the tests and refactored the policy optimization agent's train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix an issue where golden test processes were not killed

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
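The bullet above mentions a bug fix in `episode.update_returns`. As background, an update-returns helper typically walks an episode's rewards backwards, accumulating discounted returns. A minimal standalone sketch of that idea (the actual coach implementation may differ, and `discount` is an assumed parameter name):

```python
def update_returns(rewards, discount=0.99):
    """Compute discounted returns G_t = r_t + discount * G_{t+1} for one episode.

    A minimal sketch of what an update_returns-style helper usually does;
    not the exact coach implementation.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each step's return can reuse the next step's.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns
```

Computing the returns in a single backward pass keeps the update O(n) per episode, which matters once episodes are only stored (and post-processed) when they terminate.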
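One bullet adds a configurable entropy beta. The usual pattern is an entropy bonus scaled by a coefficient (often called beta) subtracted from the policy loss, so a larger beta encourages more exploration. A hedged sketch, with hypothetical names (`policy_loss_with_entropy`, `log_probs`, `advantages`, `probs` are illustrative, not coach's API):

```python
import numpy as np

def policy_loss_with_entropy(log_probs, advantages, probs, beta=0.01):
    """Policy-gradient loss with a configurable entropy bonus.

    log_probs:  log pi(a_t | s_t) for the taken actions, shape (T,)
    advantages: advantage estimates, shape (T,)
    probs:      full action distributions, shape (T, num_actions)
    beta:       entropy coefficient (the configurable value)
    """
    # Mean entropy of the action distributions; epsilon guards log(0).
    entropy = -np.sum(probs * np.log(probs + 1e-8), axis=-1)
    # Standard surrogate loss minus the scaled entropy bonus.
    return -np.mean(log_probs * advantages) - beta * np.mean(entropy)
```

Making beta a preset parameter (rather than a hard-coded constant) is what lets the trace tests pin down its effect per preset.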
Itai Caspi
2018-09-04 15:07:54 +03:00
committed by GitHub
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions


@@ -1,6 +1,6 @@
Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
1,0.0,1.0,1117.0,1117.0,1117.0,1117.0,1.0,,,0.0,,,,,,,,,,,,,,,,,,,
2,210.0,0.0,1958.0,1958.0,841.0,1958.0,0.9992431000000248,-20.0,-20.0,0.0,,,,0.011158549779723952,0.01233800718156463,0.04892086610198021,7.747895870124921e-05,0.00010000000000000002,1.3552527156068802e-20,0.0001,0.0001,0.07765737,0.051450502,0.27204409999999996,0.016480377,,,,
3,402.0,0.0,2726.0,2726.0,768.0,2726.0,0.9985519000000476,-21.0,-21.0,0.0,,,,0.011682878495226607,0.013976986698806206,0.07550939172506332,3.554971408448182e-05,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.054967567,0.03760215,0.23677647,0.007137654300000001,0.059924055,0.010001821999999999,0.070588365,0.045257278
4,601.0,0.0,3519.0,3519.0,793.0,3519.0,0.9978382000000712,-21.0,-21.0,0.0,,,,0.013331305195076387,0.013162853602752194,0.0471726730465889,9.195879101753236e-05,0.0001,0.0,0.0001,0.0001,0.05391158,0.02641614,0.14699543,0.017958568,0.038910400000000005,0.006119223000000001,0.046009037999999995,0.030036567000000004
5,837.0,0.0,4466.0,4466.0,947.0,4466.0,0.9969859000000992,-20.0,-20.0,0.0,,,,0.011204646104627085,0.012869155071181351,0.06053701043128967,6.284505798248574e-05,0.00010000000000000002,1.3552527156068802e-20,0.0001,0.0001,0.047131248,0.026914247999999998,0.13275696,0.010900318999999999,,,,
2,205.0,0.0,1937.0,1937.0,820.0,1937.0,0.9992620000000244,-21.0,-21.0,0.0,,,,0.011010780938079,0.013098460400306485,0.06118807196617127,6.86898929416202e-05,0.00010000000000000002,1.3552527156068802e-20,0.0001,0.0001,0.08733994,0.06833449,0.47135752,0.016372742,,,,
3,413.0,0.0,2768.0,2768.0,831.0,2768.0,0.9985141000000488,-21.0,-21.0,0.0,,,,0.01163802880151147,0.013571124716079436,0.08714678883552551,3.9931001083459705e-05,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.06724033,0.035371285,0.2241408,0.011829718999999999,0.10583201,0.011610512,0.12072124,0.08555735
4,667.0,0.0,3783.0,3783.0,1015.0,3783.0,0.9976006000000791,-20.0,-20.0,0.0,,,,0.01136319609350886,0.012043113812065086,0.049625951796770096,9.354137000627816e-05,0.00010000000000000002,1.3552527156068802e-20,0.0001,0.0001,0.060902383,0.032815605,0.17838788,0.015925674,0.0978057,0.014090337,0.123560354,0.07580207
5,947.0,0.0,4906.0,4906.0,1123.0,4906.0,0.9965899000001124,-18.0,-18.0,0.0,,,,0.010341535720908724,0.011934284708938809,0.06498207896947861,6.708659930154681e-05,0.00010000000000000002,1.3552527156068802e-20,0.0001,0.0001,0.054970358,0.03215441,0.26232755,0.009252935,0.09154041,0.009532932,0.10656521,0.07300271