mirror of https://github.com/gryf/coach.git synced 2026-04-19 14:13:32 +02:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated
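The "store episodes only when they are terminated" change can be sketched as an episodic buffer that holds in-progress transitions aside and commits them to the stored episode list only on termination. This is a minimal illustration, not Coach's actual memory API; the class and method names are invented for the example.

```python
# Hedged sketch (illustrative names, not Coach's actual classes): a buffer
# that only commits an episode to storage once it terminates.
class EpisodicBuffer:
    def __init__(self):
        self._current = []   # transitions of the in-progress episode
        self.episodes = []   # completed (terminated) episodes only

    def store(self, transition, done):
        self._current.append(transition)
        if done:
            # Commit the whole episode only on termination,
            # then start accumulating a fresh one.
            self.episodes.append(self._current)
            self._current = []
```

Until `done=True` is seen, nothing reaches `episodes`, so consumers (e.g. a training loop) only ever see complete episodes.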

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value
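A configurable beta entropy value typically enters the policy-optimization loss as a weighted entropy bonus. The sketch below shows the standard form with a tunable `beta`; the function and parameter names are illustrative, not Coach's exact API.

```python
import math

def policy_loss_with_entropy(log_probs, advantages, action_probs, beta=0.01):
    """Policy-gradient loss with a configurable entropy bonus (sketch only).

    beta scales the entropy term: larger beta encourages a more
    stochastic (higher-entropy) policy by lowering the total loss.
    """
    # Standard policy-gradient surrogate: -E[log pi(a|s) * advantage]
    pg_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / len(log_probs)
    # Entropy of the action distribution for the sampled state
    entropy = -sum(p * math.log(p) for p in action_probs)
    return pg_loss - beta * entropy
```

With `beta=0` this reduces to the plain policy-gradient loss; increasing `beta` trades off exploitation against keeping the policy's action distribution spread out.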

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
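The `episode.update_returns` mentioned here computes per-step returns for a finished episode; the usual implementation is a backward pass accumulating discounted rewards. This is a hedged sketch of that computation under the standard definition, not necessarily Coach's exact `Episode.update_returns` body.

```python
def update_returns(rewards, discount=0.99):
    """Compute discounted returns G_t = r_t + discount * G_{t+1}
    for a terminated episode (illustrative sketch)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns
```

Iterating backwards makes the computation O(n); a common bug in such code is off-by-one indexing or forgetting to reset the accumulator at the terminal step, which the backward pass with `running = 0.0` avoids.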

* remove docker and jenkins file
This commit is contained in:
Itai Caspi
2018-09-04 15:07:54 +03:00
committed by GitHub
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions

@@ -1,6 +1,6 @@
 Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min
 1,0.0,1.0,772.0,1.0,772.0,772.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
 2,0.0,1.0,821.0,1.0,821.0,1593.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
-3,40.0,0.0,798.0,1.0,798.0,2391.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,1.6925433,1.56166,5.4782605,0.011157283,1.7178073,0.09590818,1.7916279,1.4786046,-1.1788440822140267,1.255206542380211,1.0642428398132324,-3.2777271270751958,-1.3811463,1.9222080000000001,-0.023936661,-5.8730206,0.34145027,0.439121,2.7880993,0.00036954717000000004,-0.13185489,0.9237273000000001,0.9732863000000002,-4.1131425
-4,78.0,0.0,755.0,1.0,755.0,3146.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,2.4706511000000004,4.6726856,30.583601,0.67071736,1.5111907,0.024124695,1.5492108999999998,1.439797,0.24763718664174345,0.8147690823204996,4.918219566345215,-0.9567646980285645,-4.656093599999999,0.40140057,-2.9237971000000003,-5.8753314,0.443355,1.7042981,10.802688999999999,0.08576387,0.41162340000000003,1.3014139999999998,7.6323996,-0.8151264
-5,116.0,0.0,756.0,1.0,756.0,3902.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,1.850646,2.4202513999999997,16.295288,0.4290005,1.3224964,0.050945385999999995,1.4303273,1.2031558,0.14155570039685988,0.5814409942921325,3.3915529251098637,-0.8861622810363771,-3.0170727000000004,0.6280401999999999,-2.1003609,-4.3324237,0.22347862,0.839418,5.326185,0.05343006,0.22746566,0.82214487,4.685924,-0.7187973000000001
+3,47.0,0.0,960.0,1.0,960.0,2553.0,0.0,-20.0,-20.0,0.0,,,,,,,,,,,,1.1341655000000002,1.3580534,5.892931,0.0023950292,1.7183144000000001,0.11223783,1.7917048999999998,1.2778816000000002,0.04575298835242915,0.4587136251645712,1.7850174903869631,-1.000868558883667,-2.1548557,1.8245186999999998,0.0046853945000000004,-5.0325212,0.2002191,0.18291572,0.6043339,3.6701476e-06,0.053236503,0.61648566,1.385742,-1.489837
+4,88.0,0.0,802.0,1.0,802.0,3355.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,1.3043066,0.5805045999999999,3.1598291,0.34928647,0.9577253000000001,0.112086765,1.7206139999999999,0.84311396,0.06997428238391876,0.3966030207984067,0.8193864822387695,-0.957106113433838,-3.1394837000000004,0.53956544,-2.564023,-4.4771279999999996,0.10647102,0.06113237,0.296035,0.04434104,0.07748371,0.32128,0.67603266,-0.5327814999999999
+5,129.0,0.0,815.0,1.0,815.0,4170.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,1.392543,0.8525935,3.4830544000000003,0.1963895,0.90076345,0.08584659,1.5318372,0.77205926,-0.06867338344454765,0.4191816209286624,0.5097661018371582,-0.9805699586868286,-2.2598412,0.26356682,-1.9182776999999998,-2.800033,0.09539665,0.07594069999999999,0.25968197,0.031170906,-0.07717073,0.33031166,0.287632,-0.91986656