mirror of https://github.com/gryf/coach.git synced 2026-02-26 20:25:53 +01:00
coach/rl_coach/traces/Atari_A3C_pong/trace.csv
Itai Caspi 72a1d9d426 Itaicaspi/episode reset refactoring (#105)
* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
2018-09-04 15:07:54 +03:00

7 lines · 2.0 KiB · CSV

Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min
1,0.0,1.0,881.0,1.0,881.0,881.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,0.0,1.0,1043.0,1.0,1043.0,1924.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,38.0,0.0,763.0,1.0,763.0,2687.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,1.4867063,1.4938432,5.887912,0.0015561687,1.7584827,0.03291289,1.788348,1.6738943,-0.15541847978173265,0.4162753016651965,0.5290131568908691,-1.0030009746551514,-1.0561148,0.93491656,0.021942224,-2.8995342,0.09872001400000001,0.11163085,0.38036227,1.426808e-06,-0.27536926,0.5678659,0.53289455,-1.4932774
4,75.0,0.0,740.0,1.0,740.0,3427.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,3.4039152,1.9638362,7.991121000000001,0.21933316,1.4528251,0.16249819,1.6673243999999998,1.1318555000000001,-0.06318947109911177,0.4280228160756264,0.6142082214355469,-0.9774222373962402,-2.6464324,0.35272834,-2.2407157000000004,-3.3780959999999998,0.09359822,0.06729852,0.24981685,0.03076201,-0.096310705,0.51300126,0.43143788,-1.0997788999999998
5,113.0,0.0,755.0,1.0,755.0,4182.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,3.233333,1.986448,7.6551165999999995,0.19970839,1.2935143999999998,0.11847049,1.439909,1.0601448,-0.08421046411668932,0.4278507532658671,0.5715954303741455,-0.980087161064148,-2.3434882000000004,0.31644145,-1.9696671000000001,-3.1619172000000004,0.09507382,0.07568375,0.26689446,0.024116684,-0.1209513,0.48729447,0.5198732,-1.1845143
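The rows above are emitted by Coach's CSV signal logger: a 46-column header followed by one row per episode, where columns that have no value yet (e.g. loss and gradient statistics during heatup, when no training occurs) are left empty. As a minimal sketch — the variable and function names here are illustrative, not part of the Coach API — such a trace can be parsed with Python's standard `csv` module, treating empty fields as missing values:

```python
import csv
import io

# Embedded sample: the trace.csv header (46 columns) plus the first
# heatup row, in which all training-statistics columns are empty.
TRACE_SAMPLE = (
    "Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,"
    "Episode Length,Total steps,Epsilon,Shaped Training Reward,"
    "Training Reward,Update Target Network,Evaluation Reward,"
    "Shaped Evaluation Reward,Success Rate,"
    "Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,"
    "Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,"
    "Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,"
    "Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,"
    "Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,"
    "Values/Mean,Values/Stdev,Values/Max,Values/Min,"
    "Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,"
    "Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min\n"
    "1,0.0,1.0,881.0,1.0,881.0,881.0,0.0,,,0.0," + "," * 34 + "\n"
)

def as_float(value):
    """Convert a CSV field to float, mapping empty fields to None."""
    return float(value) if value != "" else None

reader = csv.DictReader(io.StringIO(TRACE_SAMPLE))
header = reader.fieldnames
first_row = next(reader)

# During heatup the episode ran for 881 steps but logged no loss.
print(as_float(first_row["Episode Length"]))  # 881.0
print(as_float(first_row["Loss/Mean"]))       # None
```

Treating empty fields as `None` rather than `0.0` matters when aggregating these traces: a heatup episode has no loss at all, which is different from a loss of zero.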