Mirror of https://github.com/gryf/coach.git (synced 2026-02-21 17:25:53 +01:00).
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
| Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Entropy/Mean | Entropy/Stdev | Entropy/Max | Entropy/Min | Advantages/Mean | Advantages/Stdev | Advantages/Max | Advantages/Min | Values/Mean | Values/Stdev | Values/Max | Values/Min | Value Loss/Mean | Value Loss/Stdev | Value Loss/Max | Value Loss/Min | Policy Loss/Mean | Policy Loss/Stdev | Policy Loss/Max | Policy Loss/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 881.0 | 1.0 | 881.0 | 881.0 | 0.0 | 0.0 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 2 | 0.0 | 1.0 | 1043.0 | 1.0 | 1043.0 | 1924.0 | 0.0 | 0.0 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 3 | 38.0 | 0.0 | 763.0 | 1.0 | 763.0 | 2687.0 | 0.0 | -21.0 | -21.0 | 0.0 | 1.4867063 | 1.4938432 | 5.887912 | 0.0015561687 | 1.7584827 | 0.03291289 | 1.788348 | 1.6738943 | -0.15541847978173265 | 0.4162753016651965 | 0.5290131568908691 | -1.0030009746551514 | -1.0561148 | 0.93491656 | 0.021942224 | -2.8995342 | 0.09872001400000001 | 0.11163085 | 0.38036227 | 1.426808e-06 | -0.27536926 | 0.5678659 | 0.53289455 | -1.4932774 | | | | | | | | | | | |
| 4 | 75.0 | 0.0 | 740.0 | 1.0 | 740.0 | 3427.0 | 0.0 | -21.0 | -21.0 | 0.0 | 3.4039152 | 1.9638362 | 7.991121000000001 | 0.21933316 | 1.4528251 | 0.16249819 | 1.6673243999999998 | 1.1318555000000001 | -0.06318947109911177 | 0.4280228160756264 | 0.6142082214355469 | -0.9774222373962402 | -2.6464324 | 0.35272834 | -2.2407157000000004 | -3.3780959999999998 | 0.09359822 | 0.06729852 | 0.24981685 | 0.03076201 | -0.096310705 | 0.51300126 | 0.43143788 | -1.0997788999999998 | | | | | | | | | | | |
| 5 | 113.0 | 0.0 | 755.0 | 1.0 | 755.0 | 4182.0 | 0.0 | -21.0 | -21.0 | 0.0 | 3.233333 | 1.986448 | 7.6551165999999995 | 0.19970839 | 1.2935143999999998 | 0.11847049 | 1.439909 | 1.0601448 | -0.08421046411668932 | 0.4278507532658671 | 0.5715954303741455 | -0.980087161064148 | -2.3434882000000004 | 0.31644145 | -1.9696671000000001 | -3.1619172000000004 | 0.09507382 | 0.07568375 | 0.26689446 | 0.024116684 | -0.1209513 | 0.48729447 | 0.5198732 | -1.1845143 | | | | | | | | | | | |
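For context, this file is one of the stored experiment traces that the trace tests mentioned in the commit messages above compare new runs against. Below is a minimal sketch of how such a trace CSV could be loaded and checked against a freshly generated one with pandas; the file names, tolerance, and comparison strategy are illustrative assumptions, not the repository's actual trace-test code.

```python
# Illustrative sketch only: file names, tolerance, and the comparison strategy
# are assumptions, not the repository's actual trace-test implementation.
import pandas as pd

reference = pd.read_csv("reference_trace.csv")  # stored trace, like the table above
candidate = pd.read_csv("new_run_trace.csv")    # trace produced by the current run

# Compare the columns both traces share, tolerating small floating-point drift.
shared = [col for col in reference.columns if col in candidate.columns]
pd.testing.assert_frame_equal(
    reference[shared],
    candidate[shared],
    check_exact=False,
    rtol=1e-5,
)
print(f"traces match on {len(shared)} shared columns")
```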