Mirror of https://github.com/gryf/coach.git, synced 2026-03-01 22:25:46 +01:00
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
2.0 KiB
| Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Entropy/Mean | Entropy/Stdev | Entropy/Max | Entropy/Min | Advantages/Mean | Advantages/Stdev | Advantages/Max | Advantages/Min | Values/Mean | Values/Stdev | Values/Max | Values/Min | Value Loss/Mean | Value Loss/Stdev | Value Loss/Max | Value Loss/Min | Policy Loss/Mean | Policy Loss/Stdev | Policy Loss/Max | Policy Loss/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 772.0 | 1.0 | 772.0 | 772.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||
| 2 | 0.0 | 1.0 | 821.0 | 1.0 | 821.0 | 1593.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||
| 3 | 47.0 | 0.0 | 960.0 | 1.0 | 960.0 | 2553.0 | 0.0 | -20.0 | -20.0 | 0.0 | 1.1341655000000002 | 1.3580534 | 5.892931 | 0.0023950292 | 1.7183144000000001 | 0.11223783 | 1.7917048999999998 | 1.2778816000000002 | 0.04575298835242915 | 0.4587136251645712 | 1.7850174903869631 | -1.000868558883667 | -2.1548557 | 1.8245186999999998 | 0.0046853945000000004 | -5.0325212 | 0.2002191 | 0.18291572 | 0.6043339 | 3.6701476e-06 | 0.053236503 | 0.61648566 | 1.385742 | -1.489837 | |||||||||||
| 4 | 88.0 | 0.0 | 802.0 | 1.0 | 802.0 | 3355.0 | 0.0 | -21.0 | -21.0 | 0.0 | 1.3043066 | 0.5805045999999999 | 3.1598291 | 0.34928647 | 0.9577253000000001 | 0.112086765 | 1.7206139999999999 | 0.84311396 | 0.06997428238391876 | 0.3966030207984067 | 0.8193864822387695 | -0.957106113433838 | -3.1394837000000004 | 0.53956544 | -2.564023 | -4.4771279999999996 | 0.10647102 | 0.06113237 | 0.296035 | 0.04434104 | 0.07748371 | 0.32128 | 0.67603266 | -0.5327814999999999 | |||||||||||
| 5 | 129.0 | 0.0 | 815.0 | 1.0 | 815.0 | 4170.0 | 0.0 | -21.0 | -21.0 | 0.0 | 1.392543 | 0.8525935 | 3.4830544000000003 | 0.1963895 | 0.90076345 | 0.08584659 | 1.5318372 | 0.77205926 | -0.06867338344454765 | 0.4191816209286624 | 0.5097661018371582 | -0.9805699586868286 | -2.2598412 | 0.26356682 | -1.9182776999999998 | -2.800033 | 0.09539665 | 0.07594069999999999 | 0.25968197 | 0.031170906 | -0.07717073 | 0.33031166 | 0.287632 | -0.91986656 | |||||||||||
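The table above is a per-episode training trace of the kind the "trace tests" in the commit log compare against stored reference traces. As a rough sketch of that idea (the actual Coach test harness differs; the function names here are hypothetical, and only the standard library is assumed), one can parse such a CSV and compare two traces cell by cell with a floating-point tolerance:

```python
import csv
import io
import math


def load_trace(csv_text):
    """Parse a trace CSV into a list of per-episode dicts.

    Empty cells (columns not populated for that episode, e.g. during
    heatup) are mapped to None; everything else is parsed as a float.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {key: (float(value) if value not in ("", None) else None)
         for key, value in row.items()}
        for row in reader
    ]


def traces_match(trace_a, trace_b, rel_tol=1e-6):
    """Compare two traces cell by cell, tolerating tiny numeric drift."""
    if len(trace_a) != len(trace_b):
        return False
    for row_a, row_b in zip(trace_a, trace_b):
        if row_a.keys() != row_b.keys():
            return False
        for key in row_a:
            a, b = row_a[key], row_b[key]
            # An empty cell must be empty in both traces.
            if (a is None) != (b is None):
                return False
            if a is not None and not math.isclose(a, b, rel_tol=rel_tol,
                                                  abs_tol=1e-9):
                return False
    return True
```

Comparing with a tolerance rather than exact equality matters here because, as the commit log notes, the reference traces had to be regenerated after environment changes and when moving into the docker container; bit-exact reproduction across platforms is rarely achievable.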