coach/rl_coach/traces/Atari_DQN_pong/trace.csv at 72a1d9d426004269997f8b40bdd64f8ee582d91e

mirror of https://github.com/gryf/coach.git synced 2026-02-23 10:35:46 +01:00

Itai Caspi 72a1d9d426 Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file

2018-09-04 15:07:54 +03:00

1.7 KiB

Episode #	Training Iter	In Heatup	ER #Transitions	ER #Episodes	Episode Length	Total steps	Epsilon	Shaped Training Reward	Training Reward	Loss/Mean	Loss/Stdev	Loss/Max	Loss/Min	Learning Rate/Mean	Learning Rate/Stdev	Learning Rate/Max	Learning Rate/Min	Grads (unclipped)/Mean	Grads (unclipped)/Stdev	Grads (unclipped)/Max	Grads (unclipped)/Min	Q/Mean	Q/Stdev	Q/Max	Q/Min
1	0.0	1.0	1117.0	1117.0	1117.0	1117.0	1.0
2	205.0	0.0	1937.0	1937.0	820.0	1937.0	0.9992620000000244	-21.0	-21.0	0.011010780938079	0.013098460400306485	0.06118807196617127	6.86898929416202e-05	0.00010000000000000002	1.3552527156068802e-20	0.0001	0.0001	0.08733994	0.06833449	0.47135752	0.016372742
3	413.0	0.0	2768.0	2768.0	831.0	2768.0	0.9985141000000488	-21.0	-21.0	0.01163802880151147	0.013571124716079436	0.08714678883552551	3.9931001083459705e-05	0.00010000000000000003	2.7105054312137605e-20	0.0001	0.0001	0.06724033	0.035371285	0.2241408	0.011829718999999999	0.10583201	0.011610512	0.12072124	0.08555735
4	667.0	0.0	3783.0	3783.0	1015.0	3783.0	0.9976006000000791	-20.0	-20.0	0.01136319609350886	0.012043113812065086	0.049625951796770096	9.354137000627816e-05	0.00010000000000000002	1.3552527156068802e-20	0.0001	0.0001	0.060902383	0.032815605	0.17838788	0.015925674	0.0978057	0.014090337	0.123560354	0.07580207
5	947.0	0.0	4906.0	4906.0	1123.0	4906.0	0.9965899000001124	-18.0	-18.0	0.010341535720908724	0.011934284708938809	0.06498207896947861	6.708659930154681e-05	0.00010000000000000002	1.3552527156068802e-20	0.0001	0.0001	0.054970358	0.03215441	0.26232755	0.009252935	0.09154041	0.009532932	0.10656521	0.07300271
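
The heatup episode and the first training episode in the trace above carry fewer values than the header has columns. Below is a minimal sketch of one way to load such a trace with the Python standard library. It rests on two assumptions that the page does not confirm: that the file on disk is comma-separated (as the `.csv` extension suggests), and that the missing values are trailing, so short rows can be padded on the right. `parse_trace` and `SAMPLE` are hypothetical names introduced for illustration, and `SAMPLE` is an abbreviated subset of the columns, not the full file.

```python
import csv
import io


def parse_trace(text, delimiter=","):
    """Parse trace text into a list of dicts keyed by header name.

    ASSUMPTION: values missing from a short row are the trailing columns,
    so the row is padded on the right and those cells become None.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        padded = fields + [""] * (len(header) - len(fields))
        rows.append({
            name: (float(value) if value != "" else None)
            for name, value in zip(header, padded)
        })
    return rows


# Abbreviated sample: a subset of the trace's columns, first two episodes.
SAMPLE = (
    "Episode #,Training Iter,In Heatup,Epsilon,Training Reward\n"
    "1,0.0,1.0,1.0\n"
    "2,205.0,0.0,0.9992620000000244,-21.0\n"
)

rows = parse_trace(SAMPLE)
for row in rows:
    print(row["Episode #"], row["In Heatup"], row["Training Reward"])
```

If the file turns out to be tab-separated, passing `delimiter="\t"` should work the same way. Padding on the right is only safe when empty cells never occur in the middle of a row, which would need to be checked against the raw file.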