coach/rl_coach/traces/ControlSuite_DDPG_hopper_hop/trace.csv at a16d7249633a4069aaa9c218abbcf1e8226c5ab4

gryf/coach

mirror of https://github.com/gryf/coach.git synced 2026-02-14 21:15:53 +01:00

Itai Caspi 72a1d9d426 Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file

2018-09-04 15:07:54 +03:00

2.5 KiB

Episode #	Training Iter	In Heatup	ER #Transitions	ER #Episodes	Episode Length	Total steps	Epsilon	Shaped Training Reward	Training Reward	Update Target Network	Loss/Mean	Loss/Stdev	Loss/Max	Loss/Min	Learning Rate/Mean	Learning Rate/Stdev	Learning Rate/Max	Learning Rate/Min	Grads (unclipped)/Mean	Grads (unclipped)/Stdev	Grads (unclipped)/Max	Grads (unclipped)/Min	Q/Mean	Q/Stdev	Q/Max	Q/Min	TD targets/Mean	TD targets/Stdev	TD targets/Max	TD targets/Min	actions/Mean	actions/Stdev	actions/Max	actions/Min
1	0.0	1.0	1000.0	1.0	1000.0	1000.0	0.0			0.0
2	0.0	1.0	2000.0	2.0	1000.0	2000.0	0.0			1.0
3	999.0	0.0	3000.0	3.0	1000.0	3000.0	-0.017666830179174003	0.0	0.0	1.0	0.005126546850151572	0.004660130005352106	0.05132860690355301	0.0005627279169857502	0.00010000000000000003	4.0657581468206416e-20	0.0001	0.0001	0.1678458	0.12852536	1.5926425	0.024081124	0.24388833	0.11236252	0.42840713	-0.8727883	-0.01800971130044855	0.1942566799457346	0.5385999780893326	-0.9172834092378616	-0.11966089038501045	0.8962365587209448	1.3716363433793126	-1.5680451743766328
4	1999.0	0.0	4000.0	4.0	1000.0	4000.0	-0.039999362478752916	0.0	0.0	1.0	0.0008180646479820358	0.000529273102626917	0.0054473504424095145	0.00014673141413368285	0.00010000000000000003	4.0657581468206416e-20	0.0001	0.0001	0.0469651	0.025094092000000002	0.22221590000000002	0.010784525	0.14337498	0.17592207	0.33719423	-0.28446856	0.20208258858056294	0.13431578391837634	0.5768654608726501	-0.5833876812458039	-0.2705900161928217	0.9272508528236816	1.9572209345620784	-1.4727463554915825
5	2999.0	0.0	5000.0	5.0	1000.0	5000.0	0.17145601483403705	0.0	0.0	0.0	0.0003958249435308753	0.00031769597300822634	0.0040870513767004004	0.00010442566417623311	0.00010000000000000003	4.0657581468206416e-20	0.0001	0.0001	0.025218817999999997	0.013975793500000002	0.14064097	0.0070197446999999994	-0.04435015	0.030164617999999997	0.124313995	-0.2207815	0.26081367032274555	0.12723809202247516	0.5931554335355759	-0.2620022776722908	-0.2959362262223715	0.6939703144112135	1.0669463809202309	-1.416814717430604
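
The trace above has two heatup episodes (In Heatup = 1.0, with the reward and loss columns left empty) followed by three training episodes. A minimal sketch of reading such a trace and separating the two phases follows. The names load_trace and split_heatup are hypothetical helpers and not part of rl_coach, and SAMPLE is an abbreviated, tab-separated excerpt of the first eleven columns. For the comma-separated trace.csv file itself, pass delimiter=",".

```python
import csv
import io

# Abbreviated excerpt of the trace (first 11 columns, first 3 episodes).
# Tab-separated here to match the rendered table; an assumption for illustration.
SAMPLE = (
    "Episode #\tTraining Iter\tIn Heatup\tER #Transitions\tER #Episodes\t"
    "Episode Length\tTotal steps\tEpsilon\tShaped Training Reward\t"
    "Training Reward\tUpdate Target Network\n"
    "1\t0.0\t1.0\t1000.0\t1.0\t1000.0\t1000.0\t0.0\t\t\t0.0\n"
    "2\t0.0\t1.0\t2000.0\t2.0\t1000.0\t2000.0\t0.0\t\t\t1.0\n"
    "3\t999.0\t0.0\t3000.0\t3.0\t1000.0\t3000.0\t-0.017666830179174003\t0.0\t0.0\t1.0\n"
)


def load_trace(text, delimiter="\t"):
    """Parse trace text into a list of dicts; empty or missing cells become None."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for raw in reader:
        row = {}
        for key, value in raw.items():
            # DictReader fills short rows with None; blank cells arrive as "".
            row[key] = float(value) if value not in (None, "") else None
        rows.append(row)
    return rows


def split_heatup(rows):
    """Return (heatup_rows, training_rows) based on the 'In Heatup' flag."""
    heatup = [r for r in rows if r["In Heatup"] == 1.0]
    training = [r for r in rows if r["In Heatup"] == 0.0]
    return heatup, training


if __name__ == "__main__":
    heatup, training = split_heatup(load_trace(SAMPLE))
    print(len(heatup), "heatup episodes,", len(training), "training episodes")
```

Mapping empty cells to None rather than 0.0 keeps the heatup rows, which have no reward or loss values, distinguishable from training rows that report a reward of 0.0.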
