coach/rl_coach/traces/ControlSuite_DDPG_cartpole_swingup/trace.csv at 72a1d9d426004269997f8b40bdd64f8ee582d91e

gryf/coach

mirror of https://github.com/gryf/coach.git synced 2026-01-28 11:05:46 +01:00

Itai Caspi 72a1d9d426 Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file

2018-09-04 15:07:54 +03:00

2.6 KiB

Episode #	Training Iter	In Heatup	ER #Transitions	ER #Episodes	Episode Length	Total steps	Epsilon	Shaped Training Reward	Training Reward	Update Target Network	Loss/Mean	Loss/Stdev	Loss/Max	Loss/Min	Learning Rate/Mean	Learning Rate/Stdev	Learning Rate/Max	Learning Rate/Min	Grads (unclipped)/Mean	Grads (unclipped)/Stdev	Grads (unclipped)/Max	Grads (unclipped)/Min	Q/Mean	Q/Stdev	Q/Max	Q/Min	TD targets/Mean	TD targets/Stdev	TD targets/Max	TD targets/Min	actions/Mean	actions/Stdev	actions/Max	actions/Min
1	0.0	1.0	1001.0	1.0	1001.0	1001.0	0.0			0.0
2	0.0	1.0	2002.0	2.0	1001.0	2002.0	0.0			1.0
3	1000.0	0.0	3003.0	3.0	1001.0	3003.0	-0.1185302492771778	12.73546386992824	127.3546386992823	1.0	1.4965620654038504e-05	3.650858260133972e-05	0.0007415295694954692	1.510996071374393e-06	0.00010000000000000003	2.7105054312137605e-20	0.0001	0.0001	0.0049991608	0.004320714	0.029555712	0.00052324863	-0.02509604	0.12253879	0.19679643	-0.25691667	0.002979598431912616	0.042334642053058036	0.09477341320020807	-0.1277348208010601	0.7574106673312205	0.2820158065549947	1.3628734977284602	-0.13561528749852786
4	2001.0	0.0	4004.0	4.0	1001.0	4004.0	-0.2048510260598676	7.629510433822026	76.29510433822016	1.0	9.294460378555413e-05	0.00018001446184314637	0.0014042556285858154	1.88643639376096e-06	0.00010000000000000003	2.7105054312137605e-20	0.0001	0.0001	0.018415965	0.019779565	0.19278607	0.0008359549000000001	-0.007871467	0.112213835	0.17972693	-0.23277566	0.002690244749371092	0.08247475995739656	0.20102381350942625	-0.2633941138081878	0.8866818175665385	0.1980599181751808	1.3750565774684147	0.4541525586846937
5	3002.0	0.0	5005.0	5.0	1001.0	5005.0	-0.02134772535498328	7.612595851248884	76.12595851248874	0.0	4.2167748756014586e-05	0.00010527586086637082	0.001020723837427795	1.4967686183808837e-06	0.00010000000000000003	2.7105054312137605e-20	0.0001	0.0001	0.009330036	0.0073557219999999994	0.04758695	0.00048721785000000004	-0.00076002936	0.12163509	0.17490079	-0.2400235	0.009237181711633144	0.09619469158143916	0.21206437128683006	-0.2783133662129137	1.0669116245649553	0.12577960072670955	1.400521073123623	0.7853534082159903