Mirror of https://github.com/gryf/coach.git
coach/rl_coach/traces/ControlSuite_DDPG_cartpole_swingup/trace.csv
Itai Caspi, commit 72a1d9d426: Itaicaspi/episode reset refactoring (#105)
* reordered the episode reset operation and allowed episodes to be stored only once they are terminated

* reverted tensorflow-gpu to 1.9.0 + fixed a bug in should_train()

* added a tests README file and refactored the policy optimization agent's train function

* Update README.md

* Update README.md

* further simplified the policy optimization train function

* Updated the traces after the reordering of the environment reset

* added the Docker and Jenkins files

* updated the traces to the ones generated inside the Docker container

* updated the traces and added Control Suite to the Docker image

* updated the Jenkins file with the Intel proxy + updated the Doom basic A3C test params

* updated line breaks in the Jenkins file

* added a missing line break in the Jenkins file

* refined the list of presets ignored by the trace tests + added a configurable entropy beta value (see the loss sketch below)

* switched the order of the trace and golden tests in Jenkins + fixed an issue where golden test processes were not killed

* updated the benchmarks for dueling DDQN on Breakout and Pong

* allowed dynamic updates to the loss weights + fixed a bug in episode.update_returns (see the returns sketch below)

* removed the Docker and Jenkins files
2018-09-04 15:07:54 +03:00
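
Two of the items above are concrete enough to illustrate. The episode.update_returns fix concerns the discounted returns that are computed once an episode has terminated; the following is a minimal sketch of such a backward pass, not Coach's actual implementation, and the names rewards and discount are assumptions:

    # Illustrative sketch of a per-episode discounted-return pass in the
    # spirit of episode.update_returns; NOT Coach's actual code, and the
    # names `rewards` and `discount` are assumptions.
    def update_returns(rewards, discount=0.99):
        # Walk backwards from the terminal transition, accumulating
        # G_t = r_t + discount * G_{t+1} for every timestep.
        returns = [0.0] * len(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + discount * running
            returns[t] = running
        return returns

    # Example: rewards [1, 0, 1] give returns [1.9801, 0.99, 1.0]
    print(update_returns([1.0, 0.0, 1.0]))

The backward pass needs the complete reward sequence, which fits the reordering above that stores episodes only once they are terminated.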
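
The configurable entropy beta, likewise, is the usual coefficient weighting an entropy bonus inside a policy optimization loss. A minimal sketch under the same caveat (all names are assumptions, not Coach's API):

    import numpy as np

    # Illustrative entropy-regularized policy loss with a configurable
    # beta coefficient; NOT Coach's actual code, all names are assumed.
    def policy_loss_with_entropy(log_probs, advantages, entropy, beta=0.01):
        # Plain policy gradient surrogate: -E[log pi(a_t|s_t) * A_t]
        pg_loss = -np.mean(log_probs * advantages)
        # Subtracting a beta-weighted entropy bonus favors more
        # stochastic (exploratory) policies; beta=0 recovers pg_loss.
        return pg_loss - beta * np.mean(entropy)

Exposing beta as a parameter lets the exploration pressure be tuned per preset instead of being hard-coded.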

trace.csv (2.6 KiB):

Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min,TD targets/Mean,TD targets/Stdev,TD targets/Max,TD targets/Min,actions/Mean,actions/Stdev,actions/Max,actions/Min
1,0.0,1.0,1001.0,1.0,1001.0,1001.0,,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,0.0,1.0,2002.0,2.0,1001.0,2002.0,,0.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,1000.0,0.0,3003.0,3.0,1001.0,3003.0,-0.11853024927717781,2.735463869928241,27.3546386992823,1.0,,,,1.4965620654038504e-05,3.650858260133972e-05,0.0007415295694954692,1.510996071374393e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.0049991608,0.004320714,0.029555712,0.00052324863,,,,,,,,,,,,,,,,,,,,,-0.02509604,0.12253879,0.19679643,-0.25691667,0.002979598431912616,0.042334642053058036,0.09477341320020807,-0.1277348208010601,0.7574106673312205,0.2820158065549947,1.3628734977284602,-0.13561528749852786
4,2001.0,0.0,4004.0,4.0,1001.0,4004.0,-0.2048510260598676,7.629510433822026,76.29510433822016,1.0,,,,9.294460378555413e-05,0.00018001446184314637,0.0014042556285858154,1.88643639376096e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.018415965,0.019779565,0.19278607,0.0008359549000000001,,,,,,,,,,,,,,,,,,,,,-0.007871467,0.112213835,0.17972693,-0.23277566,0.002690244749371092,0.08247475995739656,0.20102381350942625,-0.2633941138081878,0.8866818175665385,0.1980599181751808,1.3750565774684147,0.4541525586846937
5,3002.0,0.0,5005.0,5.0,1001.0,5005.0,-0.02134772535498328,7.612595851248884,76.12595851248874,0.0,,,,4.2167748756014586e-05,0.00010527586086637082,0.001020723837427795,1.4967686183808837e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.009330036,0.0073557219999999994,0.04758695,0.00048721785000000004,,,,,,,,,,,,,,,,,,,,,-0.00076002936,0.12163509,0.17490079,-0.2400235,0.009237181711633144,0.09619469158143916,0.21206437128683006,-0.2783133662129137,1.0669116245649553,0.12577960072670955,1.400521073123623,0.7853534082159903
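
A quick way to inspect a trace like this one is to load it with pandas. A minimal sketch, assuming the repository checkout is the working directory (the pandas calls are standard; the column names match the header above):

    import pandas as pd

    # Load the DDPG cartpole swingup trace and drop the all-empty
    # columns (the evaluation and policy-gradient signals are never
    # populated in this trace), keeping only the recorded signals.
    df = pd.read_csv("rl_coach/traces/ControlSuite_DDPG_cartpole_swingup/trace.csv")
    df = df.dropna(axis=1, how="all")
    print(df[["Episode #", "In Heatup", "Training Reward"]].head())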