1
0
mirror of https://github.com/gryf/coach.git synced 2025-12-29 18:02:29 +01:00
Files
coach/rl_coach/traces/Pendulum_HAC/trace.csv
Itai Caspi 72a1d9d426 Itaicaspi/episode reset refactoring (#105)
* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
2018-09-04 15:07:54 +03:00

1.7 KiB

1Episode #Training IterIn HeatupER #TransitionsER #EpisodesEpisode LengthTotal stepsEpsilonShaped Training RewardTraining RewardUpdate Target NetworkEvaluation RewardShaped Evaluation RewardSuccess RateLoss/MeanLoss/StdevLoss/MaxLoss/MinLearning Rate/MeanLearning Rate/StdevLearning Rate/MaxLearning Rate/MinGrads (unclipped)/MeanGrads (unclipped)/StdevGrads (unclipped)/MaxGrads (unclipped)/MinEntropy/MeanEntropy/StdevEntropy/MaxEntropy/MinAdvantages/MeanAdvantages/StdevAdvantages/MaxAdvantages/MinValues/MeanValues/StdevValues/MaxValues/MinValue Loss/MeanValue Loss/StdevValue Loss/MaxValue Loss/MinPolicy Loss/MeanPolicy Loss/StdevPolicy Loss/MaxPolicy Loss/MinQ/MeanQ/StdevQ/MaxQ/MinTD targets/MeanTD targets/StdevTD targets/MaxTD targets/Minactions/Meanactions/Stdevactions/Maxactions/Min
210.01.097.01.025.025.00.00.0
320.01.0194.02.025.050.00.00.0
430.00.0291.03.025.075.0-0.013705192291281485-1000.0-1000.00.0-0.510264340.22476047-0.15544460000000002-0.92959120000000012.08123595147431663.328479018730167412.234674698678914-0.08146359109321984
540.00.0388.04.025.0100.0-0.02430443169727376-1000.0-1000.00.0-0.425511660.15804265-0.14439134-0.716005441.72338226618525512.69184708556374910.017017240560527-0.08547367510074966
650.00.0485.05.025.0125.00.0-1000.0-1000.00.0-0.43195620.17422763-0.1460396-0.73375669999999991.7427980579823552.72583675812546910.305663257960603-0.09830476343631744