Mirror of https://github.com/gryf/coach.git
coach/rl_coach/traces/ControlSuite_DDPG_hopper_hop/trace.csv
Commit 72a1d9d426 by Itai Caspi: Itaicaspi/episode reset refactoring (#105)
* reordering of the episode reset operation and allowing to store episodes only when they are terminated (this storage rule is sketched below, after the commit log)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests README file and refactoring of the policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining the presets ignored by the trace tests + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden test processes not being killed

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns (see the return computation in the sketch below)

* remove docker and jenkins file
2018-09-04 15:07:54 +03:00
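Two of the changes in this log are concrete enough to illustrate: storing an episode only once it has terminated, and the episode.update_returns fix. The following is a minimal sketch of both ideas, not Coach's actual code; Transition, Episode, EpisodeBuffer, and the discount default are all hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transition:
    # Hypothetical stand-in for Coach's transition type.
    reward: float
    done: bool
    total_return: float = 0.0


@dataclass
class Episode:
    transitions: List[Transition] = field(default_factory=list)

    def update_returns(self, discount: float = 0.99) -> None:
        # Discounted return, computed backwards: G_t = r_t + discount * G_{t+1}.
        running_return = 0.0
        for transition in reversed(self.transitions):
            running_return = transition.reward + discount * running_return
            transition.total_return = running_return


class EpisodeBuffer:
    """Commits an episode to storage only after it has terminated."""

    def __init__(self) -> None:
        self.episodes: List[Episode] = []
        self._current = Episode()

    def observe(self, transition: Transition) -> None:
        self._current.transitions.append(transition)
        if transition.done:
            # The episode is complete: fix up its returns, store it, and
            # open a fresh one. An unfinished episode is never stored.
            self._current.update_returns()
            self.episodes.append(self._current)
            self._current = Episode()
```

With this shape, the environment reset can happen after the finished episode is committed to the buffer, which is the kind of reordering the first commits describe.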

File size: 2.5 KiB

trace.csv columns:
Episode #, Training Iter, In Heatup, ER #Transitions, ER #Episodes, Episode Length, Total steps, Epsilon, Shaped Training Reward, Training Reward, Update Target Network, Evaluation Reward, Shaped Evaluation Reward, Success Rate, Loss/Mean, Loss/Stdev, Loss/Max, Loss/Min, Learning Rate/Mean, Learning Rate/Stdev, Learning Rate/Max, Learning Rate/Min, Grads (unclipped)/Mean, Grads (unclipped)/Stdev, Grads (unclipped)/Max, Grads (unclipped)/Min, Entropy/Mean, Entropy/Stdev, Entropy/Max, Entropy/Min, Advantages/Mean, Advantages/Stdev, Advantages/Max, Advantages/Min, Values/Mean, Values/Stdev, Values/Max, Values/Min, Value Loss/Mean, Value Loss/Stdev, Value Loss/Max, Value Loss/Min, Policy Loss/Mean, Policy Loss/Stdev, Policy Loss/Max, Policy Loss/Min, Q/Mean, Q/Stdev, Q/Max, Q/Min, TD targets/Mean, TD targets/Stdev, TD targets/Max, TD targets/Min, actions/Mean, actions/Stdev, actions/Max, actions/Min

First five data rows (values in file order; cells that are empty in the CSV are omitted here, so sparse rows list fewer values than there are columns):
Row 1: 1, 0.0, 1.0, 1000.0, 1.0, 1000.0, 1000.0, 0.0, 0.0
Row 2: 2, 0.0, 1.0, 2000.0, 2.0, 1000.0, 2000.0, 0.0, 1.0
Row 3: 3, 999.0, 0.0, 3000.0, 3.0, 1000.0, 3000.0, -0.017666830179174003, 0.0, 0.0, 1.0, 0.005126546850151572, 0.004660130005352106, 0.05132860690355301, 0.0005627279169857502, 0.00010000000000000003, 4.0657581468206416e-20, 0.0001, 0.0001, 0.1678458, 0.12852536, 1.5926425, 0.024081124, 0.24388833, 0.11236252, 0.42840713, -0.8727883, -0.01800971130044855, 0.1942566799457346, 0.5385999780893326, -0.9172834092378616, -0.11966089038501045, 0.8962365587209448, 1.3716363433793126, -1.5680451743766328
Row 4: 4, 1999.0, 0.0, 4000.0, 4.0, 1000.0, 4000.0, -0.039999362478752916, 0.0, 0.0, 1.0, 0.0008180646479820358, 0.000529273102626917, 0.0054473504424095145, 0.00014673141413368285, 0.00010000000000000003, 4.0657581468206416e-20, 0.0001, 0.0001, 0.0469651, 0.025094092000000002, 0.22221590000000002, 0.010784525, 0.14337498, 0.17592207, 0.33719423, -0.28446856, 0.20208258858056294, 0.13431578391837634, 0.5768654608726501, -0.5833876812458039, -0.2705900161928217, 0.9272508528236816, 1.9572209345620784, -1.4727463554915825
Row 5: 5, 2999.0, 0.0, 5000.0, 5.0, 1000.0, 5000.0, 0.17145601483403705, 0.0, 0.0, 0.0003958249435308753, 0.00031769597300822634, 0.0040870513767004004, 0.00010442566417623311, 0.00010000000000000003, 4.0657581468206416e-20, 0.0001, 0.0001, 0.025218817999999997, 0.013975793500000002, 0.14064097, 0.0070197446999999994, -0.04435015, 0.030164617999999997, 0.124313995, -0.2207815, 0.26081367032274555, 0.12723809202247516, 0.5931554335355759, -0.2620022776722908, -0.2959362262223715, 0.6939703144112135, 1.0669463809202309, -1.416814717430604
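For inspection, or for the kind of trace comparison the trace tests run, the file loads directly with pandas. This is a minimal sketch, assuming only that the file is a standard comma-separated CSV with the header listed above; pandas and the fresh_trace frame in the final comment are assumptions for the illustration, not requirements of this file.

```python
import pandas as pd

# Load the stored trace for the ControlSuite_DDPG_hopper_hop preset.
trace = pd.read_csv("rl_coach/traces/ControlSuite_DDPG_hopper_hop/trace.csv")

# The two heatup episodes leave most statistics empty, so drop
# columns that are entirely NaN before looking at the numbers.
populated = trace.dropna(axis=1, how="all")
print(populated.head())

# A trace test would diff this against a freshly generated trace, e.g.:
# pd.testing.assert_frame_equal(populated, fresh_trace[populated.columns])
```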