mirror of https://github.com/gryf/coach.git synced 2026-02-23 10:35:46 +01:00
coach/rl_coach/traces/Atari_DQN_pong/trace.csv
Itai Caspi 72a1d9d426 Itaicaspi/episode reset refactoring (#105)
* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
2018-09-04 15:07:54 +03:00

1.7 KiB

Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
1,0.0,1.0,1117.0,1117.0,1117.0,1117.0,1.0,,,0.0,,,,,,,,,,,,,,,,,,,
2,205.0,0.0,1937.0,1937.0,820.0,1937.0,0.9992620000000244,-21.0,-21.0,0.0,,,,0.011010780938079,0.013098460400306485,0.06118807196617127,6.86898929416202e-05,0.00010000000000000002,1.3552527156068802e-20,0.0001,0.0001,0.08733994,0.06833449,0.47135752,0.016372742,,,,
3,413.0,0.0,2768.0,2768.0,831.0,2768.0,0.9985141000000488,-21.0,-21.0,0.0,,,,0.01163802880151147,0.013571124716079436,0.08714678883552551,3.9931001083459705e-05,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.06724033,0.035371285,0.2241408,0.011829718999999999,0.10583201,0.011610512,0.12072124,0.08555735
4,667.0,0.0,3783.0,3783.0,1015.0,3783.0,0.9976006000000791,-20.0,-20.0,0.0,,,,0.01136319609350886,0.012043113812065086,0.049625951796770096,9.354137000627816e-05,0.00010000000000000002,1.3552527156068802e-20,0.0001,0.0001,0.060902383,0.032815605,0.17838788,0.015925674,0.0978057,0.014090337,0.123560354,0.07580207
5,947.0,0.0,4906.0,4906.0,1123.0,4906.0,0.9965899000001124,-18.0,-18.0,0.0,,,,0.010341535720908724,0.011934284708938809,0.06498207896947861,6.708659930154681e-05,0.00010000000000000002,1.3552527156068802e-20,0.0001,0.0001,0.054970358,0.03215441,0.26232755,0.009252935,0.09154041,0.009532932,0.10656521,0.07300271
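
A trace like this can be sanity-checked with a few lines of Python. The sketch below is illustrative only and is not part of Coach. It embeds just the first eight columns of the rows above, so no empty cells are involved, and checks that `Total steps` equals the running sum of `Episode Length`. The helper names `load_trace` and `total_steps_consistent` are hypothetical.

```python
import csv
import io

# Excerpt of the trace: first eight columns only, values copied from the rows above.
TRACE_EXCERPT = """\
Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon
1,0.0,1.0,1117.0,1117.0,1117.0,1117.0,1.0
2,205.0,0.0,1937.0,1937.0,820.0,1937.0,0.9992620000000244
3,413.0,0.0,2768.0,2768.0,831.0,2768.0,0.9985141000000488
4,667.0,0.0,3783.0,3783.0,1015.0,3783.0,0.9976006000000791
5,947.0,0.0,4906.0,4906.0,1123.0,4906.0,0.9965899000001124
"""


def load_trace(text):
    """Parse trace CSV text into a list of dicts (hypothetical helper).

    Empty or missing cells become None; everything else is parsed as float.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append(
            {key: (float(value) if value not in ("", None) else None)
             for key, value in record.items()}
        )
    return rows


def total_steps_consistent(rows):
    """Return True if 'Total steps' is the cumulative sum of 'Episode Length'."""
    running = 0.0
    for row in rows:
        running += row["Episode Length"]
        if row["Total steps"] != running:
            return False
    return True


rows = load_trace(TRACE_EXCERPT)
print(total_steps_consistent(rows))
```

The same `load_trace` helper should also accept the full 30-column file, since blank cells are mapped to `None` rather than parsed as numbers.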