mirror of https://github.com/gryf/coach.git synced 2026-03-11 03:55:52 +01:00
coach/rl_coach/traces/Atari_QR_DQN_pong/trace.csv
Itai Caspi 72a1d9d426 Itaicaspi/episode reset refactoring (#105)
* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
2018-09-04 15:07:54 +03:00

1.8 KiB

Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
1,0.0,1.0,1117.0,1117.0,1117.0,1117.0,1.0,,,0.0,,,,,,,,,,,,,,,,,,,
2,205.0,0.0,1937.0,1937.0,820.0,1937.0,0.9991882000000176,-21.0,-21.0,0.0,,,,36.60464997756772,42.04124769391064,201.15611267089844,2.788020610809326,5.000000000000001e-05,6.776263578034403e-21,5e-05,5e-05,14.734329999999998,11.578652,83.24656999999999,3.6869566,,,,
3,413.0,0.0,2768.0,2768.0,831.0,2768.0,0.9983655100000356,-21.0,-21.0,0.0,,,,37.448825304324814,40.97555825854826,265.18701171875,2.7428863048553467,5.0000000000000016e-05,1.3552527156068802e-20,5e-05,5e-05,46.146587,37.73792,313.11514,12.797323,-0.02228271633396313,0.010482918460358506,-0.008034438502509147,-0.03863051085398183
4,667.0,0.0,3783.0,3783.0,1015.0,3783.0,0.9973606600000572,-20.0,-20.0,0.0,,,,35.222983159418185,33.638557732845605,134.39295959472656,3.3111674785614014,5.000000000000001e-05,6.776263578034403e-21,5e-05,5e-05,54.700793999999995,28.679327,185.94606000000002,25.897139000000003,-0.05276434649310735,0.013212184652596557,-0.03154730399168329,-0.06887179555138573
5,867.0,0.0,4585.0,4585.0,802.0,4585.0,0.9965666800000744,-21.0,-21.0,0.0,,,,33.36415538668633,33.794293936783085,170.81182861328125,3.2840056419372563,5.000000000000001e-05,6.776263578034403e-21,5e-05,5e-05,53.996002000000004,31.833138,239.36745,27.415855,-0.03878277134735982,0.010679782367249705,-0.01826882790250238,-0.05715514831594193