1
0
mirror of https://github.com/gryf/coach.git synced 2026-03-01 22:25:46 +01:00
Files
coach/rl_coach/traces/Atari_A3C_LSTM_pong/trace.csv
Itai Caspi 72a1d9d426 Itaicaspi/episode reset refactoring (#105)
* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
2018-09-04 15:07:54 +03:00

2.0 KiB

1Episode #Training IterIn HeatupER #TransitionsER #EpisodesEpisode LengthTotal stepsEpsilonShaped Training RewardTraining RewardUpdate Target NetworkEvaluation RewardShaped Evaluation RewardSuccess RateLoss/MeanLoss/StdevLoss/MaxLoss/MinLearning Rate/MeanLearning Rate/StdevLearning Rate/MaxLearning Rate/MinGrads (unclipped)/MeanGrads (unclipped)/StdevGrads (unclipped)/MaxGrads (unclipped)/MinEntropy/MeanEntropy/StdevEntropy/MaxEntropy/MinAdvantages/MeanAdvantages/StdevAdvantages/MaxAdvantages/MinValues/MeanValues/StdevValues/MaxValues/MinValue Loss/MeanValue Loss/StdevValue Loss/MaxValue Loss/MinPolicy Loss/MeanPolicy Loss/StdevPolicy Loss/MaxPolicy Loss/Min
210.01.0772.01.0772.0772.00.00.0
320.01.0821.01.0821.01593.00.00.0
4347.00.0960.01.0960.02553.00.0-20.0-20.00.01.13416550000000021.35805345.8929310.00239502921.71831440000000010.112237831.79170489999999981.27788160000000020.045752988352429150.45871362516457121.7850174903869631-1.000868558883667-2.15485571.82451869999999980.0046853945000000004-5.03252120.20021910.182915720.60433393.6701476e-060.0532365030.616485661.385742-1.489837
5488.00.0802.01.0802.03355.00.0-21.0-21.00.01.30430660.58050459999999993.15982910.349286470.95772530000000010.1120867651.72061399999999990.843113960.069974282383918760.39660302079840670.8193864822387695-0.957106113433838-3.13948370000000040.53956544-2.564023-4.47712799999999960.106471020.061132370.2960350.044341040.077483710.321280.67603266-0.5327814999999999
65129.00.0815.01.0815.04170.00.0-21.0-21.00.01.3925430.85259353.48305440000000030.19638950.900763450.085846591.53183720.77205926-0.068673383444547650.41918162092866240.5097661018371582-0.9805699586868286-2.25984120.26356682-1.9182776999999998-2.8000330.095396650.075940699999999990.259681970.031170906-0.077170730.330311660.287632-0.91986656