mirror of https://github.com/gryf/coach.git synced 2026-02-21 17:25:53 +01:00
coach/rl_coach/traces/Atari_A3C_pong/trace.csv
Itai Caspi 72a1d9d426 Itaicaspi/episode reset refactoring (#105)
* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
2018-09-04 15:07:54 +03:00
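
One of the commits above fixes a bug in episode.update_returns. The actual fix lives in the rl_coach source; purely as a hedged illustration of what such a routine typically computes, here is a minimal sketch of a backward discounted-return pass (the function name, signature, and defaults here are hypothetical, not Coach's API):

```python
def update_returns(rewards, discount=0.99, bootstrap_value=0.0):
    """Compute the discounted return of every transition in an episode.

    NOTE: illustrative sketch only -- not the actual
    rl_coach Episode.update_returns implementation.
    """
    returns = []
    # Iterate backwards so each return can reuse the one after it:
    # G_t = r_t + discount * G_{t+1}
    g = bootstrap_value  # value estimate of the state after the last transition
    for r in reversed(rewards):
        g = r + discount * g
        returns.append(g)
    returns.reverse()
    return returns
```

For a terminated episode the bootstrap value is simply zero, which may be one motivation for the change above that stores episodes only once they are terminated: every transition's return is then fully determined.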


Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min
1,0.0,1.0,881.0,1.0,881.0,881.0,0.0,,,,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,0.0,1.0,1043.0,1.0,1043.0,1924.0,0.0,,,,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,38.0,0.0,763.0,1.0,763.0,2687.0,0.0,-21.0,-21.0,,,,0.0,1.4867063,1.4938432,5.887912,0.0015561687,,,,,,,,,1.7584827,0.03291289,1.788348,1.6738943,-0.15541847978173265,0.4162753016651965,0.5290131568908691,-1.0030009746551514,-1.0561148,0.93491656,0.021942224,-2.8995342,0.09872001400000001,0.11163085,0.38036227,1.426808e-06,-0.27536926,0.5678659,0.53289455,-1.4932774
4,75.0,0.0,740.0,1.0,740.0,3427.0,0.0,-21.0,-21.0,,,,0.0,3.4039152,1.9638362,7.991121000000001,0.21933316,,,,,,,,,1.4528251,0.16249819,1.6673243999999998,1.1318555000000001,-0.06318947109911177,0.4280228160756264,0.6142082214355469,-0.9774222373962402,-2.6464324,0.35272834,-2.2407157000000004,-3.3780959999999998,0.09359822,0.06729852,0.24981685,0.03076201,-0.096310705,0.51300126,0.43143788,-1.0997788999999998
5,113.0,0.0,755.0,1.0,755.0,4182.0,0.0,-21.0,-21.0,,,,0.0,3.233333,1.986448,7.6551165999999995,0.19970839,,,,,,,,,1.2935143999999998,0.11847049,1.439909,1.0601448,-0.08421046411668932,0.4278507532658671,0.5715954303741455,-0.980087161064148,-2.3434882000000004,0.31644145,-1.9696671000000001,-3.1619172000000004,0.09507382,0.07568375,0.26689446,0.024116684,-0.1209513,0.48729447,0.5198732,-1.1845143
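
The trace rows can be sanity-checked from the numbers they contain: Total steps should be the running sum of Episode Length, and for Pong's six-action space the policy entropy of a uniform policy, ln(6) ≈ 1.79, is an upper bound that the early-training entropy maximum should sit just below (assuming the 1.788348 figure in the episode-3 row is indeed Entropy/Max). A small standalone check, with the values copied from the rows above:

```python
import math

# Episode lengths and cumulative step counts from the first five trace rows.
episode_lengths = [881, 1043, 763, 740, 755]
total_steps = [881, 1924, 2687, 3427, 4182]

# "Total steps" should be the running sum of "Episode Length".
running = 0
for length, total in zip(episode_lengths, total_steps):
    running += length
    assert running == total, (running, total)

# Entropy of a uniform distribution over Pong's 6 actions bounds the
# per-step policy entropy from above; early in training the policy is
# near-uniform, so Entropy/Max should be just below ln(6).
max_entropy = math.log(6)  # ~1.7918
observed_entropy_max = 1.788348  # assumed Entropy/Max, episode-3 row
assert observed_entropy_max < max_entropy
```

The cumulative-sum check passing for all five rows is also what pins down the field boundaries in the delimiter-less dump above.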