Mirror of https://github.com/gryf/coach.git, synced 2026-03-19 00:13:46 +01:00
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
1.7 KiB
| Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 1117.0 | 1117.0 | 1117.0 | 1117.0 | 1.0 | 0.0 | | | | | | | | | | | | | | | | | | | | | |
| 2 | 205.0 | 0.0 | 1937.0 | 1937.0 | 820.0 | 1937.0 | 0.9991882000000176 | -21.0 | -21.0 | 0.0 | | | | 0.014763841533212831 | 0.013646937264058223 | 0.06725655496120453 | 2.0758947357535362e-05 | 0.00010000000000000002 | 1.3552527156068802e-20 | 0.0001 | 0.0001 | 0.17952479999999998 | 0.13626544 | 0.9860897 | 0.0053134440000000005 | | | | |
| 3 | 413.0 | 0.0 | 2768.0 | 2768.0 | 831.0 | 2768.0 | 0.9983655100000356 | -21.0 | -21.0 | 0.0 | | | | 0.012111850191891229 | 0.013912744765592264 | 0.08914861083030699 | 1.7985148588195443e-05 | 0.00010000000000000003 | 2.7105054312137605e-20 | 0.0001 | 0.0001 | 0.057201855 | 0.04205291 | 0.26596984 | 0.0031672046 | -0.04456665 | 0.009031756 | -0.031443898 | -0.059377108 |
| 4 | 667.0 | 0.0 | 3783.0 | 3783.0 | 1015.0 | 3783.0 | 0.9973606600000572 | -20.0 | -20.0 | 0.0 | | | | 0.013269104183936587 | 0.013449185914245043 | 0.07771021127700806 | 1.3188657248974778e-05 | 0.00010000000000000002 | 1.3552527156068802e-20 | 0.0001 | 0.0001 | 0.098453455 | 0.109315164 | 0.9814589 | 0.0024465397 | -0.008853295 | 0.009689603 | 0.0003574537 | -0.028319128 |
| 5 | 867.0 | 0.0 | 4585.0 | 4585.0 | 802.0 | 4585.0 | 0.9965666800000744 | -21.0 | -21.0 | 0.0 | | | | 0.01383970570535894 | 0.013677503957050816 | 0.0817062109708786 | 5.106279422761872e-05 | 0.00010000000000000002 | 1.3552527156068802e-20 | 0.0001 | 0.0001 | 0.108334474 | 0.0749226 | 0.40531653 | 0.006287096 | -0.018026425 | 0.047121227 | 0.035217006 | -0.070681214 |