Mirror of https://github.com/gryf/coach.git, synced 2026-03-11 03:55:52 +01:00
* reordering of the episode reset operation and allowing episodes to be stored only once they terminate
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests README file and refactoring of the policy optimization agent train function
* Update README.md
* additional simplifications of the policy optimization train function
* updated the traces after reordering the environment reset
* Docker and Jenkins files
* updated the traces to the ones generated inside the Docker container
* updated traces and added the Control Suite to the Docker container
* updated Jenkins file with the Intel proxy + updated Doom Basic A3C test params
* updated line breaks in Jenkins file
* added a missing line break in Jenkins file
* refined the presets ignored by the trace tests + added a configurable beta entropy value
* switched the order of trace and golden tests in Jenkins + fixed golden test processes not being killed
* updated benchmarks for Dueling DDQN Breakout and Pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* removed Docker and Jenkins files
1.8 KiB
| Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 1117.0 | 1117.0 | 1117.0 | 1117.0 | 1.0 | | | 0.0 | | | | | | | | | | | | | | | | | | | |
| 2 | 205.0 | 0.0 | 1937.0 | 1937.0 | 820.0 | 1937.0 | 0.9991882000000176 | -21.0 | -21.0 | 0.0 | | | | 36.60464997756772 | 42.04124769391064 | 201.15611267089844 | 2.788020610809326 | 5.000000000000001e-05 | 6.776263578034403e-21 | 5e-05 | 5e-05 | 14.734329999999998 | 11.578652 | 83.24656999999999 | 3.6869566 | | | | |
| 3 | 413.0 | 0.0 | 2768.0 | 2768.0 | 831.0 | 2768.0 | 0.9983655100000356 | -21.0 | -21.0 | 0.0 | | | | 37.448825304324814 | 40.97555825854826 | 265.18701171875 | 2.7428863048553467 | 5.0000000000000016e-05 | 1.3552527156068802e-20 | 5e-05 | 5e-05 | 46.146587 | 37.73792 | 313.11514 | 12.797323 | -0.02228271633396313 | 0.010482918460358506 | -0.008034438502509147 | -0.03863051085398183 |
| 4 | 667.0 | 0.0 | 3783.0 | 3783.0 | 1015.0 | 3783.0 | 0.9973606600000572 | -20.0 | -20.0 | 0.0 | | | | 35.222983159418185 | 33.638557732845605 | 134.39295959472656 | 3.3111674785614014 | 5.000000000000001e-05 | 6.776263578034403e-21 | 5e-05 | 5e-05 | 54.700793999999995 | 28.679327 | 185.94606000000002 | 25.897139000000003 | -0.05276434649310735 | 0.013212184652596557 | -0.03154730399168329 | -0.06887179555138573 |
| 5 | 867.0 | 0.0 | 4585.0 | 4585.0 | 802.0 | 4585.0 | 0.9965666800000744 | -21.0 | -21.0 | 0.0 | | | | 33.36415538668633 | 33.794293936783085 | 170.81182861328125 | 3.2840056419372563 | 5.000000000000001e-05 | 6.776263578034403e-21 | 5e-05 | 5e-05 | 53.996002000000004 | 31.833138 | 239.36745 | 27.415855 | -0.03878277134735982 | 0.010679782367249705 | -0.01826882790250238 | -0.05715514831594193 |
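Several columns of this trace are tied together by simple bookkeeping. The minimal Python sketch below checks three relations on episodes 1 to 5:

* Episode Length equals the increase in Total steps.
* ER #Transitions equals Total steps.
* Epsilon fits a linear decay of 0.99e-6 per step, counted from the end of the 1117-step heatup episode. That rate is 1.0 to 0.01 over one million steps.

The schedule parameters are an inference from the logged numbers, not values read from the Coach preset. `linear_epsilon` and `check_trace` are hypothetical helpers, not Coach APIs.

```python
"""Consistency checks on the first five episodes of the trace.

Assumption (inferred from the logged numbers, not read from the preset):
epsilon decays linearly from 1.0 to 0.01 over 1,000,000 steps, counted
from the end of the 1117-step heatup episode.
"""

# (Total steps, Episode Length, ER #Transitions, Epsilon) for episodes 1-5,
# copied verbatim from the trace.
ROWS = [
    (1117.0, 1117.0, 1117.0, 1.0),
    (1937.0, 820.0, 1937.0, 0.9991882000000176),
    (2768.0, 831.0, 2768.0, 0.9983655100000356),
    (3783.0, 1015.0, 3783.0, 0.9973606600000572),
    (4585.0, 802.0, 4585.0, 0.9965666800000744),
]

HEATUP_STEPS = 1117  # length of the heatup episode in this trace
EPS_START, EPS_END, DECAY_STEPS = 1.0, 0.01, 1_000_000  # assumed schedule


def linear_epsilon(step: float) -> float:
    """Hypothetical linear schedule: EPS_START -> EPS_END over DECAY_STEPS."""
    frac = min(max(step - HEATUP_STEPS, 0) / DECAY_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


def check_trace(rows, tol=1e-9) -> bool:
    """Raise AssertionError if any bookkeeping relation fails."""
    prev_total = 0.0
    for total, length, transitions, eps in rows:
        # Episode length is the increase in the cumulative step counter.
        assert total - prev_total == length, (total, prev_total, length)
        # Replay memory holds one transition per environment step so far.
        assert transitions == total, (transitions, total)
        # Logged epsilon matches the assumed linear schedule.
        assert abs(linear_epsilon(total) - eps) < tol, (total, eps)
        prev_total = total
    return True


if __name__ == "__main__":
    print(check_trace(ROWS))  # prints True
```

Blank cells mark signals with no value in that episode. For example, the heatup episode (In Heatup = 1.0) records no rewards and no loss, gradient, or Q statistics.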