Mirror of https://github.com/gryf/coach.git, synced 2026-01-30 04:05:51 +01:00
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
1.5 KiB
| 1 | Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Entropy/Mean | Entropy/Stdev | Entropy/Max | Entropy/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min | Q Values/Mean | Q Values/Stdev | Q Values/Max | Q Values/Min | Value Loss/Mean | Value Loss/Stdev | Value Loss/Max | Value Loss/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 0.0 | 1.0 | 1117.0 | 1.0 | 1117.0 | 1117.0 | 0.5 | 0.0 | |||||||||||||||||||||||||||||||||
| 3 | 2 | 166.0 | 0.0 | 834.0 | 1.0 | 834.0 | 1951.0 | 0.4918267999999965 | -20.0 | -20.0 | 0.0 | -0.049309142 | 0.05955426 | 0.11067552 | -0.31385273 | 0.10965226 | 0.25779134 | 0.96019816 | 1.650419e-05 | |||||||||||||||||||||||
| 4 | 3 | 343.0 | 0.0 | 883.0 | 1.0 | 883.0 | 2834.0 | 0.4831733999999927 | -20.0 | -20.0 | 0.0 | 0.00039612237 | 0.022137828 | 0.047817677 | -0.057933766 | 0.05449706 | 0.15011412 | 0.8670572 | 0.0013089271000000001 | |||||||||||||||||||||||
| 5 | 4 | 495.0 | 0.0 | 759.0 | 1.0 | 759.0 | 3593.0 | 0.4757351999999895 | -21.0 | -21.0 | 0.0 | -0.013107545 | 0.014792551000000001 | 0.02346693 | -0.051909205 | 0.09606385 | 0.22936918 | 0.84131515 | 0.00357637 | |||||||||||||||||||||||
| 6 | 5 | 646.0 | 0.0 | 755.0 | 1.0 | 755.0 | 4348.0 | 0.4683361999999863 | -21.0 | -21.0 | 0.0 | -0.056291025 | 0.024121637999999997 | 0.011681341000000001 | -0.11741245 | 0.111964785 | 0.24955077 | 0.79120165 | 0.0038064622999999997 |