Mirror of https://github.com/gryf/coach.git, synced 2026-02-22 01:45:56 +01:00
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
1.4 KiB
| Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 986.0 | 986.0 | 986.0 | 986.0 | 7.0 | 0.0 | | | | | | | | | | | | | | | | | | | | | |
| 2 | 0.0 | 1.0 | 1806.0 | 1806.0 | 820.0 | 1806.0 | 4.0 | 0.0 | | | | | | | | | | | | | | | | | | | | | |
| 3 | 206.0 | 0.0 | 2629.0 | 2629.0 | 823.0 | 2629.0 | 5.0 | -21.0 | -21.0 | 0.0 | | | | 0.01375627432677452 | 0.013505330839893808 | 0.06677445024251938 | 0.0005553220980800688 | 0.0002500000000000001 | 1.0842021724855042e-19 | 0.00025 | 0.00025 | 0.013602738 | 0.0048916726 | 0.034245104 | 0.0056978124 | | | | |
| 4 | 398.0 | 0.0 | 3397.0 | 3397.0 | 768.0 | 3397.0 | 3.0 | -21.0 | -21.0 | 0.0 | | | | 0.014156610367839068 | 0.013173363350960334 | 0.059119727462530136 | 0.0007080046343617141 | 0.0002500000000000001 | 5.421010862427521e-20 | 0.00025 | 0.00025 | 0.012839798999999999 | 0.0038416919 | 0.024480136 | 0.005681609000000001 | | | | |
| 5 | 617.0 | 0.0 | 4274.0 | 4274.0 | 877.0 | 4274.0 | 6.0 | -21.0 | -21.0 | 0.0 | | | | 0.015369139484674181 | 0.01463229484329247 | 0.08113615959882736 | 0.0005487628513947129 | 0.0002500000000000001 | 1.0842021724855042e-19 | 0.00025 | 0.00025 | 0.014249632 | 0.005901839599999999 | 0.04092761 | 0.004881437 | 0.004008428 | 0.016476048 | 0.028364737 | -0.026583625 |