Mirror of https://github.com/gryf/coach.git (synced 2026-02-21 17:25:53 +01:00).
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
| Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Entropy/Mean | Entropy/Stdev | Entropy/Max | Entropy/Min | Advantages/Mean | Advantages/Stdev | Advantages/Max | Advantages/Min | Values/Mean | Values/Stdev | Values/Max | Values/Min | Value Loss/Mean | Value Loss/Stdev | Value Loss/Max | Value Loss/Min | Policy Loss/Mean | Policy Loss/Stdev | Policy Loss/Max | Policy Loss/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 881.0 | 1.0 | 881.0 | 881.0 | 0.0 | 0.0 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 2 | 0.0 | 1.0 | 1043.0 | 1.0 | 1043.0 | 1924.0 | 0.0 | 0.0 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| 3 | 38.0 | 0.0 | 763.0 | 1.0 | 763.0 | 2687.0 | 0.0 | -21.0 | -21.0 | 0.0 | 1.4867063 | 1.4938432 | 5.887912 | 0.0015561687 | 1.7584827 | 0.03291289 | 1.788348 | 1.6738943 | -0.15541847978173265 | 0.4162753016651965 | 0.5290131568908691 | -1.0030009746551514 | -1.0561148 | 0.93491656 | 0.021942224 | -2.8995342 | 0.09872001400000001 | 0.11163085 | 0.38036227 | 1.426808e-06 | -0.27536926 | 0.5678659 | 0.53289455 | -1.4932774 | | | | | | | | | | | |
| 4 | 75.0 | 0.0 | 740.0 | 1.0 | 740.0 | 3427.0 | 0.0 | -21.0 | -21.0 | 0.0 | 3.4039152 | 1.9638362 | 7.991121000000001 | 0.21933316 | 1.4528251 | 0.16249819 | 1.6673243999999998 | 1.1318555000000001 | -0.06318947109911177 | 0.4280228160756264 | 0.6142082214355469 | -0.9774222373962402 | -2.6464324 | 0.35272834 | -2.2407157000000004 | -3.3780959999999998 | 0.09359822 | 0.06729852 | 0.24981685 | 0.03076201 | -0.096310705 | 0.51300126 | 0.43143788 | -1.0997788999999998 | | | | | | | | | | | |
| 5 | 113.0 | 0.0 | 755.0 | 1.0 | 755.0 | 4182.0 | 0.0 | -21.0 | -21.0 | 0.0 | 3.233333 | 1.986448 | 7.6551165999999995 | 0.19970839 | 1.2935143999999998 | 0.11847049 | 1.439909 | 1.0601448 | -0.08421046411668932 | 0.4278507532658671 | 0.5715954303741455 | -0.980087161064148 | -2.3434882000000004 | 0.31644145 | -1.9696671000000001 | -3.1619172000000004 | 0.09507382 | 0.07568375 | 0.26689446 | 0.024116684 | -0.1209513 | 0.48729447 | 0.5198732 | -1.1845143 | | | | | | | | | | | |
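For context, this file is one of the stored experiment traces that the trace tests mentioned in the commit messages above compare new runs against. Below is a minimal sketch of how such a trace CSV could be loaded and checked against a freshly generated one with pandas; the file names, tolerance, and comparison strategy are illustrative assumptions, not the repository's actual trace-test code.

```python
# Illustrative sketch only: file names, tolerance, and the comparison strategy
# are assumptions, not the repository's actual trace-test implementation.
import pandas as pd

reference = pd.read_csv("reference_trace.csv")  # stored trace, like the table above
candidate = pd.read_csv("new_run_trace.csv")    # trace produced by the current run

# Compare the columns both traces share, tolerating small floating-point drift.
shared = [col for col in reference.columns if col in candidate.columns]
pd.testing.assert_frame_equal(
    reference[shared],
    candidate[shared],
    check_exact=False,
    rtol=1e-5,
)
print(f"traces match on {len(shared)} shared columns")
```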