mirror of
https://github.com/gryf/coach.git
synced 2026-02-23 10:35:46 +01:00
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
1.7 KiB
| Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 1117.0 | 1117.0 | 1117.0 | 1117.0 | 1.0 | | | 0.0 | | | | | | | | | | | | | | | | | | | |
| 2 | 205.0 | 0.0 | 1937.0 | 1937.0 | 820.0 | 1937.0 | 0.9992620000000244 | -21.0 | -21.0 | 0.0 | | | | 0.011010780938079 | 0.013098460400306485 | 0.06118807196617127 | 6.86898929416202e-05 | 0.00010000000000000002 | 1.3552527156068802e-20 | 0.0001 | 0.0001 | 0.08733994 | 0.06833449 | 0.47135752 | 0.016372742 | | | | |
| 3 | 413.0 | 0.0 | 2768.0 | 2768.0 | 831.0 | 2768.0 | 0.9985141000000488 | -21.0 | -21.0 | 0.0 | | | | 0.01163802880151147 | 0.013571124716079436 | 0.08714678883552551 | 3.9931001083459705e-05 | 0.00010000000000000003 | 2.7105054312137605e-20 | 0.0001 | 0.0001 | 0.06724033 | 0.035371285 | 0.2241408 | 0.011829718999999999 | 0.10583201 | 0.011610512 | 0.12072124 | 0.08555735 |
| 4 | 667.0 | 0.0 | 3783.0 | 3783.0 | 1015.0 | 3783.0 | 0.9976006000000791 | -20.0 | -20.0 | 0.0 | | | | 0.01136319609350886 | 0.012043113812065086 | 0.049625951796770096 | 9.354137000627816e-05 | 0.00010000000000000002 | 1.3552527156068802e-20 | 0.0001 | 0.0001 | 0.060902383 | 0.032815605 | 0.17838788 | 0.015925674 | 0.0978057 | 0.014090337 | 0.123560354 | 0.07580207 |
| 5 | 947.0 | 0.0 | 4906.0 | 4906.0 | 1123.0 | 4906.0 | 0.9965899000001124 | -18.0 | -18.0 | 0.0 | | | | 0.010341535720908724 | 0.011934284708938809 | 0.06498207896947861 | 6.708659930154681e-05 | 0.00010000000000000002 | 1.3552527156068802e-20 | 0.0001 | 0.0001 | 0.054970358 | 0.03215441 | 0.26232755 | 0.009252935 | 0.09154041 | 0.009532932 | 0.10656521 | 0.07300271 |