Mirror of https://github.com/gryf/coach.git, synced 2026-03-04 07:45:53 +01:00
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
1.7 KiB
| Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 1117.0 | 1117.0 | 1117.0 | 1117.0 | 1.0 | | | 0.0 | | | | | | | | | | | | | | | | | | | |
| 2 | 197.0 | 0.0 | 1905.0 | 1905.0 | 788.0 | 1905.0 | 0.9992198800000168 | -21.0 | -21.0 | 0.0 | | | | 0.0065035605150511894 | 0.004365216942868011 | 0.04185768589377403 | 1.6300582501571625e-05 | 6.250000000000001e-05 | 1.3552527156068802e-20 | 6.25e-05 | 6.25e-05 | 0.04899958 | 0.035690054 | 0.465425 | 0.0031771401 | | | | |
| 3 | 436.0 | 0.0 | 2862.0 | 2862.0 | 957.0 | 2862.0 | 0.9982724500000376 | -20.0 | -20.0 | 0.0 | | | | 0.006882304690776307 | 0.0032755384482328074 | 0.018768906593322757 | 0.00028316525276750326 | 6.250000000000003e-05 | 2.7105054312137605e-20 | 6.25e-05 | 6.25e-05 | 0.037334877999999995 | 0.016123397 | 0.11000781 | 0.007852386 | -0.25035575 | 0.057181817 | -0.1695276 | -0.34914327 |
| 4 | 627.0 | 0.0 | 3623.0 | 3623.0 | 761.0 | 3623.0 | 0.997519060000054 | -21.0 | -21.0 | 0.0 | | | | 0.004881470595769075 | 0.0024654802506201947 | 0.01351994462311268 | 3.340750481584109e-05 | 6.250000000000001e-05 | 1.3552527156068802e-20 | 6.25e-05 | 6.25e-05 | 0.028977735 | 0.016445445 | 0.09510474 | 0.0037849140000000003 | | | | |
| 5 | 855.0 | 0.0 | 4535.0 | 4535.0 | 912.0 | 4535.0 | 0.9966161800000736 | -20.0 | -20.0 | 0.0 | | | | 0.004249975731765612 | 0.0017149519969122415 | 0.01000758446753025 | 5.5568867537658655e-05 | 6.250000000000003e-05 | 2.7105054312137605e-20 | 6.25e-05 | 6.25e-05 | 0.020409843 | 0.013720203 | 0.084716946 | 0.005521884 | -0.11609744 | 0.011784006000000001 | -0.10053374 | -0.13682899 |
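
The Epsilon values in the trace appear to fall by a fixed amount per environment step once heatup ends. The sketch below replays that arithmetic using only the Episode Length and Epsilon values from episodes 2 to 5 above. The per-step decrement `STEP_DECAY` and the helper `replay_epsilon` are assumptions inferred from the numbers for illustration; they are not taken from the Coach source or its presets.

```python
# (episode_length, logged_epsilon) pairs copied from the trace rows for
# episodes 2-5, i.e. the episodes that follow the single heatup episode.
rows = [
    (788, 0.9992198800000168),
    (957, 0.9982724500000376),
    (761, 0.997519060000054),
    (912, 0.9966161800000736),
]

# Assumed per-step decrement, inferred from the logged values above.
STEP_DECAY = 0.99e-6


def replay_epsilon(rows, start=1.0, step_decay=STEP_DECAY):
    """Return the epsilon predicted after each episode under linear decay.

    `start` is the epsilon logged at the end of heatup (1.0 in the trace).
    """
    predicted = []
    epsilon = start
    for length, _logged in rows:
        # One fixed decrement per environment step taken in the episode.
        epsilon -= length * step_decay
        predicted.append(epsilon)
    return predicted


predicted = replay_epsilon(rows)
for (length, logged), guess in zip(rows, predicted):
    print(f"{length:4d} steps  logged={logged:.10f}  predicted={guess:.10f}")
```

If the logged and predicted columns agree to within floating-point noise, the trace is consistent with a linear schedule. A mismatch would suggest that the schedule, or the point at which decay starts, differs from what is assumed here.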