mirror of https://github.com/gryf/coach.git
synced 2026-03-04 15:55:47 +01:00
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
1.7 KiB
| Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 1117.0 | 1117.0 | 1117.0 | 1117.0 | 1.0 | | | 0.0 | | | | | | | | | | | | | | | | | | | |
| 2 | 197.0 | 0.0 | 1905.0 | 1905.0 | 788.0 | 1905.0 | 0.9992908000000232 | -21.0 | -21.0 | 0.0 | | | | 0.0051924175274927565 | 0.003918679938872439 | 0.04185768589377403 | 2.9565440854639746e-05 | 0.0002500000000000001 | 5.421010862427521e-20 | 0.00025 | 0.00025 | 0.01784605 | 0.03255357 | 0.465425 | 0.0038900522 | | | | |
| 3 | 436.0 | 0.0 | 2862.0 | 2862.0 | 957.0 | 2862.0 | 0.9984295000000516 | -20.0 | -20.0 | 0.0 | | | | 0.004909432677758631 | 0.0024521858486776424 | 0.012306905351579191 | 0.00032079339143820107 | 0.0002500000000000001 | 1.0842021724855042e-19 | 0.00025 | 0.00025 | 0.0113589475 | 0.0037933819 | 0.025680352000000004 | 0.0035025426 | 0.030741736000000002 | 0.025549445 | 0.07848698 | -0.02225282 |
| 4 | 627.0 | 0.0 | 3623.0 | 3623.0 | 761.0 | 3623.0 | 0.9977446000000744 | -21.0 | -21.0 | 0.0 | | | | 0.0052940571797080345 | 0.002501595309474277 | 0.012016894295811651 | 0.0003992373822256922 | 0.0002500000000000001 | 5.421010862427521e-20 | 0.00025 | 0.00025 | 0.010990373999999999 | 0.0038335419 | 0.027035048 | 0.005245461 | | | | |
| 5 | 855.0 | 0.0 | 4535.0 | 4535.0 | 912.0 | 4535.0 | 0.9969238000001012 | -20.0 | -20.0 | 0.0 | | | | 0.004946799854224082 | 0.0024341152117377785 | 0.013126095756888391 | 0.0003701391979120672 | 0.0002500000000000001 | 1.0842021724855042e-19 | 0.00025 | 0.00025 | 0.010130615 | 0.0032620803 | 0.022317264 | 0.0045093056 | 0.026840469 | 0.01787639 | 0.051877695999999994 | -0.005629579 |
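The Epsilon column is consistent with a linear exploration schedule. Every row satisfies `Epsilon = 1 - 9e-7 * (Total steps - 1117)`, where 1117 is the length of the single heatup episode. The sketch below checks that relationship against the logged values using Python's standard `csv` module. The heatup length and decay rate are inferred from these five rows only. The source does not state the schedule, and where the decay stops cannot be read from these rows. `HEATUP_STEPS`, `DECAY_PER_STEP`, `expected_epsilon` and `check_epsilon` are hypothetical names, not part of Coach.

```python
import csv
import io

# Columns copied verbatim from the trace table (all other columns omitted).
TRACE = """Episode #,Total steps,Epsilon
1,1117.0,1.0
2,1905.0,0.9992908000000232
3,2862.0,0.9984295000000516
4,3623.0,0.9977446000000744
5,4535.0,0.9969238000001012
"""

# Assumptions inferred from the rows above, not stated in the source:
# heatup covers the first 1117 steps, after which epsilon drops by 9e-7 per step.
HEATUP_STEPS = 1117
DECAY_PER_STEP = 9e-7


def expected_epsilon(total_steps: float) -> float:
    """Hypothetical linear schedule; only checked against the early steps above."""
    steps_after_heatup = max(0.0, total_steps - HEATUP_STEPS)
    return 1.0 - DECAY_PER_STEP * steps_after_heatup


def check_epsilon(text: str, tol: float = 1e-9) -> list:
    """Return (episode, logged, expected) for rows that deviate by more than tol."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(text)):
        logged = float(row["Epsilon"])
        expected = expected_epsilon(float(row["Total steps"]))
        if abs(logged - expected) > tol:
            mismatches.append((int(row["Episode #"]), logged, expected))
    return mismatches


print(check_epsilon(TRACE))  # -> []
```

The check uses a tolerance rather than exact equality. The logged values appear to carry floating-point accumulation error, for example 0.9992908000000232 instead of 0.9992908.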