Mirror of https://github.com/gryf/coach.git
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
| Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Entropy/Mean | Entropy/Stdev | Entropy/Max | Entropy/Min | Advantages/Mean | Advantages/Stdev | Advantages/Max | Advantages/Min | Values/Mean | Values/Stdev | Values/Max | Values/Min | Value Loss/Mean | Value Loss/Stdev | Value Loss/Max | Value Loss/Min | Policy Loss/Mean | Policy Loss/Stdev | Policy Loss/Max | Policy Loss/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min | TD targets/Mean | TD targets/Stdev | TD targets/Max | TD targets/Min | actions/Mean | actions/Stdev | actions/Max | actions/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 97.0 | 1.0 | 25.0 | 25.0 | 0.0 | 0.0 |||||||||||||||||||||||||||||||||||||||||||||||||
| 2 | 0.0 | 1.0 | 194.0 | 2.0 | 25.0 | 50.0 | 0.0 | 0.0 |||||||||||||||||||||||||||||||||||||||||||||||||
| 3 | 0.0 | 0.0 | 291.0 | 3.0 | 25.0 | 75.0 | -0.013705192291281485 | -1000.0 | -1000.0 | 0.0 | -0.51026434 | 0.22476047 | -0.15544460000000002 | -0.9295912000000001 | 2.0812359514743166 | 3.3284790187301674 | 12.234674698678914 | -0.08146359109321984 |||||||||||||||||||||||||||||||||||||||
| 4 | 0.0 | 0.0 | 388.0 | 4.0 | 25.0 | 100.0 | -0.02430443169727376 | -1000.0 | -1000.0 | 0.0 | -0.42551166 | 0.15804265 | -0.14439134 | -0.71600544 | 1.7233822661852551 | 2.691847085563749 | 10.017017240560527 | -0.08547367510074966 |||||||||||||||||||||||||||||||||||||||
| 5 | 0.0 | 0.0 | 485.0 | 5.0 | 25.0 | 125.0 | 0.0 | -1000.0 | -1000.0 | 0.0 | -0.4319562 | 0.17422763 | -0.1460396 | -0.7337566999999999 | 1.742798057982355 | 2.725836758125469 | 10.305663257960603 | -0.09830476343631744 |||||||||||||||||||||||||||||||||||||||
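The commit log above refers to trace tests that compare freshly generated traces like this one against stored references. The repository's actual trace-test code is not shown here; the following is only a minimal sketch of how such a CSV could be loaded and compared, assuming hypothetical file names, a hand-picked subset of columns, and an arbitrary tolerance.

```python
# Illustrative sketch only -- file names, column subset, and tolerance are
# assumptions, not the repository's actual trace-test implementation.
import numpy as np
import pandas as pd


def traces_match(new_csv, reference_csv,
                 columns=("Total steps", "Shaped Training Reward"),
                 atol=1e-6):
    """Return True if the selected columns of two trace CSVs agree within atol."""
    new = pd.read_csv(new_csv)
    ref = pd.read_csv(reference_csv)
    if len(new) != len(ref):
        return False
    for col in columns:
        # Empty cells load as NaN; treat NaN in both traces as a match.
        a = new[col].to_numpy(dtype=float)
        b = ref[col].to_numpy(dtype=float)
        if not np.allclose(a, b, atol=atol, equal_nan=True):
            return False
    return True


if __name__ == "__main__":
    # Placeholder file names for illustration.
    print(traces_match("new_trace.csv", "reference_trace.csv"))
```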