Mirror of https://github.com/gryf/coach.git, synced 2026-03-01 22:25:46 +01:00
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
2.0 KiB
| Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Entropy/Mean | Entropy/Stdev | Entropy/Max | Entropy/Min | Advantages/Mean | Advantages/Stdev | Advantages/Max | Advantages/Min | Values/Mean | Values/Stdev | Values/Max | Values/Min | Value Loss/Mean | Value Loss/Stdev | Value Loss/Max | Value Loss/Min | Policy Loss/Mean | Policy Loss/Stdev | Policy Loss/Max | Policy Loss/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 772.0 | 1.0 | 772.0 | 772.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||
| 2 | 0.0 | 1.0 | 821.0 | 1.0 | 821.0 | 1593.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||
| 3 | 47.0 | 0.0 | 960.0 | 1.0 | 960.0 | 2553.0 | 0.0 | -20.0 | -20.0 | 0.0 | 1.1341655000000002 | 1.3580534 | 5.892931 | 0.0023950292 | 1.7183144000000001 | 0.11223783 | 1.7917048999999998 | 1.2778816000000002 | 0.04575298835242915 | 0.4587136251645712 | 1.7850174903869631 | -1.000868558883667 | -2.1548557 | 1.8245186999999998 | 0.0046853945000000004 | -5.0325212 | 0.2002191 | 0.18291572 | 0.6043339 | 3.6701476e-06 | 0.053236503 | 0.61648566 | 1.385742 | -1.489837 | |||||||||||
| 4 | 88.0 | 0.0 | 802.0 | 1.0 | 802.0 | 3355.0 | 0.0 | -21.0 | -21.0 | 0.0 | 1.3043066 | 0.5805045999999999 | 3.1598291 | 0.34928647 | 0.9577253000000001 | 0.112086765 | 1.7206139999999999 | 0.84311396 | 0.06997428238391876 | 0.3966030207984067 | 0.8193864822387695 | -0.957106113433838 | -3.1394837000000004 | 0.53956544 | -2.564023 | -4.4771279999999996 | 0.10647102 | 0.06113237 | 0.296035 | 0.04434104 | 0.07748371 | 0.32128 | 0.67603266 | -0.5327814999999999 | |||||||||||
| 5 | 129.0 | 0.0 | 815.0 | 1.0 | 815.0 | 4170.0 | 0.0 | -21.0 | -21.0 | 0.0 | 1.392543 | 0.8525935 | 3.4830544000000003 | 0.1963895 | 0.90076345 | 0.08584659 | 1.5318372 | 0.77205926 | -0.06867338344454765 | 0.4191816209286624 | 0.5097661018371582 | -0.9805699586868286 | -2.2598412 | 0.26356682 | -1.9182776999999998 | -2.800033 | 0.09539665 | 0.07594069999999999 | 0.25968197 | 0.031170906 | -0.07717073 | 0.33031166 | 0.287632 | -0.91986656 | |||||||||||
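The table above is a per-episode training trace of the kind the "trace tests" in the commit log compare against stored reference traces. As a rough sketch of that idea (the actual Coach test harness differs; the function names here are hypothetical, and only the standard library is assumed), one can parse such a CSV and compare two traces cell by cell with a floating-point tolerance:

```python
import csv
import io
import math


def load_trace(csv_text):
    """Parse a trace CSV into a list of per-episode dicts.

    Empty cells (columns not populated for that episode, e.g. during
    heatup) are mapped to None; everything else is parsed as a float.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {key: (float(value) if value not in ("", None) else None)
         for key, value in row.items()}
        for row in reader
    ]


def traces_match(trace_a, trace_b, rel_tol=1e-6):
    """Compare two traces cell by cell, tolerating tiny numeric drift."""
    if len(trace_a) != len(trace_b):
        return False
    for row_a, row_b in zip(trace_a, trace_b):
        if row_a.keys() != row_b.keys():
            return False
        for key in row_a:
            a, b = row_a[key], row_b[key]
            # An empty cell must be empty in both traces.
            if (a is None) != (b is None):
                return False
            if a is not None and not math.isclose(a, b, rel_tol=rel_tol,
                                                  abs_tol=1e-9):
                return False
    return True
```

Comparing with a tolerance rather than exact equality matters here because, as the commit log notes, the reference traces had to be regenerated after environment changes and when moving into the docker container; bit-exact reproduction across platforms is rarely achievable.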