Mirror of https://github.com/gryf/coach.git (synced 2026-02-14 21:15:53 +01:00)
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
2.5 KiB
| 1 | Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Entropy/Mean | Entropy/Stdev | Entropy/Max | Entropy/Min | Advantages/Mean | Advantages/Stdev | Advantages/Max | Advantages/Min | Values/Mean | Values/Stdev | Values/Max | Values/Min | Value Loss/Mean | Value Loss/Stdev | Value Loss/Max | Value Loss/Min | Policy Loss/Mean | Policy Loss/Stdev | Policy Loss/Max | Policy Loss/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min | TD targets/Mean | TD targets/Stdev | TD targets/Max | TD targets/Min | actions/Mean | actions/Stdev | actions/Max | actions/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 0.0 | 1.0 | 1000.0 | 1.0 | 1000.0 | 1000.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||||||||||||||||||||||
| 3 | 2 | 0.0 | 1.0 | 2000.0 | 2.0 | 1000.0 | 2000.0 | 0.0 | 1.0 | |||||||||||||||||||||||||||||||||||||||||||||||||
| 4 | 3 | 999.0 | 0.0 | 3000.0 | 3.0 | 1000.0 | 3000.0 | -0.017666830179174003 | 0.0 | 0.0 | 1.0 | 0.005126546850151572 | 0.004660130005352106 | 0.05132860690355301 | 0.0005627279169857502 | 0.00010000000000000003 | 4.0657581468206416e-20 | 0.0001 | 0.0001 | 0.1678458 | 0.12852536 | 1.5926425 | 0.024081124 | 0.24388833 | 0.11236252 | 0.42840713 | -0.8727883 | -0.01800971130044855 | 0.1942566799457346 | 0.5385999780893326 | -0.9172834092378616 | -0.11966089038501045 | 0.8962365587209448 | 1.3716363433793126 | -1.5680451743766328 | |||||||||||||||||||||||
| 5 | 4 | 1999.0 | 0.0 | 4000.0 | 4.0 | 1000.0 | 4000.0 | -0.039999362478752916 | 0.0 | 0.0 | 1.0 | 0.0008180646479820358 | 0.000529273102626917 | 0.0054473504424095145 | 0.00014673141413368285 | 0.00010000000000000003 | 4.0657581468206416e-20 | 0.0001 | 0.0001 | 0.0469651 | 0.025094092000000002 | 0.22221590000000002 | 0.010784525 | 0.14337498 | 0.17592207 | 0.33719423 | -0.28446856 | 0.20208258858056294 | 0.13431578391837634 | 0.5768654608726501 | -0.5833876812458039 | -0.2705900161928217 | 0.9272508528236816 | 1.9572209345620784 | -1.4727463554915825 | |||||||||||||||||||||||
| 6 | 5 | 2999.0 | 0.0 | 5000.0 | 5.0 | 1000.0 | 5000.0 | 0.17145601483403705 | 0.0 | 0.0 | 0.0 | 0.0003958249435308753 | 0.00031769597300822634 | 0.0040870513767004004 | 0.00010442566417623311 | 0.00010000000000000003 | 4.0657581468206416e-20 | 0.0001 | 0.0001 | 0.025218817999999997 | 0.013975793500000002 | 0.14064097 | 0.0070197446999999994 | -0.04435015 | 0.030164617999999997 | 0.124313995 | -0.2207815 | 0.26081367032274555 | 0.12723809202247516 | 0.5931554335355759 | -0.2620022776722908 | -0.2959362262223715 | 0.6939703144112135 | 1.0669463809202309 | -1.416814717430604 |