mirror of
https://github.com/gryf/coach.git
synced 2026-01-28 11:05:46 +01:00
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
2.6 KiB
| 1 | Episode # | Training Iter | In Heatup | ER #Transitions | ER #Episodes | Episode Length | Total steps | Epsilon | Shaped Training Reward | Training Reward | Update Target Network | Evaluation Reward | Shaped Evaluation Reward | Success Rate | Loss/Mean | Loss/Stdev | Loss/Max | Loss/Min | Learning Rate/Mean | Learning Rate/Stdev | Learning Rate/Max | Learning Rate/Min | Grads (unclipped)/Mean | Grads (unclipped)/Stdev | Grads (unclipped)/Max | Grads (unclipped)/Min | Entropy/Mean | Entropy/Stdev | Entropy/Max | Entropy/Min | Advantages/Mean | Advantages/Stdev | Advantages/Max | Advantages/Min | Values/Mean | Values/Stdev | Values/Max | Values/Min | Value Loss/Mean | Value Loss/Stdev | Value Loss/Max | Value Loss/Min | Policy Loss/Mean | Policy Loss/Stdev | Policy Loss/Max | Policy Loss/Min | Q/Mean | Q/Stdev | Q/Max | Q/Min | TD targets/Mean | TD targets/Stdev | TD targets/Max | TD targets/Min | actions/Mean | actions/Stdev | actions/Max | actions/Min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 0.0 | 1.0 | 1001.0 | 1.0 | 1001.0 | 1001.0 | 0.0 | 0.0 | |||||||||||||||||||||||||||||||||||||||||||||||||
| 3 | 2 | 0.0 | 1.0 | 2002.0 | 2.0 | 1001.0 | 2002.0 | 0.0 | 1.0 | |||||||||||||||||||||||||||||||||||||||||||||||||
| 4 | 3 | 1000.0 | 0.0 | 3003.0 | 3.0 | 1001.0 | 3003.0 | -0.1185302492771778 | 12.73546386992824 | 127.3546386992823 | 1.0 | 1.4965620654038504e-05 | 3.650858260133972e-05 | 0.0007415295694954692 | 1.510996071374393e-06 | 0.00010000000000000003 | 2.7105054312137605e-20 | 0.0001 | 0.0001 | 0.0049991608 | 0.004320714 | 0.029555712 | 0.00052324863 | -0.02509604 | 0.12253879 | 0.19679643 | -0.25691667 | 0.002979598431912616 | 0.042334642053058036 | 0.09477341320020807 | -0.1277348208010601 | 0.7574106673312205 | 0.2820158065549947 | 1.3628734977284602 | -0.13561528749852786 | |||||||||||||||||||||||
| 5 | 4 | 2001.0 | 0.0 | 4004.0 | 4.0 | 1001.0 | 4004.0 | -0.2048510260598676 | 7.629510433822026 | 76.29510433822016 | 1.0 | 9.294460378555413e-05 | 0.00018001446184314637 | 0.0014042556285858154 | 1.88643639376096e-06 | 0.00010000000000000003 | 2.7105054312137605e-20 | 0.0001 | 0.0001 | 0.018415965 | 0.019779565 | 0.19278607 | 0.0008359549000000001 | -0.007871467 | 0.112213835 | 0.17972693 | -0.23277566 | 0.002690244749371092 | 0.08247475995739656 | 0.20102381350942625 | -0.2633941138081878 | 0.8866818175665385 | 0.1980599181751808 | 1.3750565774684147 | 0.4541525586846937 | |||||||||||||||||||||||
| 6 | 5 | 3002.0 | 0.0 | 5005.0 | 5.0 | 1001.0 | 5005.0 | -0.02134772535498328 | 7.612595851248884 | 76.12595851248874 | 0.0 | 4.2167748756014586e-05 | 0.00010527586086637082 | 0.001020723837427795 | 1.4967686183808837e-06 | 0.00010000000000000003 | 2.7105054312137605e-20 | 0.0001 | 0.0001 | 0.009330036 | 0.0073557219999999994 | 0.04758695 | 0.00048721785000000004 | -0.00076002936 | 0.12163509 | 0.17490079 | -0.2400235 | 0.009237181711633144 | 0.09619469158143916 | 0.21206437128683006 | -0.2783133662129137 | 1.0669116245649553 | 0.12577960072670955 | 1.400521073123623 | 0.7853534082159903 |