mirror of
https://github.com/gryf/coach.git
synced 2026-01-03 20:34:18 +01:00
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
7 lines
1.7 KiB
CSV
Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min,TD targets/Mean,TD targets/Stdev,TD targets/Max,TD targets/Min,actions/Mean,actions/Stdev,actions/Max,actions/Min
1,0.0,1.0,97.0,1.0,25.0,25.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,0.0,1.0,194.0,2.0,25.0,50.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,0.0,0.0,291.0,3.0,25.0,75.0,-0.013705192291281485,-1000.0,-1000.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,-0.51026434,0.22476047,-0.15544460000000002,-0.9295912000000001,,,,,2.0812359514743166,3.3284790187301674,12.234674698678914,-0.08146359109321984
4,0.0,0.0,388.0,4.0,25.0,100.0,-0.02430443169727376,-1000.0,-1000.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,-0.42551166,0.15804265,-0.14439134,-0.71600544,,,,,1.7233822661852551,2.691847085563749,10.017017240560527,-0.08547367510074966
5,0.0,0.0,485.0,5.0,25.0,125.0,0.0,-1000.0,-1000.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,-0.4319562,0.17422763,-0.1460396,-0.7337566999999999,,,,,1.742798057982355,2.725836758125469,10.305663257960603,-0.09830476343631744