Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Discounted Return/Mean,Discounted Return/Stdev,Discounted Return/Max,Discounted Return/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min
1,0.0,1.0,772.0,1.0,772.0,772.0,0.0,,,0.0,,,,,,,,,,,,,,,,-2.437332009209832,0.5666975756966289,-0.7105532272722921,-3.364332223379411,,,,,,,,,,,,,,,,,,,,
2,0.0,1.0,821.0,1.0,821.0,1593.0,0.0,,,0.0,,,,,,,,,,,,,,,,-2.3375427452853184,0.562882024173797,-0.7105532272722921,-3.3225778431943085,,,,,,,,,,,,,,,,,,,,
3,38.0,0.0,763.0,1.0,763.0,2356.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,1.2098866000000001,1.449215,5.3609241999999995,0.00244356,-2.5178046202451694,0.5843148195084643,-0.7105532272722921,-3.3699982440767453,1.7662544999999998,0.03266678,1.7917435000000002,1.6590552,-0.09202062882188904,0.4331878633448028,0.8984384536743164,-0.9984065890312196,-1.7017021,1.594185,-0.041688699999999995,-4.9379349999999995,0.22111915,0.19444092,0.59284925,2.0590694e-05,-0.1670907,0.6072369000000001,0.91119975,-1.4746135
4,75.0,0.0,740.0,1.0,740.0,3096.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,1.9744401999999999,0.8914412,4.1053777,0.54077625,-2.533184641659896,0.5861942513660167,-0.7105532272722921,-3.3699982440767453,1.1498803,0.17261624,1.7622604,0.99270844,0.19542027049594451,0.4488243660076464,0.9995923042297364,-0.9500741958618164,-5.264807,0.15003455,-4.995072,-5.481164,0.22140607,0.09454554,0.4738923,0.1269643,0.24964908,0.42948514,0.77320266,-0.62486225
5,113.0,0.0,755.0,1.0,755.0,3851.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,1.6745409,0.7766086999999999,3.4373443,0.42763662,-2.5246431129611286,0.5835765895797549,-0.7105532272722921,-3.3699982440767453,0.8715389000000001,0.11778103,1.3657371999999999,0.70607877,0.02502031455168853,0.4342484515718581,0.8662590980529785,-0.9686682224273682,-3.4500135999999997,0.4302874,-3.1447627999999996,-4.761848400000001,0.10419229,0.05407177,0.22493912,0.052366237999999996,0.030020599999999995,0.3442417,0.68716383,-0.674335