Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Discounted Return/Mean,Discounted Return/Stdev,Discounted Return/Max,Discounted Return/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min,TD targets/Mean,TD targets/Stdev,TD targets/Max,TD targets/Min,actions/Mean,actions/Stdev,actions/Max,actions/Min
1,0.0,1.0,1001.0,1.0,1001.0,1001.0,0.0,,,0.0,,,,,,,,,,,,,,,,0.1810549437584988,0.08342612458204374,0.3657155727590055,0.012114535848885052,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,0.0,1.0,2002.0,2.0,1001.0,2002.0,0.0,,,1.0,,,,,,,,,,,,,,,,0.10514369548395547,0.05043065738920054,0.21524430347618226,0.0011643643789458708,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,1000.0,0.0,3003.0,3.0,1001.0,3003.0,-0.1185302492771778,7.715022587137692,77.15022587137695,1.0,,,,2.392776927604245e-05,0.0001238392879134552,0.003135734703391791,1.3632770787808113e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.0050078793,0.0064642574,0.098884284,0.0005620557,0.7529912365203498,0.4617358686541096,1.594227125555209,0.00353194497002051,,,,,,,,,,,,,,,,,,,,,0.050345387000000005,0.05729694,0.19809167,-0.08215243,0.05447568218516847,0.04079396823019403,0.1626849260711809,-0.023734710598228542,-0.34397406029780203,0.5815794471343353,1.0123427704556696,-1.4948724317067326
4,2001.0,0.0,4004.0,4.0,1001.0,4004.0,-0.2048510260598676,11.15149430448684,111.51494304486856,1.0,,,,4.744879189274797e-05,0.00012290942505022024,0.0018345331773161886,1.965395085790078e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.011079958999999999,0.012011923,0.13484268,0.0006308630000000001,1.0427006689416267,0.4698636052853145,2.4480673370988217,0.02353826061555796,,,,,,,,,,,,,,,,,,,,,0.08695571,0.039015066,0.20370862,-0.04777467,0.07752333968630509,0.04623246871067645,0.2261723560740124,-0.0225689002695848,-0.8518067738026681,0.6092545155137064,1.0646579642019018,-1.5449345365759264
5,3002.0,0.0,5005.0,5.0,1001.0,5005.0,-0.02134772535498328,13.93309848305012,139.33098483050114,0.0,,,,4.3079992066850536e-05,6.204749497676766e-05,0.0006331046461127697,2.68419535132125e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.012529638000000001,0.010803898999999999,0.07596945,0.0010158704,1.2820087653786647,0.3980341965211882,2.1725203141180294,6.57125185393007e-05,,,,,,,,,,,,,,,,,,,,,0.42543635,0.20454627,0.8917187,0.05104744,0.08902145835758844,0.04616376194044128,0.2426698236415496,-0.00472626581328319,-0.5524111559396501,0.6382799035600116,1.0798330140144352,-1.2145754834427926