Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Discounted Return/Mean,Discounted Return/Stdev,Discounted Return/Max,Discounted Return/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min,Q Values/Mean,Q Values/Stdev,Q Values/Max,Q Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min
1,0.0,1.0,1117.0,1.0,1117.0,1117.0,0.5,,,0.0,,,,,,,,,,,,,,,,-1.5180229894995567,0.6998808293377133,-0.08930329112720292,-3.148474706421977,,,,,,,,,,,,,,,,
2,163.0,0.0,821.0,1.0,821.0,1938.0,0.4919541999999965,-21.0,-21.0,0.0,,,,,,,,,,,,,,,,-2.405652578063971,0.6237147471281423,-0.7105532272722921,-3.3691179328950627,,,,,,,,,0.25339470000000003,0.06996354,0.40677336,-0.35897204,0.035283737,0.10252844,1.0475135,1.1831225500000001e-05
3,320.0,0.0,782.0,1.0,782.0,2720.0,0.4842905999999932,-21.0,-21.0,0.0,,,,,,,,,,,,,,,,-2.4614277069600043,0.5586658402302739,-0.7105532272722921,-3.354852824180864,,,,,,,,,0.20715186,0.062277785999999995,0.35004243,0.0036477323,0.05950941,0.13284620000000003,0.55984885,1.9053832e-05
4,522.0,0.0,1009.0,1.0,1009.0,3729.0,0.4744023999999889,-19.0,-19.0,0.0,,,,,,,,,,,,,,,,-1.74034851817599,0.8736518980911252,0.29537702481737355,-3.229858453919355,,,,,,,,,0.1964524,0.06919237,0.40447715,0.0016004617,0.08728501,0.21507107,0.96532106,3.7585607e-05
5,673.0,0.0,755.0,1.0,755.0,4484.0,0.4670033999999857,-21.0,-21.0,0.0,,,,,,,,,,,,,,,,-2.5246431129611286,0.5835765895797549,-0.7105532272722921,-3.3699982440767453,,,,,,,,,0.16121916,0.030521521,0.26771998,0.09214279,0.11407282,0.2374467,0.7852985,0.00861873