Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min
1,0.0,1.0,772.0,1.0,772.0,772.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,0.0,1.0,821.0,1.0,821.0,1593.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,47.0,0.0,960.0,1.0,960.0,2553.0,0.0,-20.0,-20.0,0.0,,,,,,,,,,,,1.1341655000000002,1.3580534,5.892931,0.0023950292,1.7183144000000001,0.11223783,1.7917048999999998,1.2778816000000002,0.04575298835242915,0.4587136251645712,1.7850174903869631,-1.000868558883667,-2.1548557,1.8245186999999998,0.0046853945000000004,-5.0325212,0.2002191,0.18291572,0.6043339,3.6701476e-06,0.053236503,0.61648566,1.385742,-1.489837
4,88.0,0.0,802.0,1.0,802.0,3355.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,1.3043066,0.5805045999999999,3.1598291,0.34928647,0.9577253000000001,0.112086765,1.7206139999999999,0.84311396,0.06997428238391876,0.3966030207984067,0.8193864822387695,-0.957106113433838,-3.1394837000000004,0.53956544,-2.564023,-4.4771279999999996,0.10647102,0.06113237,0.296035,0.04434104,0.07748371,0.32128,0.67603266,-0.5327814999999999
5,129.0,0.0,815.0,1.0,815.0,4170.0,0.0,-21.0,-21.0,0.0,,,,,,,,,,,,1.392543,0.8525935,3.4830544000000003,0.1963895,0.90076345,0.08584659,1.5318372,0.77205926,-0.06867338344454765,0.4191816209286624,0.5097661018371582,-0.9805699586868286,-2.2598412,0.26356682,-1.9182776999999998,-2.800033,0.09539665,0.07594069999999999,0.25968197,0.031170906,-0.07717073,0.33031166,0.287632,-0.91986656