Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
1,0.0,1.0,986.0,986.0,986.0,986.0,7.0,,,0.0,,,,,,,,,,,,,,,,,,,
2,0.0,1.0,1806.0,1806.0,820.0,1806.0,4.0,,,0.0,,,,,,,,,,,,,,,,,,,
3,207.0,0.0,2634.0,2634.0,828.0,2634.0,1.0,-21.0,-21.0,0.0,,,,0.013430694482291505,0.012774117514024573,0.06467919796705246,0.0005054873763583599,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.013462509,0.005010004,0.032169305,0.0046610474,,,,
4,433.0,0.0,3538.0,3538.0,904.0,3538.0,1.0,-21.0,-21.0,0.0,,,,0.013214294455993912,0.012243776759493771,0.048550304025411606,0.00030727600096724933,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.012283348500000001,0.004644497,0.032848116000000004,0.0047284905,,,,
5,664.0,0.0,4462.0,4462.0,924.0,4462.0,2.0,-20.0,-20.0,0.0,,,,0.013385360111538885,0.013904787720461907,0.06079941987991332,0.0005098563269712031,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.010943641,0.0043348954,0.03260831,0.0045090048,0.00066530565,0.0129122045,0.024260167000000003,-0.034502137