Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Discounted Return/Mean,Discounted Return/Stdev,Discounted Return/Max,Discounted Return/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min
1,0.0,1.0,986.0,986.0,986.0,986.0,7.0,,,0.0,,,,,,,,,,,,,,,,-1.8205545076821419,0.7192845707051421,-0.2081522550905921,-3.1698994392478896,,,,
2,0.0,1.0,1806.0,1806.0,820.0,1806.0,4.0,,,0.0,,,,,,,,,,,,,,,,-2.3370969394351864,0.575288014748253,-0.7105532272722921,-3.355172823288848,,,,
3,206.0,0.0,2629.0,2629.0,823.0,2629.0,5.0,-21.0,-21.0,0.0,,,,0.014186631905104856,0.013655308200271828,0.06909694522619247,0.0005460917018353938,0.0002500000000000001,1.0842021724855042e-19,0.00025,0.00025,0.014938818000000001,0.0055247187,0.034780357000000005,0.0049935523,-2.3342722836314502,0.7834970909114538,-0.38878391807422696,-3.369599601005491,,,,
4,398.0,0.0,3397.0,3397.0,768.0,3397.0,3.0,-21.0,-21.0,0.0,,,,0.014518419023564396,0.013256214475088386,0.06440683454275131,0.0005935237277299166,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.013618752,0.003883305,0.028320136,0.0057370984,-2.4495140411664926,0.5558315778011723,-0.7105532272722921,-3.354852824180864,,,,
5,705.0,0.0,4626.0,4626.0,1229.0,4626.0,6.0,-19.0,-19.0,0.0,,,,0.013912314557241342,0.013573258327554268,0.08049257844686508,0.00038326982758007933,0.0002500000000000001,5.421010862427521e-20,0.00025,0.00025,0.0129638435,0.004854921,0.035924666,0.0042663114,-1.4469428047536403,0.7634920719307412,-0.008604775224526406,-3.170625540860168,-0.013995509,0.012983983999999999,0.019298933,-0.037532326