Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Discounted Return/Mean,Discounted Return/Stdev,Discounted Return/Max,Discounted Return/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min,TD targets/Mean,TD targets/Stdev,TD targets/Max,TD targets/Min,actions/Mean,actions/Stdev,actions/Max,actions/Min
1,0.0,1.0,1000.0,1.0,1000.0,1000.0,0.0,,,0.0,,,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,0.0,1.0,2000.0,2.0,1000.0,2000.0,0.0,,,1.0,,,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,999.0,0.0,3000.0,3.0,1000.0,3000.0,-0.017666830179174003,0.0,0.0,1.0,,,,0.0035039535888519192,0.0036899070860429047,0.0448087714612484,0.0006857202388346195,0.00010000000000000003,4.0657581468206416e-20,0.0001,0.0001,0.15666896,0.14784017,1.7027674,0.027932363999999998,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,0.28616783,0.1527039,0.82195866,-0.052169282000000004,0.12254468997797058,0.14238915773279914,0.9096015518903732,-0.459991791844368,0.07060378249719693,0.9005342625617344,1.6849354556578502,-1.5135477528043664
4,1999.0,0.0,4000.0,4.0,1000.0,4000.0,-0.039999362478752916,0.0,0.0,1.0,,,,0.0023975625592941143,0.0020512967193832047,0.03644856810569763,0.0006296735955402255,0.00010000000000000003,4.0657581468206416e-20,0.0001,0.0001,0.10159315,0.08087424,1.1576957,0.023222,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,0.24290629,0.11069059,0.7463965,-0.0065027475,0.2942720267621687,0.14379040300676074,1.032085758447647,-0.2381104633212089,0.4210885653083537,0.7995310847152353,1.7607054373862634,-1.3770179740736286
5,2999.0,0.0,5000.0,5.0,1000.0,5000.0,0.17145601483403705,0.0,0.0,0.0,,,,0.0013986770755837895,0.0011338123535962153,0.018688598647713658,0.00039429025491699576,0.00010000000000000003,4.0657581468206416e-20,0.0001,0.0001,0.057997093,0.035725467000000004,0.5853672,0.015558452,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,0.19842595,0.07881614,0.6942527,-0.16243972,0.367736402077882,0.1450020289455392,1.0550748002529144,-0.04278800502419472,-0.009665988609877104,0.8992571498573005,1.8188017021946057,-1.4079581912324373