Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min,TD targets/Mean,TD targets/Stdev,TD targets/Max,TD targets/Min,actions/Mean,actions/Stdev,actions/Max,actions/Min
1,0.0,1.0,1001.0,1.0,1001.0,1001.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,0.0,1.0,2002.0,2.0,1001.0,2002.0,0.0,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,1000.0,0.0,3003.0,3.0,1001.0,3003.0,-0.1185302492771778,8.62704551591294,86.2704551591296,1.0,,,,1.0509011072599606e-05,4.393642656353033e-05,0.0008535402594134213,1.1514939615153708e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.004000389,0.00447183,0.062234186,0.00047969296999999996,,,,,,,,,,,,,,,,,,,,,0.08464705,0.16014087,0.45386302,-0.26037258,0.01247160570665026,0.02153857694844653,0.08672064238048882,-0.04962609781241383,0.3359349988514577,0.6368093944604776,1.3638484370927098,-1.3839266445045957
4,2001.0,0.0,4004.0,4.0,1001.0,4004.0,-0.2048510260598676,17.580070175231974,175.80070175231998,1.0,,,,0.0005509343815205071,0.0018491137578482792,0.023759014904499054,5.607626462733606e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.045537997000000004,0.09140324,1.2210321000000002,0.0010273910000000001,,,,,,,,,,,,,,,,,,,,,0.1922657,0.16243528,0.44480476,-0.2532415,0.03993582413609073,0.11728732960908478,0.5736919507147175,-0.26410636501093465,0.6924021347523865,0.5892731229023225,1.3749280698542792,-1.507436630113174
5,3002.0,0.0,5005.0,5.0,1001.0,5005.0,-0.02134772535498328,13.124325999088368,131.24325999088364,0.0,,,,0.0001703916229396802,0.000568676102611858,0.004801726434379816,2.6488642106414773e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.014244637,0.014174069,0.10748595,0.00069606147,,,,,,,,,,,,,,,,,,,,,0.38734838,0.23498419,0.6344281,-0.10678842,0.09845966999879296,0.17017726714756395,0.6471482681083021,-0.23208531499469515,0.8583268163158988,0.5493396564055796,1.4005169604031336,-1.084873489999208