Episode #,Training Iter,In Heatup,ER #Transitions,ER #Episodes,Episode Length,Total steps,Epsilon,Shaped Training Reward,Training Reward,Update Target Network,Evaluation Reward,Shaped Evaluation Reward,Success Rate,Loss/Mean,Loss/Stdev,Loss/Max,Loss/Min,Learning Rate/Mean,Learning Rate/Stdev,Learning Rate/Max,Learning Rate/Min,Grads (unclipped)/Mean,Grads (unclipped)/Stdev,Grads (unclipped)/Max,Grads (unclipped)/Min,Entropy/Mean,Entropy/Stdev,Entropy/Max,Entropy/Min,Advantages/Mean,Advantages/Stdev,Advantages/Max,Advantages/Min,Values/Mean,Values/Stdev,Values/Max,Values/Min,Value Loss/Mean,Value Loss/Stdev,Value Loss/Max,Value Loss/Min,Policy Loss/Mean,Policy Loss/Stdev,Policy Loss/Max,Policy Loss/Min,Q/Mean,Q/Stdev,Q/Max,Q/Min,TD targets/Mean,TD targets/Stdev,TD targets/Max,TD targets/Min,actions/Mean,actions/Stdev,actions/Max,actions/Min
1,0.0,1.0,1001.0,1.0,1001.0,1001.0,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,0.0,1.0,2002.0,2.0,1001.0,2002.0,0.0,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,1000.0,0.0,3003.0,3.0,1001.0,3003.0,-0.1185302492771778,12.73546386992824,127.3546386992823,1.0,,,,1.4965620654038504e-05,3.650858260133972e-05,0.0007415295694954692,1.510996071374393e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.0049991608,0.004320714,0.029555712,0.00052324863,,,,,,,,,,,,,,,,,,,,,-0.02509604,0.12253879,0.19679643,-0.25691667,0.002979598431912616,0.042334642053058036,0.09477341320020807,-0.1277348208010601,0.7574106673312205,0.2820158065549947,1.3628734977284602,-0.13561528749852786
4,2001.0,0.0,4004.0,4.0,1001.0,4004.0,-0.2048510260598676,7.629510433822026,76.29510433822016,1.0,,,,9.294460378555413e-05,0.00018001446184314637,0.0014042556285858154,1.88643639376096e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.018415965,0.019779565,0.19278607,0.0008359549000000001,,,,,,,,,,,,,,,,,,,,,-0.007871467,0.112213835,0.17972693,-0.23277566,0.002690244749371092,0.08247475995739656,0.20102381350942625,-0.2633941138081878,0.8866818175665385,0.1980599181751808,1.3750565774684147,0.4541525586846937
5,3002.0,0.0,5005.0,5.0,1001.0,5005.0,-0.02134772535498328,7.612595851248884,76.12595851248874,0.0,,,,4.2167748756014586e-05,0.00010527586086637082,0.001020723837427795,1.4967686183808837e-06,0.00010000000000000003,2.7105054312137605e-20,0.0001,0.0001,0.009330036,0.0073557219999999994,0.04758695,0.00048721785000000004,,,,,,,,,,,,,,,,,,,,,-0.00076002936,0.12163509,0.17490079,-0.2400235,0.009237181711633144,0.09619469158143916,0.21206437128683006,-0.2783133662129137,1.0669116245649553,0.12577960072670955,1.400521073123623,0.7853534082159903