mirror of
https://github.com/gryf/coach.git
synced 2026-03-12 20:45:55 +01:00
Docs changes - fixing blogpost links, removing importing all exploration policies (#139)
* updated docs
* removing imports for all exploration policies in __init__ + setting the right blog-post link
* small cleanups
committed by Scott Leishman
parent 155b78b995
commit f12857a8c7
@@ -222,7 +222,7 @@
 <span class="k">return</span> <span class="s1">'rl_coach.exploration_policies.ucb:UCB'</span>


-<div class="viewcode-block" id="UCB"><a class="viewcode-back" href="../../../components/exploration_policies/index.html#rl_coach.exploration_policies.UCB">[docs]</a><span class="k">class</span> <span class="nc">UCB</span><span class="p">(</span><span class="n">EGreedy</span><span class="p">):</span>
+<div class="viewcode-block" id="UCB"><a class="viewcode-back" href="../../../components/exploration_policies/index.html#rl_coach.exploration_policies.ucb.UCB">[docs]</a><span class="k">class</span> <span class="nc">UCB</span><span class="p">(</span><span class="n">EGreedy</span><span class="p">):</span>
 <span class="sd">"""</span>
 <span class="sd"> UCB exploration policy is following the upper confidence bound heuristic to sample actions in discrete action spaces.</span>
 <span class="sd"> It assumes that there are multiple network heads that are predicting action values, and that the standard deviation</span>
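The docstring in this hunk is cut off after "the standard deviation", so how the policy combines the head predictions is not shown here. As a rough illustration only, the sketch below assumes the common upper-confidence-bound form in which the spread between heads is added to the mean action value as an exploration bonus. The function name `ucb_action` and the parameter `lambda_` are hypothetical and are not taken from rl_coach.

```python
import numpy as np


def ucb_action(head_q_values, lambda_=0.1):
    """Pick a discrete action from per-head action-value predictions.

    head_q_values: array-like of shape (num_heads, num_actions).
    lambda_: hypothetical weight on the uncertainty bonus (an assumption).
    """
    q = np.asarray(head_q_values, dtype=float)
    mean = q.mean(axis=0)           # average value estimate per action
    std = q.std(axis=0)             # disagreement between heads per action
    scores = mean + lambda_ * std   # assumed form: mean plus scaled spread
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    # Two heads, two actions: the heads agree on action 0 and disagree on action 1.
    heads = [[1.0, 0.0],
             [1.0, 2.0]]
    action, scores = ucb_action(heads, lambda_=0.5)
    print(action, scores)
```

With `lambda_=0` the bonus vanishes and the choice reduces to a greedy pick over the mean values; larger values favor actions on which the heads disagree.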