Mirror of https://github.com/gryf/coach.git
RL in Large Discrete Action Spaces - Wolpertinger Agent (#394)
* Currently this is specific to the case of discretizing a continuous action space. It can easily be adapted to other cases by feeding the kNN a different candidate-action set and removing the discretizing output action filter (see the sketch below).
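The Wolpertinger architecture (Dulac-Arnold et al., "Deep Reinforcement Learning in Large Discrete Action Spaces") refines the actor's continuous proto-action by looking up its k nearest neighbours among the discrete actions and letting the critic pick the highest-valued one. Below is a minimal sketch of that selection step only, assuming a discretized 1-D continuous action space; the function names and the scikit-learn kNN are illustrative assumptions and do not reflect Coach's actual agent or filter classes.

```python
# Minimal sketch of Wolpertinger action selection over a discretized
# continuous action space. All names here are hypothetical, not Coach's API.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def build_action_grid(low, high, bins):
    """Discretize a 1-D continuous action range into `bins` candidate actions."""
    return np.linspace(low, high, bins).reshape(-1, 1)


def wolpertinger_select(proto_action, actions, knn, q_values_fn, k=5):
    """1) Find the k discrete actions nearest to the actor's proto-action.
    2) Return the neighbour that the critic scores highest."""
    _, idx = knn.kneighbors(proto_action.reshape(1, -1), n_neighbors=k)
    candidates = actions[idx[0]]          # the k nearest discrete actions
    q_values = q_values_fn(candidates)    # critic evaluates each candidate
    return candidates[np.argmax(q_values)]


# Usage with a dummy critic that prefers actions close to 0.3:
actions = build_action_grid(-1.0, 1.0, bins=21)
knn = NearestNeighbors().fit(actions)     # fit once, reuse at every step
best = wolpertinger_select(np.array([0.27]), actions, knn,
                           q_values_fn=lambda a: -np.abs(a - 0.3).ravel())
print(best)   # the discrete action closest to 0.3 among the k neighbours
```

Feeding `build_action_grid` with an arbitrary discrete action set instead of a discretized range is the adaptation mentioned in the commit message.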
@@ -21,8 +21,6 @@ A detailed description of those algorithms can be found by navigating to each of
   imitation/cil
   policy_optimization/cppo
   policy_optimization/ddpg
   policy_optimization/td3
   policy_optimization/sac
   other/dfp
   value_optimization/double_dqn
   value_optimization/dqn
@@ -36,6 +34,10 @@ A detailed description of those algorithms can be found by navigating to each of
   policy_optimization/ppo
   value_optimization/rainbow
   value_optimization/qr_dqn
   policy_optimization/sac
   policy_optimization/td3
   policy_optimization/wolpertinger
.. autoclass:: rl_coach.base_parameters.AgentParameters