update of api docstrings across coach and tutorials [WIP] (#91)
* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation
40  docs_raw/source/components/agents/policy_optimization/ac.rst  Normal file
@@ -0,0 +1,40 @@
Actor-Critic
============

**Actions space:** Discrete | Continuous

**References:** `Asynchronous Methods for Deep Reinforcement Learning <https://arxiv.org/abs/1602.01783>`_

Network Structure
-----------------

.. image:: /_static/img/design_imgs/ac.png
   :width: 500px
   :align: center

Algorithm Description
---------------------

Choosing an action - Discrete actions
+++++++++++++++++++++++++++++++++++++

The policy network is used to predict the action probabilities. During training, an action is sampled from a categorical
distribution parameterized by these probabilities. During testing, the action with the highest probability is chosen.
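
For illustration only, here is a minimal sketch of this action-selection rule (not Coach's actual implementation; the
probability vector is assumed to come from the policy's softmax head):

.. code-block:: python

    import numpy as np

    def choose_discrete_action(action_probabilities, is_training):
        """Sample during training, act greedily during evaluation."""
        if is_training:
            # exploration: draw an action from the categorical distribution
            return int(np.random.choice(len(action_probabilities), p=action_probabilities))
        # evaluation: pick the most probable action
        return int(np.argmax(action_probabilities))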

Training the network
++++++++++++++++++++

A batch of :math:`T_{max}` transitions is used, and the advantages are calculated over it.

Advantages can be calculated by either of the following methods (configured by the selected preset):

1. **A_VALUE** - Estimating the advantage directly (see the sketch following this list):

   :math:`A(s_t, a_t) = \underbrace{\sum_{i=t}^{i=t + k - 1} \gamma^{i-t}r_i +\gamma^{k} V(s_{t+k})}_{Q(s_t, a_t)} - V(s_t)`

   where :math:`k` is :math:`T_{max} - State\_Index` for each state in the batch.

2. **GAE** - Following the `Generalized Advantage Estimation <https://arxiv.org/abs/1506.02438>`_ paper.
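
As a rough illustration of the **A_VALUE** estimator (a simplified sketch that assumes a single batch of
:math:`T_{max}` transitions with no terminal states, not Coach's exact implementation), the discounted returns can be
accumulated backwards over the batch and the critic's value predictions subtracted from them:

.. code-block:: python

    import numpy as np

    def n_step_advantages(rewards, state_values, bootstrap_value, discount=0.99):
        # rewards:         r_t for each of the T_max transitions in the batch
        # state_values:    V(s_t) predicted by the critic for the same states
        # bootstrap_value: V(s_{t+k}) for the state following the last transition
        returns = np.zeros(len(rewards))
        running_return = bootstrap_value
        # walking backwards gives each state a k = T_max - state_index step return
        for t in reversed(range(len(rewards))):
            running_return = rewards[t] + discount * running_return
            returns[t] = running_return
        # A(s_t, a_t) = Q(s_t, a_t) - V(s_t)
        return returns - np.asarray(state_values)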

The advantages are then used to accumulate gradients according to
:math:`L = -\mathop{\mathbb{E}} [\log (\pi) \cdot A]`
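
As an illustrative sketch (plain NumPy rather than the deep learning framework Coach actually uses, and assuming the
log-probabilities of the taken actions are already available), the loss whose gradients are accumulated looks like:

.. code-block:: python

    import numpy as np

    def policy_gradient_loss(log_probs_of_taken_actions, advantages):
        # the advantages act as constant weights here; in a real framework they
        # would be detached so that no gradient flows through the critic term
        return -np.mean(np.asarray(log_probs_of_taken_actions) * np.asarray(advantages))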

.. autoclass:: rl_coach.agents.actor_critic_agent.ActorCriticAlgorithmParameters