update of api docstrings across coach and tutorials [WIP] (#91)
* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation
@@ -36,25 +36,25 @@ from rl_coach.utils import last_sample
class ActorCriticAlgorithmParameters(AlgorithmParameters):
    """
    :param policy_gradient_rescaler: (PolicyGradientRescaler)
        The value that will be used to rescale the policy gradient.

    :param apply_gradients_every_x_episodes: (int)
        The number of episodes to wait before applying the accumulated gradients to the network.
        The training iterations only accumulate gradients without actually applying them.

    :param beta_entropy: (float)
        The weight that will be given to the entropy regularization, which is used in order to improve exploration.

    :param num_steps_between_gradient_updates: (int)
        Every num_steps_between_gradient_updates transitions will be considered as a single batch and used for
        accumulating gradients. This is also the number of steps used for bootstrapping according to the n-step
        formulation.

    :param gae_lambda: (float)
        If the policy gradient rescaler was defined as PolicyGradientRescaler.GAE, the generalized advantage
        estimation scheme will be used, in which case the lambda value controls the decay for the different
        n-step lengths.

    :param estimate_state_value_using_gae: (bool)
        If set to True, the state value targets for the V head will be estimated using the GAE scheme.
    """

    def __init__(self):
        super().__init__()
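For reference, these algorithm parameters are normally tuned on the agent parameters object inside a Coach preset. The snippet below is a minimal sketch of that pattern, assuming the preset-style API in which ActorCriticAgentParameters exposes the fields documented above through its algorithm attribute and PolicyGradientRescaler is imported from the policy optimization agent module (import paths should be checked against the repository); the numeric values are illustrative only, not recommendations.

    from rl_coach.agents.actor_critic_agent import ActorCriticAgentParameters
    from rl_coach.agents.policy_optimization_agent import PolicyGradientRescaler

    # Illustrative values only -- see the presets in the repository for tuned settings.
    agent_params = ActorCriticAgentParameters()

    # Rescale the policy gradient with generalized advantage estimation;
    # gae_lambda then controls the decay over the different n-step lengths.
    agent_params.algorithm.policy_gradient_rescaler = PolicyGradientRescaler.GAE
    agent_params.algorithm.gae_lambda = 0.96

    # Accumulate gradients over 20-step batches (also the n-step bootstrap length)
    # and apply the accumulated gradients to the network after every episode.
    agent_params.algorithm.num_steps_between_gradient_updates = 20
    agent_params.algorithm.apply_gradients_every_x_episodes = 1

    # Weight of the entropy regularization term used to improve exploration.
    agent_params.algorithm.beta_entropy = 0.01

    # Also estimate the V head targets with the GAE scheme.
    agent_params.algorithm.estimate_state_value_using_gae = True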