mirror of https://github.com/gryf/coach.git
update of api docstrings across coach and tutorials [WIP] (#91)
* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation
@@ -44,6 +44,26 @@ class NStepQNetworkParameters(NetworkParameters):
class NStepQAlgorithmParameters(AlgorithmParameters):
    """
    :param num_steps_between_copying_online_weights_to_target: (StepMethod)
        The number of steps between copying the online network weights to the target network weights.

    :param apply_gradients_every_x_episodes: (int)
        The number of episodes between applying the accumulated gradients to the network. After every
        num_steps_between_gradient_updates steps, the agent calculates the gradients for the collected data,
        accumulates them in internal accumulators, and applies them to the network only once every
        apply_gradients_every_x_episodes episodes.

    :param num_steps_between_gradient_updates: (int)
        The number of steps between calculating gradients for the collected data. In the A3C paper, this
        parameter is called t_max. Since this algorithm is on-policy, only the steps collected between two
        consecutive gradient calculations are used in the batch.

    :param targets_horizon: (str)
        Should be either 'N-Step' or '1-Step', and defines the horizon over which to bootstrap the network
        values. Essentially, '1-Step' follows the regular 1-step bootstrapping Q-learning update. For more
        information, please refer to the original paper (https://arxiv.org/abs/1602.01783).
    """
    def __init__(self):
        super().__init__()
        self.num_steps_between_copying_online_weights_to_target = EnvironmentSteps(10000)
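For context, here is a minimal sketch of how these documented parameters might be overridden in a Coach preset. The import paths (rl_coach.agents.n_step_q_agent, rl_coach.core_types) and the NStepQAgentParameters class are assumptions about the repository layout, not part of this diff; verify them against the source tree before relying on them.

# Hypothetical usage sketch -- not part of this commit. Import paths and the
# NStepQAgentParameters class are assumptions; check the repository before use.
from rl_coach.agents.n_step_q_agent import NStepQAgentParameters
from rl_coach.core_types import EnvironmentSteps

agent_params = NStepQAgentParameters()

# Sync the target network with the online network every 10000 environment steps
# (the documented default above).
agent_params.algorithm.num_steps_between_copying_online_weights_to_target = EnvironmentSteps(10000)

# Calculate gradients every 5 steps (t_max in the A3C paper), and apply the
# accumulated gradients to the network once per episode.
agent_params.algorithm.num_steps_between_gradient_updates = 5
agent_params.algorithm.apply_gradients_every_x_episodes = 1

# Bootstrap targets over the full N-step return rather than a single step.
agent_params.algorithm.targets_horizon = 'N-Step'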