update of api docstrings across coach and tutorials [WIP] (#91)

* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation

docs_raw/source/usage.rst (new file, 158 lines)

Usage
=====

One of the mechanisms Coach uses for running experiments is the **Preset** mechanism.
As its name implies, a preset defines a set of predefined experiment parameters.
This allows defining a *complex* agent-environment interaction, with multiple parameters, and later running it through
a very *simple* command line.

The preset includes all the components that are used in the experiment, such as the agent's internal components and
the environment to use.
It additionally defines general parameters for the experiment itself, such as the training schedule,
visualization parameters, and testing parameters.
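
For reference, a preset is just a Python module that wires these components into a graph manager. The sketch below is illustrative only; the module paths and class names (:code:`BasicRLGraphManager`, :code:`GymVectorEnvironment`, :code:`SimpleSchedule`) follow the rl_coach package layout and may differ between versions.

.. code-block:: python

   from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters
   from rl_coach.environments.gym_environment import GymVectorEnvironment
   from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
   from rl_coach.graph_managers.graph_manager import SimpleSchedule

   # A minimal preset: a Clipped PPO agent on the CartPole-v0 Gym environment,
   # trained with a default schedule.
   graph_manager = BasicRLGraphManager(
       agent_params=ClippedPPOAgentParameters(),
       env_params=GymVectorEnvironment(level='CartPole-v0'),
       schedule_params=SimpleSchedule()
   )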

Training an Agent
-----------------

Single-threaded Algorithms
++++++++++++++++++++++++++

This is the most common case. Just choose a preset using the :code:`-p` flag and press enter.
To list the available presets, use the :code:`-l` flag.

*Example:*

.. code-block:: bash

   coach -p CartPole_DQN

Multi-threaded Algorithms
+++++++++++++++++++++++++

Multi-threaded algorithms are very common these days.
They typically achieve the best results, and scale gracefully with the number of threads.
In Coach, running such algorithms is done by selecting a suitable preset and choosing the number of threads to run using the :code:`-n` flag.

*Example:*

.. code-block:: bash

   coach -p CartPole_A3C -n 8

Evaluating an Agent
-------------------

There are several options for evaluating an agent during training:

* For multi-threaded runs, an evaluation agent will constantly run in the background and evaluate the model during training.

* For single-threaded runs, it is possible to define an evaluation period through the preset. This will run several evaluation episodes periodically (see the sketch below).
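
Under the hood, the preset's schedule parameters control when these evaluation episodes happen. A rough sketch, using the attribute names of rl_coach's :code:`ScheduleParameters` (which may differ between versions):

.. code-block:: python

   from rl_coach.core_types import EnvironmentEpisodes, EnvironmentSteps, TrainingSteps
   from rl_coach.graph_managers.graph_manager import ScheduleParameters

   # Illustrative schedule: run 5 evaluation episodes every 50 training episodes.
   schedule_params = ScheduleParameters()
   schedule_params.improve_steps = TrainingSteps(10000000)
   schedule_params.steps_between_evaluation_periods = EnvironmentEpisodes(50)
   schedule_params.evaluation_steps = EnvironmentEpisodes(5)
   schedule_params.heatup_steps = EnvironmentSteps(1000)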

Additionally, it is possible to save checkpoints of the agent's networks and then run in evaluation mode only.
Checkpoints are saved by specifying the number of seconds between them using the :code:`-s` flag.
The checkpoints will be saved into the experiment directory.
Loading a model for evaluation is done by specifying the :code:`-crd` flag with the experiment directory, and the :code:`--evaluate` flag to disable training.

*Example:*

.. code-block:: bash

   coach -p CartPole_DQN -s 60
   coach -p CartPole_DQN --evaluate -crd CHECKPOINT_RESTORE_DIR

Playing with the Environment as a Human
---------------------------------------

Interacting with the environment as a human can be useful for understanding its difficulties and for collecting data for imitation learning.
In Coach, this can easily be done by selecting a preset that defines the environment to use, and specifying the :code:`--play` flag.
When the environment is loaded, the available keyboard keys will be printed to the screen.
Pressing the escape key when finished will end the simulation and store the replay buffer in the experiment directory.

*Example:*

.. code-block:: bash

   coach -et rl_coach.environments.gym_environment:Atari -lvl BreakoutDeterministic-v4 --play

Learning Through Imitation Learning
-----------------------------------

Learning through imitation of human behavior is a nice way to speed up learning.
In Coach, this is done in two steps:

1. Create a dataset of demonstrations by playing with the environment as a human.
   After this step, a pickle of the replay buffer containing your game play will be stored in the experiment directory,
   and the path to this replay buffer will be printed to the screen.
   To do so, select an environment type and level through the command line, and specify the :code:`--play` flag.

   *Example:*

   .. code-block:: bash

      coach -et rl_coach.environments.doom_environment:DoomEnvironmentParameters -lvl Basic --play

2. Next, use an imitation learning preset and set the replay buffer path accordingly.
   The path can be set either from the command line or from the preset itself (see the sketch after this list).

   *Example:*

   .. code-block:: bash

      coach -p Doom_Basic_BC -cp='agent.load_memory_from_file_path=\"<experiment dir>/replay_buffer.p\"'
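
Setting the path from inside the preset amounts to pointing the agent's memory at the recorded pickle. The fragment below is a rough sketch; the class and attribute names (:code:`BCAgentParameters`, :code:`memory.load_memory_from_file_path`) mirror rl_coach's bundled behavioral cloning presets and may differ between versions.

.. code-block:: python

   from rl_coach.agents.bc_agent import BCAgentParameters

   # Illustrative preset fragment: load the demonstrations recorded during the
   # --play session into the agent's replay memory.
   agent_params = BCAgentParameters()
   agent_params.memory.load_memory_from_file_path = '<experiment dir>/replay_buffer.p'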

Visualizations
--------------

Rendering the Environment
+++++++++++++++++++++++++

Rendering the environment is done by using the :code:`-r` flag.
When working with multi-threaded algorithms, the rendered image will show the game play of the evaluation worker.
When working with single-threaded algorithms, the rendered image will show the single worker, which can be either training or evaluating.
Keep in mind that rendering the environment with single-threaded algorithms may slow down training to some extent.
When playing with the environment using the :code:`--play` flag, the environment will be rendered automatically, without the need to specify the :code:`-r` flag.

*Example:*

.. code-block:: bash

   coach -p Breakout_DQN -r

Dumping GIFs
++++++++++++

Coach allows storing GIFs of the agent's game play.
To dump GIF files, use the :code:`-dg` flag.
The files are dumped after every evaluation episode, and are saved into the experiment directory, under a :code:`gifs` sub-directory.

*Example:*

.. code-block:: bash

   coach -p Breakout_A3C -n 4 -dg

Switching Between Deep Learning Frameworks
------------------------------------------

Coach uses TensorFlow as its main backend framework, but it also supports MXNet.
MXNet is optional, and by default TensorFlow will be used.
If MXNet is installed, it is possible to switch to it using the :code:`-f` flag.

*Example:*

.. code-block:: bash

   coach -p Doom_Basic_DQN -f mxnet

Additional Flags
----------------

There are several convenient flags which are important to know about.
The most up-to-date description can be found by using the :code:`-h` flag.

.. argparse::
   :module: rl_coach.coach
   :func: create_argument_parser
   :prog: coach