update of api docstrings across coach and tutorials [WIP] (#91)

* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation

docs_raw/source/usage.rst (new file, 158 lines)

Usage
=====

One of the mechanisms Coach uses for running experiments is the **Preset** mechanism.
As its name implies, a preset defines a set of predefined experiment parameters.
This allows defining a *complex* agent-environment interaction, with multiple parameters, and later running it through
a very *simple* command line.

The preset includes all the components that are used in the experiment, such as the agent's internal components and
the environment to use.
It additionally defines general parameters for the experiment itself, such as the training schedule,
visualization parameters, and testing parameters.
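
For reference, a preset is just a Python module that wires these components into a graph manager. The sketch below is illustrative only; the module paths and class names (:code:`BasicRLGraphManager`, :code:`GymVectorEnvironment`, :code:`SimpleSchedule`) follow the rl_coach package layout and may differ between versions.

.. code-block:: python

   from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters
   from rl_coach.environments.gym_environment import GymVectorEnvironment
   from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
   from rl_coach.graph_managers.graph_manager import SimpleSchedule

   # A minimal preset: a Clipped PPO agent on the CartPole-v0 Gym environment,
   # trained with a default schedule.
   graph_manager = BasicRLGraphManager(
       agent_params=ClippedPPOAgentParameters(),
       env_params=GymVectorEnvironment(level='CartPole-v0'),
       schedule_params=SimpleSchedule()
   )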

Training an Agent
-----------------

Single-threaded Algorithms
++++++++++++++++++++++++++

This is the most common case. Just choose a preset using the :code:`-p` flag and press enter.
To list the available presets, use the :code:`-l` flag.

*Example:*

.. code-block:: bash

   coach -p CartPole_DQN

Multi-threaded Algorithms
+++++++++++++++++++++++++

Multi-threaded algorithms are very common these days.
They typically achieve the best results, and scale gracefully with the number of threads.
In Coach, running such algorithms is done by selecting a suitable preset and choosing the number of threads to run using the :code:`-n` flag.

*Example:*

.. code-block:: bash

   coach -p CartPole_A3C -n 8

Evaluating an Agent
-------------------

There are several options for evaluating an agent during training:

* For multi-threaded runs, an evaluation agent will constantly run in the background and evaluate the model during training.

* For single-threaded runs, it is possible to define an evaluation period through the preset. This will run several evaluation episodes periodically (see the sketch below).
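
Under the hood, the preset's schedule parameters control when these evaluation episodes happen. A rough sketch, using the attribute names of rl_coach's :code:`ScheduleParameters` (which may differ between versions):

.. code-block:: python

   from rl_coach.core_types import EnvironmentEpisodes, EnvironmentSteps, TrainingSteps
   from rl_coach.graph_managers.graph_manager import ScheduleParameters

   # Illustrative schedule: run 5 evaluation episodes every 50 training episodes.
   schedule_params = ScheduleParameters()
   schedule_params.improve_steps = TrainingSteps(10000000)
   schedule_params.steps_between_evaluation_periods = EnvironmentEpisodes(50)
   schedule_params.evaluation_steps = EnvironmentEpisodes(5)
   schedule_params.heatup_steps = EnvironmentSteps(1000)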

Additionally, it is possible to save checkpoints of the agent's networks and then run in evaluation mode only.
Checkpoints are saved by specifying the number of seconds between them using the :code:`-s` flag.
The checkpoints will be saved into the experiment directory.
Loading a model for evaluation is done by specifying the :code:`-crd` flag with the experiment directory, and the :code:`--evaluate` flag to disable training.

*Example:*

.. code-block:: bash

   coach -p CartPole_DQN -s 60
   coach -p CartPole_DQN --evaluate -crd CHECKPOINT_RESTORE_DIR

Playing with the Environment as a Human
---------------------------------------

Interacting with the environment as a human can be useful for understanding its difficulties and for collecting data for imitation learning.
In Coach, this can easily be done by selecting a preset that defines the environment to use, and specifying the :code:`--play` flag.
When the environment is loaded, the available keyboard keys will be printed to the screen.
Pressing the escape key when finished will end the simulation and store the replay buffer in the experiment directory.

*Example:*

.. code-block:: bash

   coach -et rl_coach.environments.gym_environment:Atari -lvl BreakoutDeterministic-v4 --play

Learning Through Imitation Learning
-----------------------------------

Learning through imitation of human behavior is a nice way to speed up learning.
In Coach, this is done in two steps:

1. Create a dataset of demonstrations by playing with the environment as a human.
   After this step, a pickle of the replay buffer containing your game play will be stored in the experiment directory,
   and the path to this replay buffer will be printed to the screen.
   To do so, select an environment type and level through the command line, and specify the :code:`--play` flag.

   *Example:*

   .. code-block:: bash

      coach -et rl_coach.environments.doom_environment:DoomEnvironmentParameters -lvl Basic --play

2. Next, use an imitation learning preset and set the replay buffer path accordingly.
   The path can be set either from the command line or from the preset itself (see the sketch after this list).

   *Example:*

   .. code-block:: bash

      coach -p Doom_Basic_BC -cp='agent.load_memory_from_file_path=\"<experiment dir>/replay_buffer.p\"'
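
Setting the path from inside the preset amounts to pointing the agent's memory at the recorded pickle. The fragment below is a rough sketch; the class and attribute names (:code:`BCAgentParameters`, :code:`memory.load_memory_from_file_path`) mirror rl_coach's bundled behavioral cloning presets and may differ between versions.

.. code-block:: python

   from rl_coach.agents.bc_agent import BCAgentParameters

   # Illustrative preset fragment: load the demonstrations recorded during the
   # --play session into the agent's replay memory.
   agent_params = BCAgentParameters()
   agent_params.memory.load_memory_from_file_path = '<experiment dir>/replay_buffer.p'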

Visualizations
--------------

Rendering the Environment
+++++++++++++++++++++++++

Rendering the environment is done by using the :code:`-r` flag.
When working with multi-threaded algorithms, the rendered image will show the game play of the evaluation worker.
When working with single-threaded algorithms, the rendered image will show the single worker, which can be either training or evaluating.
Keep in mind that rendering the environment with single-threaded algorithms may slow down training to some extent.
When playing with the environment using the :code:`--play` flag, the environment will be rendered automatically, without the need to specify the :code:`-r` flag.

*Example:*

.. code-block:: bash

   coach -p Breakout_DQN -r

Dumping GIFs
++++++++++++

Coach allows storing GIFs of the agent's game play.
To dump GIF files, use the :code:`-dg` flag.
The files are dumped after every evaluation episode, and are saved into the experiment directory, under a :code:`gifs` sub-directory.

*Example:*

.. code-block:: bash

   coach -p Breakout_A3C -n 4 -dg

Switching Between Deep Learning Frameworks
------------------------------------------

Coach uses TensorFlow as its main backend framework, but it also supports MXNet.
MXNet is optional, and by default TensorFlow will be used.
If MXNet is installed, it is possible to switch to it using the :code:`-f` flag.

*Example:*

.. code-block:: bash

   coach -p Doom_Basic_DQN -f mxnet

Additional Flags
----------------

There are several convenient flags which are important to know about.
The most up-to-date description can be found by using the :code:`-h` flag.

.. argparse::
   :module: rl_coach.coach
   :func: create_argument_parser
   :prog: coach