mirror of
https://github.com/gryf/coach.git
synced 2025-12-17 11:10:20 +01:00
Updated tutorial and docs (#386)
Improved getting started tutorial, and updated docs to point to version 1.0.0
@@ -7,6 +7,21 @@
"# Getting Started Guide"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of Contents\n",
"- [Using Coach from the Command Line](#Using-Coach-from-the-Command-Line)\n",
"- [Using Coach as a Library](#Using-Coach-as-a-Library)\n",
" - [Preset based - using `CoachInterface`](#Preset-based---using-CoachInterface)\n",
" - [Training a preset](#Training-a-preset)\n",
" - [Running each training or inference iteration manually](#Running-each-training-or-inference-iteration-manually)\n",
" - [Non-preset - using `GraphManager` directly](#Non-preset---using-GraphManager-directly)\n",
" - [Training an agent with a custom Gym environment](#Training-an-agent-with-a-custom-Gym-environment)\n",
" - [Advanced functionality - proprietary exploration policy, checkpoint evaluation](#Advanced-functionality---proprietary-exploration-policy,-checkpoint-evaluation)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -54,11 +69,7 @@
"source": [
"Alternatively, Coach can be used as a library directly from Python. As described above, Coach uses the presets mechanism to define experiments. A preset is essentially a Python module that instantiates a `GraphManager` object. The graph manager is a container that holds the agents and the environments, along with additional parameters for running the experiment, such as visualization parameters. The graph manager acts as the scheduler that orchestrates the experiment.\n",
"\n",
"Running Coach directly from Python is done through a `CoachInterface` object, which accepts the same arguments as the command line invocation but allows for more flexibility and additional control of the training/inference process.\n",
"\n",
"Let's start with some examples.\n",
"\n",
"Creating a very simple graph containing a single Clipped PPO agent running with the CartPole-v0 Gym environment:"
"**Note: Each of the examples in this section is independent, so the notebook kernel needs to be restarted before running each one. Make sure you run the next cell before running any of the examples.**"
]
},
{
@@ -75,7 +86,28 @@
"if module_path not in sys.path:\n",
" sys.path.append(module_path)\n",
"if resources_path not in sys.path:\n",
" sys.path.append(resources_path)"
" sys.path.append(resources_path)\n",
" \n",
"from rl_coach.coach import CoachInterface"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preset based - using `CoachInterface`\n",
"\n",
"The basic method to run Coach directly from Python is through a `CoachInterface` object, which accepts the same arguments as the command line invocation but allows for more flexibility and additional control of the training/inference process.\n",
"\n",
"Let's start with some examples."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Training a preset\n",
"In this example, we'll create a very simple graph containing a Clipped PPO agent running with the CartPole-v0 Gym environment. `CoachInterface` has a few useful parameters, such as `custom_parameter`, which enables overriding preset settings, and other optional parameters enabling control over the training process. We'll override the preset's schedule parameters, train with a single rollout worker, and save checkpoints every 10 seconds:"
]
},
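As a side note, the `custom_parameter` string in the cell below is a semicolon-separated list of `name=value` assignments. The following is a minimal, hypothetical sketch of that format (not Coach's actual parser), just to make the syntax explicit:

```python
# Illustration only: parse the 'name=value;name=value' format used by
# custom_parameter into a dict of overrides. Coach's own parsing may differ.
def parse_custom_parameter(custom_parameter):
    overrides = {}
    for assignment in custom_parameter.split(';'):
        # partition splits on the first '=' so values may themselves contain '='
        name, _, value = assignment.partition('=')
        overrides[name.strip()] = value.strip()
    return overrides

overrides = parse_custom_parameter(
    'heatup_steps=EnvironmentSteps(5);improve_steps=TrainingSteps(3)')
print(overrides)
```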
{
@@ -84,17 +116,11 @@
"metadata": {},
"outputs": [],
"source": [
"from rl_coach.coach import CoachInterface\n",
"\n",
"coach = CoachInterface(preset='CartPole_ClippedPPO',\n",
" custom_parameter='heatup_steps=EnvironmentSteps(5);improve_steps=TrainingSteps(3)')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Running the graph according to the given schedule:"
" # The optional custom_parameter enables overriding preset settings\n",
" custom_parameter='heatup_steps=EnvironmentSteps(5);improve_steps=TrainingSteps(3)',\n",
" # Other optional parameters enable easy access to advanced functionalities\n",
" num_workers=1, checkpoint_save_secs=10)"
]
},
{
@@ -110,7 +136,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Running each phase manually"
"#### Running each training or inference iteration manually"
]
},
{
@@ -126,70 +152,37 @@
"metadata": {},
"outputs": [],
"source": [
"from rl_coach.core_types import EnvironmentSteps\n",
"\n",
"coach.graph_manager.heatup(EnvironmentSteps(100))\n",
"for _ in range(10):\n",
" coach.graph_manager.train_and_act(EnvironmentSteps(50))"
]
},
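The schedule quantities above are tagged with their unit so a heatup budget can't be confused with a training budget. A simplified, hypothetical sketch of those wrapper types (the real classes in `rl_coach.core_types` carry more behavior):

```python
# Simplified sketch of step-count wrappers: each schedule quantity carries
# its unit. Not the actual rl_coach implementation.
class StepCount:
    def __init__(self, num_steps):
        self.num_steps = num_steps

class EnvironmentSteps(StepCount):
    """A budget measured in environment transitions."""

class TrainingSteps(StepCount):
    """A budget measured in gradient-update iterations."""

heatup = EnvironmentSteps(100)
per_phase = EnvironmentSteps(50)
# Total environment steps consumed by the loop above: heatup + 10 phases
total = heatup.num_steps + 10 * per_phase.num_steps
print(total)
```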
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional functionality"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`CoachInterface` allows for easy access to functionalities such as multi-threading and saving checkpoints:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"coach = CoachInterface(preset='CartPole_ClippedPPO', num_workers=2, checkpoint_save_secs=10)\n",
"coach.run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Agent functionality"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When using `CoachInterface` (single agent with one level of hierarchy), it's also possible to use the `Agent` object's functionality directly, such as logging and reading signals, or applying the policy the agent has learned to a given state:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from rl_coach.environments.gym_environment import GymEnvironment, GymVectorEnvironment\n",
"from rl_coach.base_parameters import VisualizationParameters\n",
"from rl_coach.core_types import EnvironmentSteps\n",
"\n",
"coach = CoachInterface(preset='CartPole_ClippedPPO')\n",
"\n",
"# registering an iteration signal before starting to run\n",
"coach.graph_manager.log_signal('iteration', -1)\n",
"\n",
"coach.graph_manager.heatup(EnvironmentSteps(100))\n",
"\n",
"# training\n",
"for it in range(10):\n",
" # logging the iteration signal during training\n",
" coach.graph_manager.log_signal('iteration', it)\n",
" # using the graph manager to train and act a given number of steps\n",
" coach.graph_manager.train_and_act(EnvironmentSteps(100))\n",
" # reading signals during training\n",
" training_reward = coach.graph_manager.get_signal_value('Training Reward')"
]
},
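The `log_signal` / `get_signal_value` pattern above can be sketched with a plain-Python stand-in (hypothetical, not Coach's implementation): each signal keeps a history of logged values, and reads return the latest one.

```python
# Hypothetical stand-in for the signal bookkeeping pattern shown above.
class SignalLogger:
    def __init__(self):
        self._signals = {}

    def log_signal(self, name, value):
        # Keep the full history so signals can be inspected or plotted later.
        self._signals.setdefault(name, []).append(value)

    def get_signal_value(self, name):
        # Return the most recently logged value for this signal.
        return self._signals[name][-1]

logger = SignalLogger()
logger.log_signal('iteration', -1)
for it in range(10):
    logger.log_signal('iteration', it)
print(logger.get_signal_value('iteration'))
```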
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes we may want to track the agent's decisions, log them, or even modify them.\n",
"We can access the agent itself through the `CoachInterface` as follows.\n",
"\n",
"Note that we also need an instance of the environment to do so. In this case we instantiate a `GymEnvironment` object with the CartPole `GymVectorEnvironment`:"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -200,29 +193,41 @@
"env_params = GymVectorEnvironment(level='CartPole-v0')\n",
"env = GymEnvironment(**env_params.__dict__, visualization_parameters=VisualizationParameters())\n",
"\n",
"for it in range(10):\n",
" action_info = coach.graph_manager.get_agent().choose_action(env.state)\n",
" print(\"State:{}, Action:{}\".format(env.state, action_info.action))\n",
" env.step(action_info.action)"
"response = env.reset_internal_state()\n",
"for _ in range(10):\n",
" action_info = coach.graph_manager.get_agent().choose_action(response.next_state)\n",
" print(\"State:{}, Action:{}\".format(response.next_state, action_info.action))\n",
" response = env.step(action_info.action)\n",
" print(\"Reward:{}\".format(response.reward))"
]
},
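The interaction protocol in the cell above can be sketched with stub classes (hypothetical, not the rl_coach ones): the agent chooses an action from the latest state, and the environment returns a response carrying `next_state` and `reward`.

```python
# Stub classes sketching the choose_action / step / response loop above.
import random

class EnvResponse:
    def __init__(self, next_state, reward):
        self.next_state = next_state
        self.reward = reward

class StubCartPole:
    def reset_internal_state(self):
        return EnvResponse(next_state=[0.0, 0.0, 0.0, 0.0], reward=0.0)

    def step(self, action):
        # Toy dynamics: nudge the first state entry by the chosen action.
        return EnvResponse(next_state=[action - 0.5, 0.0, 0.0, 0.0], reward=1.0)

class StubAgent:
    def choose_action(self, state):
        return random.choice([0, 1])  # CartPole has two discrete actions

env, agent = StubCartPole(), StubAgent()
response = env.reset_internal_state()
total_reward = 0.0
for _ in range(10):
    action = agent.choose_action(response.next_state)
    response = env.step(action)
    total_reward += response.reward
print(total_reward)
```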
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using GraphManager Directly"
"### Non-preset - using `GraphManager` directly"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is also possible to invoke Coach directly in Python code without defining a preset (which is necessary for `CoachInterface`) by using the `GraphManager` object directly. Using Coach this way won't allow you to access functionalities such as multi-threading, but it might be convenient if you don't want to define a preset file.\n",
"It is also possible to invoke Coach directly in Python code without defining a preset (which is necessary for `CoachInterface`) by using the `GraphManager` object directly. Using Coach this way won't allow you to access functionalities such as multi-threading, but it might be convenient if you don't want to define a preset file."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Training an agent with a custom Gym environment\n",
"\n",
"Here we show an example of how to do so with a custom environment.\n",
"We can use a custom gym environment without registering it. \n",
"We just need the path to the environment module.\n",
"We can also pass custom parameters for the environment `__init__` function as `additional_simulator_parameters`."
"Here we show an example of how to use the `GraphManager` to train an agent on a custom Gym environment.\n",
"\n",
"We first construct a `GymEnvironmentParameters` object describing the environment parameters. For Gym environments with vector observations, we can use the more specific `GymVectorEnvironment` object.\n",
"\n",
"The path to the custom environment is defined in the `level` parameter, and it can be either the absolute path to its class (e.g. `'/home/user/my_environment_dir/my_environment_module.py:MyEnvironmentClass'`) or the relative path to the module, as in this example. In either case, we can use the custom Gym environment without registering it.\n",
"\n",
"Custom parameters for the environment's `__init__` function can be passed as `additional_simulator_parameters`."
]
},
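Conceptually, `additional_simulator_parameters` is a dict of keyword arguments delivered to the environment's `__init__`. A minimal sketch under that assumption, with a hypothetical environment class:

```python
# Sketch: the additional_simulator_parameters dict presumably reaches the
# environment as keyword arguments to __init__. MyEnvironment and its
# parameter names are hypothetical.
class MyEnvironment:
    def __init__(self, gravity=9.8, max_steps=200):
        self.gravity = gravity
        self.max_steps = max_steps

additional_simulator_parameters = {'gravity': 1.6, 'max_steps': 500}
env = MyEnvironment(**additional_simulator_parameters)
print(env.gravity, env.max_steps)
```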
{
@@ -269,23 +274,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The path to the environment can also be set as an absolute path, as follows: `<absolute python module path>:<environment class>`. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"env_params = GymVectorEnvironment(level='/home/user/my_environment_dir/my_environment_module.py:MyEnvironmentClass')"
]
},
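The `<absolute python module path>:<environment class>` syntax splits on the final `:`. A small illustration of that split (Coach's loader performs the actual import; this only mirrors the syntax):

```python
# Illustration of the 'module_path.py:ClassName' level format.
def split_level(level):
    # rpartition splits on the last ':' so the module path may contain colons
    module_path, _, class_name = level.rpartition(':')
    return module_path, class_name

print(split_level(
    '/home/user/my_environment_dir/my_environment_module.py:MyEnvironmentClass'))
```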
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Advanced functionality - proprietary exploration policy, checkpoint evaluation"
"#### Advanced functionality - proprietary exploration policy, checkpoint evaluation"
]
},
{
@@ -416,6 +405,13 @@
"# Cleaning up\n",
"shutil.rmtree(my_checkpoint_dir)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
||||