{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started Guide"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Coach from the Command Line"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When running Coach from the command line, we use a Preset module to define the experiment parameters.\n",
"As its name implies, a preset is a predefined set of parameters for running some agent on some environment.\n",
"Coach has many predefined presets that follow the algorithm definitions in the published papers, and they allow training many of the existing algorithms with essentially no coding at all. These presets can easily be run from the command line. For example:\n",
"\n",
"`coach -p CartPole_DQN`\n",
"\n",
"You can find all the predefined presets under the `presets` directory, or list them using the following command:\n",
"\n",
"`coach -l`\n",
"\n",
"Coach can also be used with an externally defined preset by passing the absolute path to the module and the name of the graph manager object that is defined in the preset:\n",
"\n",
"`coach -p /home/my_user/my_agent_dir/my_preset.py:graph_manager`\n",
"\n",
"Some presets are generic across multiple environment levels, and therefore require selecting the specific level through the command line:\n",
"\n",
"`coach -p Atari_DQN -lvl breakout`\n",
"\n",
"There are plenty of other command line arguments for customizing the experiment. Full documentation of the available arguments can be printed using the following command:\n",
"\n",
"`coach -h`"
]
},
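{
"cell_type": "markdown",
"metadata": {},
"source": [
"Command line arguments can also be combined. As a hedged example (assuming your Coach version supports the `-r` (render) and `-n` (number of workers) flags and includes the `CartPole_A3C` preset, as shown in the Coach README), the following runs a preset with rendering enabled and several parallel workers:\n",
"\n",
"`coach -r -p CartPole_A3C -n 8`"
]
},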
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Coach as a Library"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, Coach can be used as a library directly from Python. As described above, Coach uses the presets mechanism to define experiments. A preset is essentially a Python module which instantiates a `GraphManager` object. The graph manager is a container that holds the agents and the environments, along with some additional parameters for running the experiment, such as visualization parameters. It also acts as the scheduler which orchestrates the experiment.\n",
"\n",
"Let's start with some examples.\n",
"\n",
"Creating a very simple graph containing a single Clipped PPO agent running with the CartPole-v0 Gym environment:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters\n",
"from rl_coach.environments.gym_environment import GymVectorEnvironment\n",
"from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager\n",
"from rl_coach.graph_managers.graph_manager import SimpleSchedule\n",
"\n",
"# a single Clipped PPO agent interacting with the CartPole-v0 Gym environment,\n",
"# trained according to the default SimpleSchedule\n",
"graph_manager = BasicRLGraphManager(\n",
"    agent_params=ClippedPPOAgentParameters(),\n",
"    env_params=GymVectorEnvironment(level='CartPole-v0'),\n",
"    schedule_params=SimpleSchedule()\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Running the graph according to the given schedule:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"graph_manager.improve()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Running each phase manually"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The graph manager simplifies the scheduling process by encapsulating the calls to each of the training phases. Sometimes it can be beneficial to have finer-grained control over the scheduling process. This can easily be done by calling the individual phase functions directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from rl_coach.core_types import EnvironmentSteps\n",
"\n",
"graph_manager = BasicRLGraphManager(\n",
"    agent_params=ClippedPPOAgentParameters(),\n",
"    env_params=GymVectorEnvironment(level='CartPole-v0'),\n",
"    schedule_params=SimpleSchedule()\n",
")\n",
"\n",
"# collect initial experience (heatup), then interleave training and acting\n",
"graph_manager.heatup(EnvironmentSteps(100))\n",
"graph_manager.train_and_act(EnvironmentSteps(100))"
]
},
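{
"cell_type": "markdown",
"metadata": {},
"source": [
"The evaluation phase can be triggered manually in the same way. As a minimal sketch (assuming the graph manager also exposes an `evaluate` phase method alongside `heatup` and `train_and_act`), the following runs the agent in evaluation mode for a fixed number of environment steps:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# run the agent in evaluation mode (no training updates) for 1000 environment steps\n",
"graph_manager.evaluate(EnvironmentSteps(1000))"
]
},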
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Changing the default parameters\n",
"\n",
"Each agent in Coach is defined along with default parameters that follow the published paper. These defaults are sufficient when running the exact same experiments as in the paper, but other setups will typically require changing some of the algorithm parameters. This is easy to do, since all the internal parameters are accessible from within the preset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters\n",
"from rl_coach.environments.gym_environment import GymVectorEnvironment\n",
"from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager\n",
"from rl_coach.graph_managers.graph_manager import ScheduleParameters\n",
"from rl_coach.core_types import TrainingSteps, EnvironmentEpisodes, EnvironmentSteps\n",
"\n",
"# schedule\n",
"schedule_params = ScheduleParameters()\n",
"schedule_params.improve_steps = TrainingSteps(10000000)\n",
"schedule_params.steps_between_evaluation_periods = EnvironmentSteps(2048)\n",
"schedule_params.evaluation_steps = EnvironmentEpisodes(5)\n",
"schedule_params.heatup_steps = EnvironmentSteps(0)\n",
"\n",
"# agent parameters\n",
"agent_params = ClippedPPOAgentParameters()\n",
"agent_params.algorithm.discount = 1.0\n",
"\n",
"graph_manager = BasicRLGraphManager(\n",
"    agent_params=agent_params,\n",
"    env_params=GymVectorEnvironment(level='CartPole-v0'),\n",
"    schedule_params=schedule_params\n",
")\n",
"\n",
"graph_manager.improve()\n"
]
},
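{
"cell_type": "markdown",
"metadata": {},
"source": [
"Network parameters are exposed the same way. As a minimal sketch (assuming the `learning_rate` field on the agent's `main` network wrapper, which Coach presets commonly set), the learning rate could be changed, before constructing the graph manager, as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# lower the learning rate of the agent's main network\n",
"agent_params.network_wrappers['main'].learning_rate = 0.0001"
]
},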
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using a custom gym environment\n",
"\n",
"We can use a custom gym environment without registering it.\n",
"We just need the path to the environment module.\n",
"We can also pass custom parameters for the environment's `__init__` function as `additional_simulator_parameters`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters\n",
"from rl_coach.environments.gym_environment import GymVectorEnvironment\n",
"from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager\n",
"from rl_coach.graph_managers.graph_manager import SimpleSchedule\n",
"from rl_coach.architectures.embedder_parameters import InputEmbedderParameters\n",
"\n",
"# define the environment parameters\n",
"bit_length = 10\n",
"env_params = GymVectorEnvironment(level='rl_coach.environments.toy_problems.bit_flip:BitFlip')\n",
"env_params.additional_simulator_parameters = {'bit_length': bit_length, 'mean_zero': True}\n",
"\n",
"# Clipped PPO\n",
"agent_params = ClippedPPOAgentParameters()\n",
"# embed the 'state' and 'desired_goal' observations with an empty (pass-through) scheme\n",
"agent_params.network_wrappers['main'].input_embedders_parameters = {\n",
"    'state': InputEmbedderParameters(scheme=[]),\n",
"    'desired_goal': InputEmbedderParameters(scheme=[])\n",
"}\n",
"\n",
"graph_manager = BasicRLGraphManager(\n",
"    agent_params=agent_params,\n",
"    env_params=env_params,\n",
"    schedule_params=SimpleSchedule()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"graph_manager.improve()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The path to the environment can also be given as an absolute path, in the form `<absolute python module path>:<environment class>`. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"env_params = GymVectorEnvironment(level='/home/user/my_environment_dir/my_environment_module.py:MyEnvironmentClass')"
]
},
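{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, here is a minimal sketch of what such a module might contain. The module path and the class name `MyEnvironmentClass` are the placeholders from the example above, and the environment itself (a trivial counter) is purely illustrative; any class implementing the standard `gym.Env` interface (`reset`, `step`, and the action/observation spaces) should work:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import gym\n",
"import numpy as np\n",
"from gym import spaces\n",
"\n",
"\n",
"class MyEnvironmentClass(gym.Env):\n",
"    def __init__(self, max_steps=10):\n",
"        # a toy environment: the agent is rewarded for choosing action 1\n",
"        self.max_steps = max_steps\n",
"        self.action_space = spaces.Discrete(2)\n",
"        self.observation_space = spaces.Box(low=0, high=max_steps, shape=(1,), dtype=np.float32)\n",
"        self.steps = 0\n",
"\n",
"    def reset(self):\n",
"        # start a new episode and return the initial observation\n",
"        self.steps = 0\n",
"        return np.array([0.0], dtype=np.float32)\n",
"\n",
"    def step(self, action):\n",
"        # advance one step; reward is 1 when the agent picks action 1\n",
"        self.steps += 1\n",
"        observation = np.array([self.steps], dtype=np.float32)\n",
"        reward = float(action == 1)\n",
"        done = self.steps >= self.max_steps\n",
"        return observation, reward, done, {}"
]
}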
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}