mirror of
https://github.com/gryf/coach.git
synced 2025-12-17 19:20:19 +01:00
Coach as a library (#348)
* CoachInterface + tutorial
* Some improvements and typo fixes
* merge tutorial 0 and 4
* typo fix + additional tutorial changes
* tutorial changes
* added reading signals and experiment path argument
@@ -54,6 +54,8 @@
 "source": [
 "Alternatively, Coach can be used as a library directly from Python. As described above, Coach uses the presets mechanism to define experiments. A preset is essentially a Python module that instantiates a `GraphManager` object. The graph manager is a container that holds the agents and the environments, and has some additional parameters for running the experiment, such as visualization parameters. The graph manager acts as the scheduler that orchestrates the experiment.\n",
 "\n",
+"Running Coach directly from Python is done through a `CoachInterface` object, which uses the same arguments as the command line invocation but allows for more flexibility and additional control of the training/inference process.\n",
+"\n",
 "Let's start with some examples.\n",
 "\n",
 "Creating a very simple graph containing a single Clipped PPO agent running with the CartPole-v0 Gym environment:"
@@ -65,16 +67,24 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters\n",
-"from rl_coach.environments.gym_environment import GymVectorEnvironment\n",
-"from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager\n",
-"from rl_coach.graph_managers.graph_manager import SimpleSchedule\n",
+"# Adding module path to sys path if not there, so rl_coach submodules can be imported\n",
+"import os\n",
+"import sys\n",
+"module_path = os.path.abspath(os.path.join('..'))\n",
+"if module_path not in sys.path:\n",
+"    sys.path.append(module_path)\n",
 "\n",
-"graph_manager = BasicRLGraphManager(\n",
-"    agent_params=ClippedPPOAgentParameters(),\n",
-"    env_params=GymVectorEnvironment(level='CartPole-v0'),\n",
-"    schedule_params=SimpleSchedule()\n",
-")"
+"from rl_coach.coach import CoachInterface"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"coach = CoachInterface(preset='CartPole_ClippedPPO',\n",
+"                       custom_parameter='heatup_steps=EnvironmentSteps(5);improve_steps=TrainingSteps(3)')"
 ]
 },
 {
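The `custom_parameter` string in the added cell packs several schedule overrides into one semicolon-separated argument. The sketch below is not Coach's own parser. It is a minimal, self-contained illustration of how such a string could be split into name/value pairs. `EnvironmentSteps` and `TrainingSteps` here are hypothetical stand-ins, and `parse_overrides` and `ALLOWED_TYPES` are names invented for this example, so it runs without `rl_coach` installed.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the rl_coach step-count types, defined here
# only so the sketch runs without rl_coach installed.
@dataclass(frozen=True)
class EnvironmentSteps:
    num_steps: int


@dataclass(frozen=True)
class TrainingSteps:
    num_steps: int


# Only these names may appear on the right-hand side of an override.
ALLOWED_TYPES = {"EnvironmentSteps": EnvironmentSteps, "TrainingSteps": TrainingSteps}


def parse_overrides(spec):
    """Split 'a=Type(n);b=Type(m)' into a {name: value} dictionary."""
    overrides = {}
    for assignment in filter(None, (part.strip() for part in spec.split(";"))):
        name, _, expression = assignment.partition("=")
        type_name, _, argument = expression.strip().rstrip(")").partition("(")
        if type_name not in ALLOWED_TYPES:
            raise ValueError("unsupported override type: " + repr(type_name))
        overrides[name.strip()] = ALLOWED_TYPES[type_name](int(argument))
    return overrides


overrides = parse_overrides(
    "heatup_steps=EnvironmentSteps(5);improve_steps=TrainingSteps(3)"
)
print(overrides)
```

Restricting the right-hand side to a whitelist of known types avoids evaluating arbitrary text, which matters if the override string can come from a command line.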
@@ -90,7 +100,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"graph_manager.improve()"
+"coach.run()"
 ]
 },
 {
@@ -104,7 +114,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The graph manager simplifies the scheduling process by encapsulating the calls to each of the training phases. Sometimes, it can be beneficial to have more fine-grained control over the scheduling process. This can be easily done by calling the individual phase functions directly:"
+"The graph manager (which was instantiated in the preset) can be accessed from the `CoachInterface` object. The graph manager simplifies the scheduling process by encapsulating the calls to each of the training phases. Sometimes, it can be beneficial to have more fine-grained control over the scheduling process. This can be easily done by calling the individual phase functions directly:"
 ]
 },
 {
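The relationship between a single scheduling call and the individual phase functions can be pictured with a toy scheduler. `ToyGraphManager` below is hypothetical: it only records which phase was called, and its loop structure is an assumption made for illustration rather than a reproduction of Coach's `GraphManager`.

```python
class ToyGraphManager:
    """Hypothetical stand-in that records phase calls instead of training."""

    def __init__(self, heatup_steps, improve_steps, steps_between_evaluations):
        self.heatup_steps = heatup_steps
        self.improve_steps = improve_steps
        self.steps_between_evaluations = steps_between_evaluations
        self.log = []

    def heatup(self, steps):
        self.log.append(("heatup", steps))

    def train_and_act(self, steps):
        self.log.append(("train_and_act", steps))

    def evaluate(self):
        self.log.append(("evaluate", None))

    def improve(self):
        """Convenience call: heat up, then alternate training and evaluation."""
        self.heatup(self.heatup_steps)
        done = 0
        while done < self.improve_steps:
            chunk = min(self.steps_between_evaluations, self.improve_steps - done)
            self.train_and_act(chunk)
            self.evaluate()
            done += chunk


# One call that schedules every phase ...
automatic = ToyGraphManager(heatup_steps=100, improve_steps=500, steps_between_evaluations=50)
automatic.improve()

# ... versus driving the phases by hand, skipping evaluation entirely.
manual = ToyGraphManager(heatup_steps=100, improve_steps=500, steps_between_evaluations=50)
manual.heatup(100)
for _ in range(10):
    manual.train_and_act(50)

print(len(automatic.log), len(manual.log))
```

Calling the phases by hand gives the caller the choice of when, or whether, to evaluate, at the cost of reimplementing the loop.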
@@ -115,23 +125,18 @@
 "source": [
 "from rl_coach.core_types import EnvironmentSteps\n",
 "\n",
-"graph_manager = BasicRLGraphManager(\n",
-"    agent_params=ClippedPPOAgentParameters(),\n",
-"    env_params=GymVectorEnvironment(level='CartPole-v0'),\n",
-"    schedule_params=SimpleSchedule()\n",
-")\n",
-"\n",
-"graph_manager.heatup(EnvironmentSteps(100))\n",
-"graph_manager.train_and_act(EnvironmentSteps(100))"
+"coach.graph_manager.heatup(EnvironmentSteps(100))\n",
+"for _ in range(10):\n",
+"    coach.graph_manager.train_and_act(EnvironmentSteps(50))"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Changing the default parameters\n",
+"### Additional functionality\n",
 "\n",
-"Agents in Coach are defined along with some default parameters that follow the published paper definition. This may be sufficient when running the exact same experiments as in the paper, but otherwise, there would probably need to be some changes made to the algorithm parameters. Again, this is easily modifiable, and all the internal parameters are accessible from within the preset:"
+"`CoachInterface` allows for easy access to functionalities such as multi-threading and saving checkpoints:"
 ]
 },
 {
@@ -140,39 +145,63 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters\n",
-"from rl_coach.environments.gym_environment import GymVectorEnvironment\n",
-"from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager\n",
-"from rl_coach.graph_managers.graph_manager import SimpleSchedule\n",
-"from rl_coach.graph_managers.graph_manager import ScheduleParameters\n",
-"from rl_coach.core_types import TrainingSteps, EnvironmentEpisodes, EnvironmentSteps\n",
-"\n",
-"# schedule\n",
-"schedule_params = ScheduleParameters()\n",
-"schedule_params.improve_steps = TrainingSteps(10000000)\n",
-"schedule_params.steps_between_evaluation_periods = EnvironmentSteps(2048)\n",
-"schedule_params.evaluation_steps = EnvironmentEpisodes(5)\n",
-"schedule_params.heatup_steps = EnvironmentSteps(0)\n",
-"\n",
-"# agent parameters\n",
-"agent_params = ClippedPPOAgentParameters()\n",
-"agent_params.algorithm.discount = 1.0\n",
-"\n",
-"graph_manager = BasicRLGraphManager(\n",
-"    agent_params=agent_params,\n",
-"    env_params=GymVectorEnvironment(level='CartPole-v0'),\n",
-"    schedule_params=schedule_params\n",
-")\n",
-"\n",
-"graph_manager.improve()\n"
+"coach = CoachInterface(preset='CartPole_ClippedPPO', num_workers=2, checkpoint_save_secs=10)\n",
+"coach.run()"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Using a custom gym environment\n",
+"### Agent functionality\n",
 "\n",
+"When using `CoachInterface` (single agent with one level of hierarchy), it's also possible to easily use the `Agent` object functionality, such as logging and reading signals, and applying the policy the agent has learned to a given state:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"from rl_coach.environments.gym_environment import GymEnvironment, GymVectorEnvironment\n",
+"from rl_coach.base_parameters import VisualizationParameters\n",
+"from rl_coach.core_types import EnvironmentSteps\n",
+"\n",
+"coach = CoachInterface(preset='CartPole_ClippedPPO')\n",
+"\n",
+"# training\n",
+"for it in range(10):\n",
+"    coach.graph_manager.log_signal('iteration', it)\n",
+"    coach.graph_manager.train_and_act(EnvironmentSteps(100))\n",
+"    training_reward = coach.graph_manager.get_signal_value('Training Reward')"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"# inference\n",
+"env_params = GymVectorEnvironment(level='CartPole-v0')\n",
+"env = GymEnvironment(**env_params.__dict__, visualization_parameters=VisualizationParameters())\n",
+"\n",
+"for it in range(10):\n",
+"    action_info = coach.graph_manager.get_agent().choose_action(env.state)\n",
+"    print(\"State:{}, Action:{}\".format(env.state, action_info.action))\n",
+"    env.step(action_info.action)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Using GraphManager Directly\n",
+"\n",
+"It is also possible to invoke Coach directly from Python code without defining a preset (which is necessary for `CoachInterface`) by using the `GraphManager` object directly. Using Coach this way won't allow you to access functionalities such as multi-threading, but it might be convenient if you don't want to define a preset file.\n",
+"\n",
+"Here we show an example of how to do so with a custom environment.\n",
 "We can use a custom gym environment without registering it. \n",
 "We just need the path to the environment module.\n",
 "We can also pass custom parameters for the environment `__init__` function as `additional_simulator_parameters`."
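Using an environment by module path instead of by registered name amounts to importing a class from a string and forwarding extra keyword arguments to its `__init__`. The sketch below illustrates that general technique with `importlib`. It is not Coach's loader: the `'module:ClassName'` path format and the helper names `load_class` and `make_from_path` are assumptions for this example, and a standard-library class stands in for the environment class.

```python
import importlib


def load_class(path):
    """Resolve a 'package.module:ClassName' string to the class object."""
    module_name, _, class_name = path.partition(":")
    if not module_name or not class_name:
        raise ValueError("expected 'module:ClassName', got " + repr(path))
    return getattr(importlib.import_module(module_name), class_name)


def make_from_path(path, additional_parameters=None):
    """Instantiate the class at `path`, forwarding extra __init__ keyword arguments."""
    return load_class(path)(**(additional_parameters or {}))


# A standard-library class stands in for a custom environment class here.
value = make_from_path("fractions:Fraction", {"numerator": 3, "denominator": 4})
print(value)
```

Because the class is looked up at call time, nothing has to be registered in advance. The trade-off is that a wrong path only fails when the object is created.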
@@ -244,7 +273,7 @@
 "language_info": {
 "codemirror_mode": {
 "name": "ipython",
-"version": 3
+"version": 3.0
 },
 "file_extension": ".py",
 "mimetype": "text/x-python",
@@ -255,5 +284,5 @@
 }
 },
 "nbformat": 4,
-"nbformat_minor": 2
-}
+"nbformat_minor": 0
+}