
Add documentation on distributed Coach. (#158)

* Added documentation on distributed Coach.
Authored by Balaji Subramaniam on 2018-11-27 02:26:15 -08:00; committed by Gal Novik
parent e3ecf445e2
commit d06197f663
151 changed files with 5302 additions and 643 deletions

Binary image file added (62 KiB); not shown.

@@ -0,0 +1,10 @@
Data Stores
===========
S3DataStore
-----------
.. autoclass:: rl_coach.data_stores.s3_data_store.S3DataStore
NFSDataStore
------------
.. autoclass:: rl_coach.data_stores.nfs_data_store.NFSDataStore


@@ -0,0 +1,6 @@
Memory Backends
===============
RedisPubSubBackend
------------------
.. autoclass:: rl_coach.memories.backend.redis.RedisPubSubBackend


@@ -0,0 +1,7 @@
Orchestrators
=============
Kubernetes
----------
.. autoclass:: rl_coach.orchestrators.kubernetes_orchestrator.Kubernetes


@@ -1,148 +1,39 @@
.. _dist-coach-design:

Distributed Coach - Horizontal Scale-Out
========================================

Coach supports the horizontal scale-out of rollout workers using the `--distributed_coach` or `-dc` option. Coach uses
three interfaces for horizontal scale-out, which allow for integration with different technologies and provide flexibility.
These three interfaces are the orchestrator, the memory backend and the data store.

* **Orchestrator** - The orchestrator interface provides the basic interaction points for orchestration, scheduling and
  resource management of training and rollout workers in distributed Coach mode. The interaction points define
  how Coach should deploy, undeploy and monitor the workers it spawns.

* **Memory Backend** - This interface is used as the backing store or stream for the memory abstraction in
  distributed Coach. Its implementation is mainly used for communicating experiences (transitions and episodes)
  from the rollout workers to the training worker.

* **Data Store** - This interface is used as a backing store for policy checkpoints. It is mainly used to
  synchronize policy checkpoints from the training worker to the rollout workers.

.. image:: /_static/img/horizontal-scale-out.png
   :width: 800px
   :align: center

Supported Synchronization Types
-------------------------------

The synchronization type refers to the mechanism by which policy checkpoints are synchronized from the training worker to
the rollout workers. For each algorithm, it is specified by setting the `DistributedCoachSynchronizationType` in
`agent_params.algorithm.distributed_coach_synchronization_type` in the preset. Distributed Coach supports two
synchronization modes: `SYNC` and `ASYNC`.

* **SYNC** - In this mode, the trainer waits for all the experiences to be gathered from the distributed rollout workers
  before training a new policy, and the rollout workers wait for a new policy before gathering experiences. It is suitable
  for on-policy algorithms.

* **ASYNC** - In this mode, the trainer does not wait for any particular set of experiences from the distributed rollout
  workers, and the rollout workers continuously gather experiences, loading new policies whenever they become available.
  It is suitable for off-policy algorithms.
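
A preset selects the synchronization type with a single assignment. The following is a minimal sketch; the import
location of `DistributedCoachSynchronizationType` is an assumption here, and the rest of the preset is omitted:

.. code-block:: python

   # Sketch: choosing the synchronization type inside a preset.
   # The import path of DistributedCoachSynchronizationType is assumed.
   from rl_coach.base_parameters import DistributedCoachSynchronizationType

   # ClippedPPO is on-policy, so SYNC is the natural choice.
   agent_params.algorithm.distributed_coach_synchronization_type = DistributedCoachSynchronizationType.SYNC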
# Scaling out rollout workers

This document contains some options for how we could implement horizontal scaling of rollout workers in Coach, though most details are not specific to Coach. A few options are laid out; my current suggestion would be to start with Option 1 and move on to Option 1a or Option 1b as required.

## Off Policy Algorithms

### Option 1 - master polls file system

- one master process samples memories and updates the policy
- many worker processes execute rollouts
- coordinate using a single shared networked file system: NFS, Ceph, dat, s3fs, etc.
- policy sync communication method:
  - the master process occasionally writes the policy to the shared file system
  - worker processes occasionally read the policy from the shared file system
  - prevent workers from reading a policy which has not been completely written to disk using either:
    - a Redis lock
    - writing to temporary files and then renaming (see the sketch after this section)
- rollout memories:
  - sync communication method:
    - worker processes write rollout memories to the shared file system as they are generated
    - the master process occasionally reads rollout memories from the shared file system
    - the master process must be resilient to corrupted or incompletely written memories
  - sampling method:
    - the master process keeps all rollouts in memory, utilizing existing Coach memory classes
- control flow:
  - master:
    - run training updates interleaved with loading any newly available rollouts into memory
    - periodically write the policy to disk
  - workers:
    - periodically read the policy from disk
    - evaluate rollouts and write them to disk
- ops:
  - kubernetes yaml, kml, docker compose, etc.
  - a default shared file system can be provided, while allowing the user to specify something else if desired
  - a default method of launching the workers and master (in Kubernetes, GCE, AWS, etc.) can be provided

#### Pros

- very simple to implement; infrastructure is already available in ai-lab-kubernetes
- fast enough for a proof of concept and for iterating on the interface design
- rollout memories are durable and can easily be reused in later off-policy training
- if designed properly, there is a clear path towards:
  - decreasing latency using an in-memory store (Option 1a/b)
  - increasing rollout memory size using distributed sampling methods (Option 1c)

#### Cons

- the file system interface incurs additional latency: rollout memories must be written to disk and later read from disk, instead of going directly from memory to memory
- will require modifying the standard control flow; there will be an impact on algorithms which expect particular training regimens, specifically algorithms which are sensitive to the number of update steps between target/online network updates
- will not be particularly efficient for strictly on-policy algorithms where each rollout must use the most recent policy available
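
An illustrative sketch of the temp-file-and-rename approach (not part of Coach): the writer publishes a checkpoint
atomically so readers never observe a partially written file. The paths and checkpoint format are assumptions.

```python
import os
import pickle
import tempfile

def publish_policy(policy_weights, target_path="/shared/policy/latest.ckpt"):
    """Write the policy to a temp file, then atomically rename it into place."""
    directory = os.path.dirname(target_path)
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # temp file on the same filesystem
    with os.fdopen(fd, "wb") as f:
        pickle.dump(policy_weights, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes hit the disk before the rename
    os.replace(tmp_path, target_path)  # atomic on POSIX: readers see old or new, never partial
```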
### Option 1a - master polls (redis) list

- instead of using a file system as in Option 1, Redis lists can be used
- the policy is stored as a single key/value pair (locking is no longer necessary)
- rollout memory communication (see the sketch after this section):
  - workers: Redis list push
  - master: Redis list len, Redis list range
- note: many databases are interchangeable with the Redis protocol: Google Memorystore, AWS ElastiCache, etc.
- note: many databases can implement this interface with minimal glue: SQL, any object store, etc.

#### Pros

- lower latency than disk since everything is in memory
- clear path toward scaling to a large number of workers
- no concern about reading partially written rollouts
- no synchronization or additional threads necessary, though an additional thread would be helpful for concurrent reads from Redis and training
- will be slightly more efficient in the case of strictly on-policy algorithms

#### Cons

- more complex to set up, especially if you are concerned about rollout memory durability
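
A minimal sketch of the list-based exchange using the `redis` Python client; the key name and serialization are
assumptions, not Coach code.

```python
import pickle
import redis

r = redis.Redis(host="redis-master", port=6379)

# Worker side: append each finished rollout to a shared list.
def push_rollout(episode):
    r.rpush("rollouts", pickle.dumps(episode))

# Master side: drain whatever rollouts have accumulated since the last poll.
def poll_rollouts():
    count = r.llen("rollouts")
    if count == 0:
        return []
    raw = r.lrange("rollouts", 0, count - 1)
    r.ltrim("rollouts", count, -1)  # drop the consumed entries, keep anything pushed meanwhile
    return [pickle.loads(item) for item in raw]
```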
### Option 1b - master subscribes to (redis) pub sub

- instead of using a file system as in Option 1, Redis pub/sub can be used
- the policy is stored as a single key/value pair (locking is no longer necessary)
- rollout memory communication (see the sketch after this section):
  - workers: Redis publish
  - master: Redis subscribe
- no synchronization necessary, though an additional thread may be needed for the subscriber
  - it looks like the Python client might handle this already; this would need further investigation
- note: many possible pub/sub systems could be used, with different characteristics in specific contexts: Kafka, Google Pub/Sub, AWS Kinesis, etc.

#### Pros

- lower latency than disk since everything is in memory
- clear path toward scaling to a large number of workers
- no concern about reading partially written rollouts
- will be slightly more efficient in the case of strictly on-policy algorithms

#### Cons

- more complex to set up than a shared file system
- on its own, does not persist worker rollouts for future off-policy training
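
A minimal publish/subscribe sketch with the `redis` Python client; the channel name and serialization are again
assumptions.

```python
import pickle
import redis

r = redis.Redis(host="redis-master", port=6379)

# Worker side: publish each finished rollout on a channel.
def publish_rollout(episode):
    r.publish("rollouts", pickle.dumps(episode))

# Master side: subscribe once, then poll for new messages between training steps.
pubsub = r.pubsub(ignore_subscribe_messages=True)
pubsub.subscribe("rollouts")

def poll_rollouts():
    episodes = []
    message = pubsub.get_message()
    while message is not None:
        episodes.append(pickle.loads(message["data"]))
        message = pubsub.get_message()
    return episodes
```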
### Option 1c - distributed rollout memory sampling

- if the rollout memories do not fit in the memory of a single machine, a distributed storage and sampling method would be necessary
- for example (see the sketch after this section):
  - rollout memory store: Redis set add
  - rollout memory sample: Redis set randmember

#### Pros

- capable of taking advantage of a rollout memory larger than the available memory of a single machine
- reduces resource constraints on the training machine

#### Cons

- distributed versions of each memory type/sampling method need to be custom built
- off-the-shelf implementations may not be available for complex memory types/sampling methods
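
A short sketch of uniform sampling backed by a Redis set; the key name and serialization are assumptions.

```python
import pickle
import redis

r = redis.Redis(host="redis-master", port=6379)

# Worker side: add each transition to a shared set.
def store_transition(transition):
    r.sadd("replay_memory", pickle.dumps(transition))

# Master side: sample a training batch uniformly at random without loading the whole memory.
def sample_batch(batch_size=32):
    return [pickle.loads(item) for item in r.srandmember("replay_memory", batch_size)]
```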
### Option 2 - master listens to workers

- rollout memories:
  - workers send memories directly to the master via MPI, 0MQ, etc.
  - a master policy thread listens for new memories and stores them in shared memory
- policy sync communication method:
  - the master occasionally sends policies directly to the workers via MPI, 0MQ, etc.
  - the master and workers must synchronize so that all workers are listening when the master is ready to send a new policy

#### Pros

- lower latency than Option 1 (for a small number of workers)
- potentially the optimal choice in the case of strictly on-policy algorithms with a relatively small number of worker nodes (small enough that more complex communication topologies such as rings or p2p are not needed)

#### Cons

- much less robust and more difficult to debug, requiring lots of synchronization
- much more difficult to make resilient to worker failure
- more custom communication/synchronization code
- as the number of workers scales up, a larger and larger fraction of time will be spent waiting and synchronizing

### Option 3 - Ray

#### Pros

- Ray would allow us to easily convert our current algorithms to distributed versions, with minimal changes to our code.

#### Cons

- performance from naive/simple use would be very similar to Option 2
- nontrivial to replace with a higher-performance system if desired; additional performance would require significant code changes

## On Policy Algorithms

TODO

File diff suppressed because one or more lines are too long


@@ -0,0 +1,239 @@
.. _dist-coach-usage:
Usage - Distributed Coach
=========================
Coach supports the horizontal scale-out of rollout workers in distributed mode. For more information on the design and
implementation of distributed Coach, see :ref:`dist-coach-design`. In the rest of this section, we will describe how to
get started with distributed Coach.
Interfaces and Implementations
------------------------------
Coach uses three interfaces to orchestrate, schedule and manage the resources of the workers it spawns in distributed
mode. These interfaces are the orchestrator, the memory backend and the data store. Refer to :ref:`dist-coach-design` for
more information. The following implementations are available for each interface:
* **Orchestrator** - `Kubernetes <https://kubernetes.io>`_.
* **Memory Backend** - `Redis Pub/Sub <https://redis.io/topics/pubsub>`_.
* **Data Store** - `S3 <https://aws.amazon.com/s3>`_ and `NFS <https://en.wikipedia.org/wiki/Network_File_System>`_.
Prerequisites
-------------
* Building and pushing containers - `Docker <https://docs.docker.com/install/linux/docker-ce/ubuntu>`_.
* Container registry access for hosting container images - `Docker Hub <https://hub.docker.com>`_.
* Using Kubernetes for orchestration - `Kubernetes configuration <https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/>`_.
* Using S3 for storing policy checkpoints - `AWS CLI <https://docs.aws.amazon.com/cli/latest/userguide/installing.html>`_,
  `AWS credentials <https://aws.amazon.com/blogs/security/a-new-and-standardized-way-to-manage-credentials-in-the-aws-sdks>`_
  and `S3 bucket <https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html>`_.
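
For the S3 prerequisite, the credentials file referenced later as `<path-to-aws-credentials>` follows the standard AWS
shared-credentials format. A minimal example, with placeholder values, looks like this:

.. code-block:: bash

   [default]
   aws_access_key_id = <access-key-id>
   aws_secret_access_key = <secret-access-key>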
Clone the Repository
--------------------
.. code-block:: bash
$ git clone git@github.com:NervanaSystems/coach.git
$ cd coach
Build Container Image and Push
------------------------------
Create a directory `docker`.
.. code-block:: bash
$ mkdir docker
Create the Docker files in the `docker` directory.
A sample base Dockerfile (`Dockerfile.base`) would look like this:
.. code-block:: bash
FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04
################################
# Install apt-get Requirements #
################################
# General
RUN apt-get update && \
apt-get install -y python3-pip cmake zlib1g-dev python3-tk python-opencv \
# Boost libraries
libboost-all-dev \
# Scipy requirements
libblas-dev liblapack-dev libatlas-base-dev gfortran \
# Pygame requirements
libsdl-dev libsdl-image1.2-dev libsdl-mixer1.2-dev libsdl-ttf2.0-dev \
libsmpeg-dev libportmidi-dev libavformat-dev libswscale-dev \
# Dashboard
dpkg-dev build-essential python3.5-dev libjpeg-dev libtiff-dev libsdl1.2-dev libnotify-dev \
freeglut3 freeglut3-dev libsm-dev libgtk2.0-dev libgtk-3-dev libwebkitgtk-dev libgtk-3-dev \
libwebkitgtk-3.0-dev libgstreamer-plugins-base1.0-dev \
# Gym
libav-tools libsdl2-dev swig cmake \
# Mujoco_py
curl libgl1-mesa-dev libgl1-mesa-glx libglew-dev libosmesa6-dev software-properties-common \
# ViZDoom
build-essential zlib1g-dev libsdl2-dev libjpeg-dev \
nasm tar libbz2-dev libgtk2.0-dev cmake git libfluidsynth-dev libgme-dev \
libopenal-dev timidity libwildmidi-dev unzip wget && \
apt-get clean autoclean && \
apt-get autoremove -y
############################
# Install Pip Requirements #
############################
RUN pip3 install --upgrade pip
RUN pip3 install setuptools==39.1.0 && pip3 install pytest && pip3 install pytest-xdist
RUN curl -o /usr/local/bin/patchelf https://s3-us-west-2.amazonaws.com/openai-sci-artifacts/manual-builds/patchelf_0.9_amd64.elf \
&& chmod +x /usr/local/bin/patchelf
A sample Dockerfile for the gym environment would look like this:
.. code-block:: bash
FROM coach-base:master as builder
# prep gym and any of its related requirements.
RUN pip3 install gym[atari,box2d,classic_control]==0.10.5
# add coach source starting with files that could trigger
# re-build if dependencies change.
RUN mkdir /root/src
COPY setup.py /root/src/.
COPY requirements.txt /root/src/.
RUN pip3 install -r /root/src/requirements.txt
FROM coach-base:master
WORKDIR /root/src
COPY --from=builder /root/.cache /root/.cache
COPY setup.py /root/src/.
COPY requirements.txt /root/src/.
COPY README.md /root/src/.
RUN pip3 install gym[atari,box2d,classic_control]==0.10.5 && pip3 install -e .[all] && rm -rf /root/.cache
COPY . /root/src
A sample Dockerfile for the Mujoco environment would look like this:
.. code-block:: bash
FROM coach-base:master as builder
# prep mujoco and any of its related requirements.
# Mujoco
RUN mkdir -p ~/.mujoco \
&& wget https://www.roboti.us/download/mjpro150_linux.zip -O mujoco.zip \
&& unzip -n mujoco.zip -d ~/.mujoco \
&& rm mujoco.zip
ARG MUJOCO_KEY
ENV MUJOCO_KEY=$MUJOCO_KEY
ENV LD_LIBRARY_PATH /root/.mujoco/mjpro150/bin:$LD_LIBRARY_PATH
RUN echo $MUJOCO_KEY | base64 --decode > /root/.mujoco/mjkey.txt
RUN pip3 install mujoco_py
# add coach source starting with files that could trigger
# re-build if dependencies change.
RUN mkdir /root/src
COPY setup.py /root/src/.
COPY requirements.txt /root/src/.
RUN pip3 install -r /root/src/requirements.txt
FROM coach-base:master
WORKDIR /root/src
COPY --from=builder /root/.mujoco /root/.mujoco
ENV LD_LIBRARY_PATH /root/.mujoco/mjpro150/bin:$LD_LIBRARY_PATH
COPY --from=builder /root/.cache /root/.cache
COPY setup.py /root/src/.
COPY requirements.txt /root/src/.
COPY README.md /root/src/.
RUN pip3 install mujoco_py && pip3 install -e .[all] && rm -rf /root/.cache
COPY . /root/src
A sample Dockerfile for the ViZDoom environment would look like this:
.. code-block:: bash
FROM coach-base:master as builder
# prep vizdoom and any of its related requirements.
RUN pip3 install vizdoom
# add coach source starting with files that could trigger
# re-build if dependencies change.
RUN mkdir /root/src
COPY setup.py /root/src/.
COPY requirements.txt /root/src/.
RUN pip3 install -r /root/src/requirements.txt
FROM coach-base:master
WORKDIR /root/src
COPY --from=builder /root/.cache /root/.cache
COPY setup.py /root/src/.
COPY requirements.txt /root/src/.
COPY README.md /root/src/.
RUN pip3 install vizdoom && pip3 install -e .[all] && rm -rf /root/.cache
COPY . /root/src
Build the base container. Make sure you are in the Coach root directory before building.
.. code-block:: bash
$ docker build -t coach-base:master -f docker/Dockerfile.base .
If you would like to use the Mujoco environment, save your Mujoco key in an environment variable. Replace `<mujoco_key>` with the
base64-encoded contents of your Mujoco key (the Dockerfile above decodes it with `base64 --decode`).
.. code-block:: bash
$ export MUJOCO_KEY=<mujoco_key>
Build the container for your environment.
Replace `<env>` with your choice of environment. The choices are `gym`, `mujoco` and `doom`.
Replace `<user-name>`, `<image-name>` and `<tag>` with appropriate values.
.. code-block:: bash
$ docker build --build-arg MUJOCO_KEY=${MUJOCO_KEY} -t <user-name>/<image-name>:<tag> -f docker/Dockerfile.<env> .
Push the container to a registry of your choice. Replace `<user-name>`, `<image-name>` and `<tag>` with appropriate values.
.. code-block:: bash
$ docker push <user-name>/<image-name>:<tag>
Create a Config file
--------------------
Add the following contents to a config file.
Replace `<user-name>`, `<image-name>`, `<tag>`, `<bucket-name>` and `<path-to-aws-credentials>` with appropriate values.
.. code-block:: bash
[coach]
image = <user-name>/<image-name>:<tag>
memory_backend = redispubsub
data_store = s3
s3_end_point = s3.amazonaws.com
s3_bucket_name = <bucket-name>
s3_creds_file = <path-to-aws-credentials>
Run Distributed Coach
---------------------
The following command runs distributed Coach with the CartPole_ClippedPPO preset, Redis Pub/Sub as the memory backend and S3 as the data store,
on Kubernetes with three rollout workers.
.. code-block:: bash
$ python3 rl_coach/coach.py -p CartPole_ClippedPPO \
-dc \
-e <experiment-name> \
-n 3 \
-dcp <path-to-config-file>


@@ -36,6 +36,7 @@ You can find more details in the `GitHub repository <https://github.com/NervanaS
:titlesonly:
usage
dist_usage
features/index
selecting_an_algorithm
dashboard
@@ -47,6 +48,7 @@ You can find more details in the `GitHub repository <https://github.com/NervanaS
design/control_flow
design/network
design/horizontal_scaling
.. toctree::
:maxdepth: 1
@@ -61,10 +63,13 @@ You can find more details in the `GitHub repository <https://github.com/NervanaS
components/agents/index
components/architectures/index
components/data_stores/index
components/environments/index
components/exploration_policies/index
components/filters/index
components/memories/index
components/memory_backends/index
components/orchestrators/index
components/core_types
components/spaces
components/additional_parameters


@@ -1,7 +1,7 @@
Usage
=====
One of the mechanism Coach uses for running experiments is the **Preset** mechanism.
One of the mechanisms Coach uses for running experiments is the **Preset** mechanism.
As its name implies, a preset defines a set of predefined experiment parameters.
This allows defining a *complex* agent-environment interaction, with multiple parameters, and later running it through
a very *simple* command line.
@@ -29,7 +29,7 @@ To list the available presets, use the `-l` flag.
Multi-threaded Algorithms
+++++++++++++++++++++++++
Multi-threaded algorithms are very common this days.
Multi-threaded algorithms are very common these days.
They typically achieve the best results, and scale gracefully with the number of threads.
In Coach, running such algorithms is done by selecting a suitable preset, and choosing the number of threads to run using the :code:`-n` flag.
@@ -39,6 +39,20 @@ In Coach, running such algorithms is done by selecting a suitable preset, and ch
coach -p CartPole_A3C -n 8
Multi-Node Algorithms
+++++++++++++++++++++++++
Coach supports multi-node runs in distributed mode. Specifically, the horizontal scale-out of rollout workers is implemented.
In Coach, running such algorithms is done by selecting a suitable preset, enabling distributed Coach using the :code:`-dc` flag,
passing distributed Coach parameters using the :code:`-dcp` flag and choosing the number of rollout workers to run using the :code:`-n` flag.
For more details and instructions on how to use distributed Coach, see :ref:`dist-coach-usage`.
*Example:*
.. code-block:: python
coach -p CartPole_ClippedPPO -dc -dcp <path-to-config-file> -n 8
Evaluating an Agent
-------------------
@@ -155,4 +169,4 @@ The most up to date description can be found by using the :code:`-h` flag.
.. argparse::
:module: rl_coach.coach
:func: create_argument_parser
:prog: coach
:prog: coach