Balaji Subramaniam d06197f663 Add documentation on distributed Coach. (#158)
* Added documentation on distributed Coach.
2018-11-27 12:26:15 +02:00


.. _dist-coach-design:

Distributed Coach - Horizontal Scale-Out
========================================

Coach supports horizontal scale-out of rollout workers through the ``--distributed_coach`` (or ``-dc``) option.
Scale-out is built on three interfaces, which keeps Coach flexible and makes it possible to integrate different
technologies behind each of them: the orchestrator, the memory backend and the data store.

* **Orchestrator** - The orchestrator interface provides the basic interaction points for orchestration, scheduling
  and resource management of the training and rollout workers in distributed Coach mode. These interaction points
  define how Coach deploys, undeploys and monitors the workers it spawns.

* **Memory Backend** - This interface is the backing store or stream for the memory abstraction in distributed
  Coach. Its implementation is mainly used to communicate experiences (transitions and episodes) from the rollout
  workers to the training worker.

* **Data Store** - This interface is the backing store for the policy checkpoints. It is mainly used to synchronize
  policy checkpoints from the training worker to the rollout workers.

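
The division of labour between the three interfaces can be illustrated with a minimal, self-contained sketch. The
class and method names below (``Orchestrator.deploy``, ``MemoryBackend.store``/``fetch``,
``DataStore.save_checkpoint``/``load_checkpoint`` and the ``Local*``/``Queue*``/``InMemory*`` classes) are
hypothetical and do not reproduce Coach's actual base classes; the toy single-process implementations only show which
component produces and which consumes experiences and checkpoints.

.. code-block:: python

   from abc import ABC, abstractmethod
   from collections import deque
   from typing import Any, Deque, Dict, List, Optional


   class Orchestrator(ABC):
       """Hypothetical: deploys, undeploys and monitors the training and rollout workers."""

       @abstractmethod
       def deploy(self, role: str, count: int) -> None: ...

       @abstractmethod
       def undeploy(self) -> None: ...

       @abstractmethod
       def status(self) -> Dict[str, int]: ...


   class MemoryBackend(ABC):
       """Hypothetical: carries experiences from the rollout workers to the training worker."""

       @abstractmethod
       def store(self, transition: Any) -> None: ...

       @abstractmethod
       def fetch(self, max_items: int) -> List[Any]: ...


   class DataStore(ABC):
       """Hypothetical: holds checkpoints written by the trainer and read by the rollout workers."""

       @abstractmethod
       def save_checkpoint(self, checkpoint: Any) -> None: ...

       @abstractmethod
       def load_checkpoint(self) -> Optional[Any]: ...


   # Toy single-process implementations; real ones would talk to a cluster, a stream and a storage service.
   class LocalOrchestrator(Orchestrator):
       def __init__(self) -> None:
           self.workers: Dict[str, int] = {}  # role -> number of deployed workers

       def deploy(self, role: str, count: int) -> None:
           self.workers[role] = self.workers.get(role, 0) + count

       def undeploy(self) -> None:
           self.workers.clear()

       def status(self) -> Dict[str, int]:
           return dict(self.workers)


   class QueueMemoryBackend(MemoryBackend):
       def __init__(self) -> None:
           self.queue: Deque[Any] = deque()

       def store(self, transition: Any) -> None:
           self.queue.append(transition)

       def fetch(self, max_items: int) -> List[Any]:
           # Hand over at most max_items experiences, oldest first.
           count = min(max_items, len(self.queue))
           return [self.queue.popleft() for _ in range(count)]


   class InMemoryDataStore(DataStore):
       def __init__(self) -> None:
           self.latest: Optional[Any] = None

       def save_checkpoint(self, checkpoint: Any) -> None:
           self.latest = checkpoint  # overwrite: rollout workers only need the newest policy

       def load_checkpoint(self) -> Optional[Any]:
           return self.latest


   orchestrator, memory, store = LocalOrchestrator(), QueueMemoryBackend(), InMemoryDataStore()
   orchestrator.deploy("trainer", 1)
   orchestrator.deploy("rollout", 2)

   # Rollout workers push experiences; the trainer pulls them and publishes a new checkpoint.
   memory.store({"obs": 0, "action": 1, "reward": 1.0})
   memory.store({"obs": 1, "action": 0, "reward": 0.0})
   batch = memory.fetch(max_items=32)
   store.save_checkpoint({"version": 1, "trained_on": len(batch)})

   print(orchestrator.status())    # {'trainer': 1, 'rollout': 2}
   print(store.load_checkpoint())  # {'version': 1, 'trained_on': 2}
   orchestrator.undeploy()

Keeping the three concerns behind separate abstract interfaces is what allows, for example, the memory backend to be
swapped without touching the orchestrator or the checkpoint storage.
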

.. image:: /_static/img/horizontal-scale-out.png
   :width: 800px
   :align: center

Supported Synchronization Types
-------------------------------

Synchronization type refers to the mechanism by which policy checkpoints are synchronized from the training worker to
the rollout workers. It is specified per algorithm in the preset, by assigning a ``DistributedCoachSynchronizationType``
value to ``agent_params.algorithm.distributed_coach_synchronization_type``. Distributed Coach supports two
synchronization types: ``SYNC`` and ``ASYNC``.
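
Selecting the synchronization type in a preset is a single assignment. The fragment below is a sketch, not a complete
preset: it assumes that ``DistributedCoachSynchronizationType`` can be imported from ``rl_coach.base_parameters`` and
that ``agent_params`` was already created earlier in the preset, so both should be checked against the Coach version in
use.

.. code-block:: python

   # Preset fragment: assumes agent_params was created earlier in the preset,
   # for example for an on-policy agent such as Clipped PPO.
   from rl_coach.base_parameters import DistributedCoachSynchronizationType

   # On-policy algorithm: trainer and rollout workers take turns.
   agent_params.algorithm.distributed_coach_synchronization_type = DistributedCoachSynchronizationType.SYNC

   # For an off-policy algorithm, the asynchronous mode would be selected instead:
   # agent_params.algorithm.distributed_coach_synchronization_type = DistributedCoachSynchronizationType.ASYNC
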

* **SYNC** - The trainer waits until all experiences have been gathered from the distributed rollout workers before
  training a new policy, and the rollout workers wait for that new policy before gathering more experiences. This mode
  is suitable for on-policy algorithms.

* **ASYNC** - The trainer does not wait for any particular set of experiences from the distributed rollout workers,
  and the rollout workers gather experiences continuously, loading new policies whenever they become available. This
  mode is suitable for off-policy algorithms.
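
The practical difference between the two modes can be seen in a toy, single-process simulation. It is a sketch under
simplifying assumptions that are not taken from Coach: time advances in discrete ticks, worker ``i`` needs
``episode_durations[i]`` ticks per episode, a SYNC iteration collects exactly one episode per worker, and the ASYNC
trainer trains every ``train_every`` ticks on whatever has arrived. Each experience is tagged with the policy version
that produced it. ``simulate_sync`` and ``simulate_async`` are hypothetical helpers.

.. code-block:: python

   from typing import List, Tuple


   def simulate_sync(episode_durations: List[int], num_iterations: int) -> Tuple[List[List[int]], int]:
       """SYNC: every worker gathers one episode with the current policy, then waits for the next one."""
       policy_version = 0
       batches: List[List[int]] = []
       ticks = 0
       for _ in range(num_iterations):
           # All workers act with the same, latest policy version.
           batches.append([policy_version] * len(episode_durations))
           # The trainer waits for the slowest worker before it trains a new policy.
           ticks += max(episode_durations)
           policy_version += 1
       return batches, ticks


   def simulate_async(episode_durations: List[int], train_every: int, total_ticks: int) -> List[List[int]]:
       """ASYNC: workers never wait; the trainer consumes whatever experience has arrived."""
       policy_version = 0
       worker_policy = [0] * len(episode_durations)  # policy version each worker is acting with
       ticks_left = list(episode_durations)          # ticks until each worker finishes its episode
       pending: List[int] = []                       # versions of experiences not yet trained on
       batches: List[List[int]] = []
       for tick in range(1, total_ticks + 1):
           for i, duration in enumerate(episode_durations):
               ticks_left[i] -= 1
               if ticks_left[i] == 0:
                   pending.append(worker_policy[i])   # episode was gathered with this version
                   worker_policy[i] = policy_version  # load the newest available checkpoint
                   ticks_left[i] = duration
           if tick % train_every == 0 and pending:
               batches.append(pending)
               pending = []
               policy_version += 1                    # publish a new checkpoint
       return batches


   print(simulate_sync([1, 3], num_iterations=3))                 # ([[0, 0], [1, 1], [2, 2]], 9)
   print(simulate_async([1, 3], train_every=2, total_ticks=6))    # [[0, 0], [0, 0, 1], [1, 2, 1]]

With one fast and one slow worker, SYNC needs 9 ticks for three policy updates, and every batch contains only
experiences from the current policy. ASYNC performs the same three updates in 6 ticks, but the third batch, used to
train version 3, still contains experiences gathered with version 1. Off-policy algorithms can learn from such stale
experiences, whereas on-policy algorithms cannot, which is why each algorithm is paired with the mode listed above.
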