+
+ +
+

Batch Reinforcement Learning¶

+

Coach supports Batch Reinforcement Learning, where learning is based solely on a (fixed) batch of data. +In Batch RL, we are given a dataset of experience, which was collected using some (one or more) deployed policies, and we would +like to use it to learn a better policy than what was used to collect the dataset. +There is no simulator to interact with, and so we cannot collect any new data, meaning we often cannot explore the MDP any further. +To make things even harder, we would also like to use the dataset in order to evaluate the newly learned policy +(using off-policy evaluation), since we do not have a simulator which we can use to evaluate the policy on. +Batch RL is also often beneficial in cases where we just want to separate the inference (data collection) from the +training process of a new policy. This is often the case where we have a system on which we could quite easily deploy a policy +and collect experience data, but cannot easily use that system’s setup to online train a new policy (as is often the +case with more standard RL algorithms).

+

Coach supports (almost) all of the integrated off-policy algorithms with Batch RL.

+

A lot more details and example usage can be found in the +tutorial.

+
+ + +
+ +