Batch RL Tutorial (#372)
docs_raw/source/features/batch_rl.rst (new file, 18 lines)
@@ -0,0 +1,18 @@
Batch Reinforcement Learning
============================

Coach supports Batch Reinforcement Learning, where learning is based solely on a fixed batch of pre-collected data.
In Batch RL, we are given a dataset of experience that was collected using one or more deployed policies, and we would
like to use it to learn a better policy than the ones used to collect the dataset.
There is no simulator to interact with, so we cannot collect any new data, meaning we often cannot explore the MDP any further.
To make things even harder, we would also like to use the dataset to evaluate the newly learned policy
(using off-policy evaluation), since we do not have a simulator on which to evaluate it.
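
Off-policy evaluation is covered in depth in the tutorial linked below. As a rough illustration of the idea,
here is a minimal, generic sketch of ordinary (trajectory-wise) importance sampling, which re-weights the return
of each logged episode by the likelihood ratio between the new policy and the deployed one. This is not Coach's
implementation, and the helper callables are hypothetical:

.. code-block:: python

    import numpy as np

    def importance_sampling_value(episodes, behavior_prob, target_prob, gamma=0.99):
        """Estimate the new policy's value from episodes logged by the deployed policy.

        episodes:            list of episodes, each a list of (state, action, reward) tuples
        behavior_prob(s, a): probability the deployed policy took action a in state s
        target_prob(s, a):   probability the new policy would take action a in state s
        """
        weighted_returns = []
        for episode in episodes:
            weight, discounted_return = 1.0, 0.0
            for t, (s, a, r) in enumerate(episode):
                # Accumulate the likelihood ratio of the whole trajectory.
                weight *= target_prob(s, a) / behavior_prob(s, a)
                discounted_return += (gamma ** t) * r
            weighted_returns.append(weight * discounted_return)
        return float(np.mean(weighted_returns))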

Batch RL is also often beneficial in cases where we simply want to separate inference (data collection) from the
training of a new policy. This is often the case when we have a system on which we can easily deploy a policy
and collect experience data, but cannot easily use that system's setup to train a new policy online (as is often the
case with more standard RL algorithms).

Coach supports Batch RL with (almost) all of its integrated off-policy algorithms.
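
For example, the linked tutorial structures a Batch RL experiment around a dedicated graph manager. The condensed
sketch below follows that structure; the class names and constructor arguments are taken from the tutorial and may
differ between Coach versions, so treat it as an outline rather than a verified API reference:

.. code-block:: python

    # Sketch only: names below are assumptions based on the Batch RL tutorial.
    from rl_coach.agents.ddqn_agent import DDQNAgentParameters
    from rl_coach.environments.gym_environment import GymVectorEnvironment
    from rl_coach.graph_managers.batch_rl_graph_manager import BatchRLGraphManager
    from rl_coach.graph_managers.graph_manager import ScheduleParameters
    from rl_coach.core_types import TrainingSteps, EnvironmentEpisodes, EnvironmentSteps

    schedule_params = ScheduleParameters()
    schedule_params.heatup_steps = EnvironmentSteps(10000)    # one-off dataset collection phase
    schedule_params.improve_steps = TrainingSteps(10000)      # training happens only on the stored dataset
    schedule_params.evaluation_steps = EnvironmentEpisodes(10)
    schedule_params.steps_between_evaluation_periods = TrainingSteps(1000)

    graph_manager = BatchRLGraphManager(
        agent_params=DDQNAgentParameters(),
        env_params=GymVectorEnvironment(level='CartPole-v0'),
        schedule_params=schedule_params,
        train_to_eval_ratio=0.5,  # split the dataset between training and off-policy evaluation
    )
    graph_manager.improve()

The tutorial also shows how to point such an experiment at an existing dataset, rather than collecting one during
the heatup phase.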

Many more details and example usage can be found in the
`tutorial <https://github.com/NervanaSystems/coach/blob/master/tutorials/4.%20Batch%20Reinforcement%20Learning.ipynb>`_.
@@ -7,4 +7,5 @@ Features

algorithms
environments
benchmarks
batch_rl