mirror of https://github.com/gryf/coach.git synced 2025-12-17 11:10:20 +01:00

Exploration Policy

An exploration policy is the module responsible for choosing an action, given the action values, the current phase, the policy's internal state, and the specific exploration algorithm it implements.

A custom exploration policy consists of two classes: the exploration policy class itself, and an exploration policy parameters class, which defines the policy's parameters and the location of the exploration policy module. The constructor parameters of the exploration policy class should match the parameters defined in the exploration policy parameters class.
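To illustrate why the two sets of parameters need to match, here is a minimal sketch of how a `'module_path:class_name'` string could be resolved and the target class instantiated from a parameters object. The helpers `load_class` and `instantiate`, and the `DemoParameters` class, are hypothetical names for this illustration only; Coach's own loading code may work differently. The standard library class `fractions.Fraction` stands in for a policy class so that the snippet runs without Coach installed.

```python
import importlib


def load_class(path: str):
    """Resolve a 'module_path:class_name' string to a class object."""
    module_name, class_name = path.split(':')
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


class DemoParameters:
    """Hypothetical stand-in for an exploration policy parameters class."""

    def __init__(self):
        # Attribute names mirror the constructor arguments of the target class.
        self.numerator = 3
        self.denominator = 4

    @property
    def path(self):
        # Stand-in target: a standard library class instead of a real policy.
        return 'fractions:Fraction'


def instantiate(params):
    """Build the class named by params.path, passing its attributes as kwargs."""
    cls = load_class(params.path)
    # vars() returns only instance attributes, so the 'path' property is excluded.
    return cls(**vars(params))


result = instantiate(DemoParameters())
```

If an attribute of the parameters object had no matching constructor argument, the call in `instantiate` would raise a `TypeError`, which is the kind of mismatch the matching rule is meant to prevent.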

Exploration policies typically have a control parameter that defines their current exploration state, together with a schedule that determines how this parameter changes over time. The schedule can be defined using the Schedule class, which is defined in exploration_policy.py.
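As a rough illustration of what a schedule for a control parameter does, the sketch below linearly moves a value from an initial to a final setting over a fixed number of steps. `LinearDecaySchedule` and its attribute names are assumptions made for this example; it is not Coach's Schedule class and does not claim to reproduce its interface.

```python
class LinearDecaySchedule:
    """Hypothetical schedule: interpolate linearly, then hold the final value."""

    def __init__(self, initial_value: float, final_value: float, decay_steps: int):
        self.initial_value = initial_value
        self.final_value = final_value
        self.decay_steps = decay_steps
        self.current_value = initial_value
        self._steps_taken = 0

    def step(self) -> float:
        """Advance the schedule by one step and return the new value."""
        # Clamp the step counter so the value stays at final_value afterwards.
        self._steps_taken = min(self._steps_taken + 1, self.decay_steps)
        fraction = self._steps_taken / self.decay_steps
        self.current_value = (
            self.initial_value
            + fraction * (self.final_value - self.initial_value)
        )
        return self.current_value


schedule = LinearDecaySchedule(initial_value=1.0, final_value=0.0, decay_steps=4)
values = [schedule.step() for _ in range(5)]
```

An exploration policy would typically read `current_value` as its control parameter (for example, an epsilon) and call `step()` once per training step.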

A custom implementation should look as follows:

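from typing import List

from rl_coach.core_types import ActionType
from rl_coach.exploration_policies.exploration_policy import ExplorationParameters, ExplorationPolicy
from rl_coach.spaces import ActionSpace


class CustomExplorationParameters(ExplorationParameters):
    def __init__(self):
        super().__init__()
        # define the parameters of the custom policy here
        ...

    @property
    def path(self):
        # location of the policy class, in the form 'module_path:class_name'
        return 'module_path:class_name'


class CustomExplorationPolicy(ExplorationPolicy):
    # the arguments after action_space should match the parameters class
    def __init__(self, action_space: ActionSpace, ...):
        super().__init__(action_space)

    def reset(self):
        # reset the internal state of the policy
        ...

    def get_action(self, action_values: List[ActionType]) -> ActionType:
        # choose an action given the action values
        ...

    def change_phase(self, phase):
        # update the policy when the current phase changes
        ...

    def get_control_param(self):
        # return the current value of the control parameter
        ...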