ISSUE: When we restore checkpoints, we create new nodes in the TensorFlow graph. This happens when we assign a new value (creating an assign op node) to a RefVariable in GlobalVariableSaver. With every restore the TF graph grows, as new nodes are created and old unused nodes are never removed. This causes a memory leak in the restore_checkpoint code path. FIX: We reset the TensorFlow graph and recreate the Global, Online and Target networks on every restore. This ensures that the old unused nodes in the TF graph are dropped.
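A minimal sketch of this fix, assuming the TF1-style API; create_networks is a hypothetical stand-in for whatever rebuilds the Global, Online and Target networks:

import tensorflow as tf

def restore_checkpoint(checkpoint_path, create_networks, sess_config=None):
    # Drop the whole default graph, including the stale assign op nodes
    # that previous restores attached to RefVariables.
    tf.reset_default_graph()
    # Hypothetical helper: rebuilds the Global, Online and Target networks
    # on the now-empty graph.
    networks = create_networks()
    # A fresh Saver picks up the newly created variables.
    saver = tf.train.Saver()
    sess = tf.Session(config=sess_config)
    saver.restore(sess, checkpoint_path)
    return sess, networks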
Block Factory
The block factory is a class which creates a block that fits a specific RL scheme. Example RL schemes are: self-play, multi-agent, hierarchical RL (HRL), basic RL, etc. The block factory should create all the components of the block and return the block scheduler. The block factory can then be used to create different combinations of components. For example, an HRL factory can later be instantiated with the following components (a sketch follows the list):
- env = Atari Breakout
- master (top hierarchy level) agent = DDPG
- slave (bottom hierarchy level) agent = DQN
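Instantiating such a factory might look like the sketch below. Every name here (HRLFactory, the keyword arguments, and the parameter helpers) is an illustrative assumption rather than a confirmed API:

# All names below are hypothetical; the real factory signature may differ.
hrl_factory = HRLFactory(
    env_params=atari_params(level='Breakout'),
    master_agent_params=ddpg_params(),  # top hierarchy level
    slave_agent_params=dqn_params(),    # bottom hierarchy level
)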
A custom block factory implementation should look as follows:
class CustomFactory(BlockFactory):
    def __init__(self, custom_params):
        super().__init__()

    def _create_block(self, task_index: int, device=None) -> BlockScheduler:
        """
        Create all the block modules and the block scheduler
        :param task_index: the index of the process on which the worker will be run
        :param device: the device to place the block's ops on (default placement if None)
        :return: the initialized block scheduler
        """
        # Create env
        # Create composite agents
        # Create level managers
        # Create block scheduler
        return block_scheduler
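A factory instance could then be used roughly as follows; my_params is a placeholder, and in practice the base BlockFactory presumably exposes a public entry point that calls _create_block once per worker process:

factory = CustomFactory(custom_params=my_params)
# task_index selects the process this worker runs on; device placement
# falls back to the default when device is left as None.
block_scheduler = factory._create_block(task_index=0)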