pre-release 0.10.0

2026-03-18 15:53:35 +01:00 · 2018-08-13 17:11:34 +03:00
parent d44c329bb8
commit 19ca5c24b1
485 changed files with 33292 additions and 16770 deletions
--- a/docs_raw/init.py
+++ b/docs_raw/init.py
--- a/docs_raw/docs/init.py
+++ b/docs_raw/docs/init.py
--- a/docs_raw/docs/contributing/add_agent.md
+++ b/docs_raw/docs/contributing/add_agent.md
@@ -2,37 +2,67 @@

 Coach's modularity makes adding an agent a simple and clean task, that involves the following steps:

-1.	Implement your algorithm in a new file under the agents directory. The agent can inherit base classes such as **ValueOptimizationAgent** or **ActorCriticAgent**, or the more generic **Agent** base class.
+1.	Implement your algorithm in a new file. The agent can inherit base classes such as **ValueOptimizationAgent** or
+    **ActorCriticAgent**, or the more generic **Agent** base class.
    
    * **ValueOptimizationAgent**, **PolicyOptimizationAgent** and **Agent** are abstract classes. 
-    learn_from_batch() should be overriden with the desired behavior for the algorithm being implemented. If deciding to inherit from **Agent**, also choose_action() should be overriden.       
+    learn_from_batch() should be overridden with the desired behavior for the algorithm being implemented.
+    If deciding to inherit from **Agent**, choose_action() should also be overridden.
        
    
-            def learn_from_batch(self, batch):
+            def learn_from_batch(self, batch) -> Tuple[float, List, List]:
                """
                Given a batch of transitions, calculates their target values and updates the network.
                :param batch: A list of transitions
-                :return: The loss of the training
+                :return: The total loss of the training, the loss per head and the unclipped gradients
                """
-                pass
-                
-            def choose_action(self, curr_state, phase=RunPhase.TRAIN):
+
+            def choose_action(self, curr_state):
                """
                choose an action to act with in the current episode being played. Different behavior might be exhibited when training
                 or testing.
-                 
-                :param curr_state: the current state to act upon.  
-                :param phase: the current phase: training or testing.
+
+                :param curr_state: the current state to act upon.
                :return: chosen action, some action value describing the action (q-value, probability, etc)
                """
-                pass
-                
-            
-       
-    * Make sure to add your new agent to **agents/\_\_init\_\_.py**
-    
-2.	Implement your agent's specific network head, if needed, at the implementation for the framework of your choice. For example **architectures/neon_components/heads.py**. The head will inherit the generic base class Head.
-    A new output type should be added to configurations.py, and a mapping between the new head and output type should be defined in the get_output_head() function at **architectures/neon_components/general_network.py**
-3.	Define a new configuration class at configurations.py, which includes the new agent name in the **type** field, the new output type in the **output_types** field, and assigning default values to hyperparameters.
-4.	(Optional) Define a preset using the new agent type with a given environment, and the hyperparameters that should be used for training on that environment.
+
+2.	Implement your agent's specific network head, if needed, in the implementation for the framework of your choice,
+    for example **architectures/neon_components/heads.py**. The head will inherit the generic base class Head.
+    A new output type should be added to configurations.py, and a mapping between the new head and output type should
+    be defined in the get_output_head() function in **architectures/neon_components/general_network.py**.
+
+3.	Define a new parameters class that inherits AgentParameters.
+    The parameters class defines all the hyperparameters for the agent, and is initialized with 4 main components:
+    * **algorithm**: A class inheriting AlgorithmParameters which defines any algorithm specific parameters.
+    * **exploration**: A class inheriting ExplorationParameters which defines the exploration policy parameters.
+                   Several common exploration policies are built in and defined under the exploration
+                   sub directory. You can also define your own custom exploration policy.
+    * **memory**: A class inheriting MemoryParameters which defines the memory parameters.
+              Several common memory types are built in and defined under the memories
+              sub directory. You can also define your own custom memory.
+    * **networks**: A dictionary defining all the networks that will be used by the agent. The keys of the dictionary
+                define the network names and will be used to access each network through the agent class.
+                Each dictionary value is an instance of a class inheriting NetworkParameters, which defines the
+                network structure and parameters.
+
+
+    Additionally, set the path property to return the path to your agent class in the following format:
+
+            <path to python module>:<name of agent class>
+
+    For example,
+
+            class RainbowAgentParameters(AgentParameters):
+                def __init__(self):
+                    super().__init__(algorithm=RainbowAlgorithmParameters(),
+                                     exploration=RainbowExplorationParameters(),
+                                     memory=RainbowMemoryParameters(),
+                                     networks={"main": RainbowNetworkParameters()})
+
+                @property
+                def path(self):
+                    return 'rainbow.rainbow_agent:RainbowAgent'
+
+4.	(Optional) Define a preset using the new agent type with a given environment, and the hyperparameters that should
+    be used for training on that environment.
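
As a rough illustration of the `<path to python module>:<name of agent class>` format from step 3, the sketch below resolves such a string into a class using only the standard library. The helper name `load_class_from_path` is hypothetical and is not part of Coach; it only shows one way a path of this shape could be interpreted. It is demonstrated on `collections:OrderedDict`, since the Rainbow module above is just an example.

```python
import importlib


def load_class_from_path(path: str):
    """Resolve a '<module>:<class name>' string into the class object.

    Hypothetical helper for illustration; not Coach's actual loader.
    """
    module_name, class_name = path.split(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Resolve a standard-library class through the same '<module>:<class>' format
ordered_dict_cls = load_class_from_path("collections:OrderedDict")
print(ordered_dict_cls.__name__)
```

A string of this form keeps the parameters class free of a direct import of the agent class, so the agent module only needs to be importable at the point where the path is resolved.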

--- a/docs_raw/docs/contributing/add_env.md
+++ b/docs_raw/docs/contributing/add_env.md
@@ -1,70 +1,79 @@
 Adding a new environment to Coach is as easy as solving CartPole. 

+There are essentially two ways to integrate new environments into Coach:
+
+## Using the OpenAI Gym API
+
+If your environment is already using the OpenAI Gym API, you are already good to go.
+When selecting the environment parameters in the preset, use GymEnvironmentParameters(),
+and pass the path to your environment source code using the level parameter.
+You can specify additional parameters for your environment using the additional_simulator_parameters parameter.
+Take for example the definition used in the Pendulum_HAC preset:
+
+        env_params = GymEnvironmentParameters()
+        env_params.level = "rl_coach.environments.mujoco.pendulum_with_goals:PendulumWithGoals"
+        env_params.additional_simulator_parameters = {"time_limit": 1000}
+
+## Using the Coach API
+
 There are a few simple steps to follow, and we will walk through them one by one.

-1.  Coach defines a simple API for implementing a new environment which is defined in environment/environment_wrapper.py.
-    There are several functions to implement, but only some of them are mandatory. 
+1.  Create a new class for your environment, and inherit the Environment class.
+
+2.  Coach defines a simple API for implementing a new environment, which is defined in environment/environment.py.
+    There are several functions to implement, but only some of them are mandatory.

    Here are the important ones:

-            def _take_action(self, action_idx):
+            def _take_action(self, action_idx: ActionType) -> None:
                """
                An environment dependent function that sends an action to the simulator.
-                :param action_idx: the action to perform on the environment.
+                :param action_idx: the action to perform on the environment
                :return: None
                """
-                pass

-            def _preprocess_observation(self, observation):
-                """
-                Do initial observation preprocessing such as cropping, rgb2gray, rescale etc.
-                Implementing this function is optional.
-                :param observation: a raw observation from the environment
-                :return: the preprocessed observation
-                """
-                return observation
-
-            def _update_state(self):
+            def _update_state(self) -> None:
                """
                Updates the state from the environment.
                Should update self.observation, self.reward, self.done, self.measurements and self.info
                :return: None
                """
-                pass

-            def _restart_environment_episode(self, force_environment_reset=False):
+            def _restart_environment_episode(self, force_environment_reset=False) -> None:
                """
+                Restarts the simulator episode
                :param force_environment_reset: Force the environment to reset even if the episode is not done yet.
-                :return:
+                :return: None
                """
-                pass

-            def get_rendered_image(self):
+            def _render(self) -> None:
+                """
+                Renders the environment using the native simulator renderer
+                :return: None
+                """
+
+            def get_rendered_image(self) -> np.ndarray:
                """
                Return a numpy array containing the image that will be rendered to the screen.
                This can be different from the observation. For example, mujoco's observation is a measurements vector.
                :return: numpy array containing the image that will be rendered to the screen
                """
-                return self.observation

+3.  Create a new parameters class for your environment, which inherits the EnvironmentParameters class.
+    In the __init__ of your class, define all the parameters you used in your Environment class.
+    Additionally, fill the path property of the class with the path to your Environment class.
+    For example, take a look at the EnvironmentParameters class used for Doom:

-2.  Make sure to import the environment in environments/\_\_init\_\_.py:
-        
-        from doom_environment_wrapper import *
-        
-    Also, a new entry should be added to the EnvTypes enum mapping the environment name to the wrapper's class name:
-        
-        Doom = "DoomEnvironmentWrapper"
+            class DoomEnvironmentParameters(EnvironmentParameters):
+                def __init__(self):
+                    super().__init__()
+                    self.default_input_filter = DoomInputFilter
+                    self.default_output_filter = DoomOutputFilter
+                    self.cameras = [DoomEnvironment.CameraTypes.OBSERVATION]
+
+                @property
+                def path(self):
+                    return 'rl_coach.environments.doom_environment:DoomEnvironment'
    
-                
-3. In addition a new configuration class should be implemented for defining the environment's parameters and placed in configurations.py. 
-For instance, the following is used for Doom:

-        class Doom(EnvironmentParameters):
-            type = 'Doom'
-            frame_skip = 4
-            observation_stack_size = 3
-            desired_observation_height = 60
-            desired_observation_width = 76
-            
-4. And that's it, you're done. Now just add a new preset with your newly created environment, and start training an agent on top of it. 
+4.  And that's it, you're done. Now just add a new preset with your newly created environment, and start training an agent on top of it.
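
To make the Coach API steps above more concrete, here is a minimal, self-contained sketch of a toy environment. The `Environment` class below is a hypothetical stand-in written for this example (its `step` and `reset` methods are assumptions, not Coach's real base class); only the three underscore-prefixed methods mirror the functions listed in step 2.

```python
import random


class Environment:
    """Hypothetical stand-in for Coach's Environment base class (illustration only)."""

    def __init__(self):
        self.observation = None
        self.reward = 0.0
        self.done = False
        self.measurements = []
        self.info = {}

    def step(self, action_idx):
        # send the action to the simulator, then refresh the cached state
        self._take_action(action_idx)
        self._update_state()

    def reset(self):
        self._restart_environment_episode(force_environment_reset=True)
        self._update_state()


class CoinFlipEnvironment(Environment):
    """Toy environment: guess a coin flip (0 or 1); an episode lasts 10 steps."""

    def __init__(self, seed=0):
        super().__init__()
        self._rng = random.Random(seed)
        self._steps = 0
        self._last_guess_correct = False

    def _take_action(self, action_idx):
        # the "simulator" here is just a seeded coin flip
        self._last_guess_correct = (action_idx == self._rng.randint(0, 1))
        self._steps += 1

    def _update_state(self):
        # update self.observation, self.reward, self.done, self.measurements and self.info
        self.observation = [self._steps]
        self.reward = 1.0 if self._last_guess_correct else 0.0
        self.done = self._steps >= 10
        self.measurements = []
        self.info = {"steps": self._steps}

    def _restart_environment_episode(self, force_environment_reset=False):
        self._steps = 0
        self._last_guess_correct = False


env = CoinFlipEnvironment()
env.reset()
while not env.done:
    env.step(0)
print(env.observation, env.done)
```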
--- a/docs_raw/docs/design/control_flow.md
+++ b/docs_raw/docs/design/control_flow.md
@@ -0,0 +1,94 @@
+<!-- language-all: python -->
+
+# Coach Control Flow
+
+Coach is built in a modular way, encouraging module reuse and reducing the amount of boilerplate code needed
+for developing new algorithms or integrating a new challenge as an environment.
+On the other hand, it can be overwhelming for new users to ramp up on the code.
+To help with that, here's a short overview of the control flow.
+
+## Graph Manager
+
+The main entry point for Coach is **coach.py**.
+The main functionality of this script is to parse the command line arguments and invoke all the sub-processes needed
+for the given experiment.
+**coach.py** executes the given **preset** file which returns a **GraphManager** object.
+
+A **preset** is a design pattern that is intended for concentrating the entire definition of an experiment in a single
+file. This helps with experiment reproducibility, improves readability and prevents confusion.
+The outcome of a preset is a **GraphManager** which will usually be instantiated in the final lines of the preset.
+
+A **GraphManager** is an object that holds all the agents and environments of an experiment, and is mostly responsible
+for scheduling their work. Why is it called a **graph** manager? Because agents and environments are structured into
+a graph of interactions. For example, in hierarchical reinforcement learning schemes, there will often be a master
+policy agent, that will control a sub-policy agent, which will interact with the environment. Other schemes can have
+much more complex graphs of control, such as several hierarchy layers, each with multiple agents.
+The graph manager's main loop is the improve loop.
+
+<p style="text-align: center;">
+
+<img src="../../img/improve.png" alt="Improve loop" style="width: 400px;"/>
+
+</p>
+
+The improve loop switches between 3 main phases - heatup, training and evaluation:
+
+* **Heatup** - the goal of this phase is to collect initial data for populating the replay buffers. The heatup phase
+  takes place only at the beginning of the experiment, and the agents will act completely randomly during this phase.
+  Importantly, the agents do not train their networks during this phase. DQN, for example, uses 50k random steps in
+  order to initialize the replay buffers.
+
+* **Training** - the training phase is the main phase of the experiment. This phase can differ between agent types,
+  but essentially consists of repeated cycles of acting, collecting data from the environment, and training the agent
+  networks. During this phase, the agent will use its exploration policy in training mode, which will add noise to its
+  actions in order to improve its knowledge about the environment state space.
+
+* **Evaluation** - the evaluation phase is intended for evaluating the current performance of the agent. The agents
+  will act greedily in order to exploit the knowledge aggregated so far, and the performance over multiple episodes of
+  evaluation will be averaged in order to reduce the effects of stochasticity in all the components.
+
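
The three phases above can be sketched as a simple schedule. This is only an illustration of the ordering (heatup once, then training interleaved with evaluation); the function name `improve_schedule` and all of its parameters are hypothetical and do not correspond to Coach's actual GraphManager interface.

```python
def improve_schedule(heatup_steps, training_steps, steps_between_evaluations, evaluation_episodes):
    """Return the ordered list of (phase, amount) pairs of a simplified improve loop."""
    # heatup happens exactly once, at the beginning of the experiment
    schedule = [("heatup", heatup_steps)]
    trained = 0
    while trained < training_steps:
        # train for a chunk of steps, then evaluate the current policy
        chunk = min(steps_between_evaluations, training_steps - trained)
        schedule.append(("training", chunk))
        schedule.append(("evaluation", evaluation_episodes))
        trained += chunk
    return schedule


for phase, amount in improve_schedule(50, 100, 40, 5):
    print(phase, amount)
```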
+
+## Level Manager
+
+In each of the 3 phases described above, the graph manager will invoke all the hierarchy levels in the graph in a
+synchronized manner. In Coach, agents do not interact directly with the environment. Instead, they go through a
+*LevelManager*, which is a proxy that manages their interaction. The level manager passes the current state and reward
+from the environment to the agent, and the actions from the agent to the environment.
+
+The motivation for having a level manager is to disentangle the code of the environment and the agent, so as to allow
+more complex interactions. Each level can have multiple agents which interact with the environment. Who gets to choose
+the action for each step is controlled by the level manager.
+Additionally, each level manager can act as an environment for the hierarchy level above it, such that each hierarchy
+level can be seen as an interaction between an agent and an environment, even if the environment is just more agents in
+a lower hierarchy level.
+
+
+## Agent
+
+The base agent class has 3 main functions that will be used during those phases - observe, act and train.
+
+* **Observe** - this function gets the latest response from the environment as input, and updates the internal state
+  of the agent with the new information. The environment response will
+  be first passed through the agent's **InputFilter** object, which will process the values in the response, according
+  to the specific agent definition. The environment response will then be converted into a
+  **Transition**, which contains the information from a single step
+  ($ s_{t}, a_{t}, r_{t}, s_{t+1}, \textrm{terminal signal} $), and the transition will be stored in the memory.
+
+<img src="../../img/observe.png" alt="Observe" style="width: 700px;"/>
+
+* **Act** - this function uses the current internal state of the agent in order to select the next action to take on
+  the environment. This function will call the per-agent custom function **choose_action** that will use the network
+  and the exploration policy in order to select an action. The action will be stored, together with any additional
+  information (like the action value for example) in an **ActionInfo** object. The ActionInfo object will then be
+  passed through the agent's **OutputFilter** to allow any processing of the action (like discretization,
+  or shifting, for example), before passing it to the environment.
+
+<img src="../../img/act.png" alt="Act" style="width: 700px;"/>
+
+* **Train** - this function will sample a batch from the memory and train on it. The batch of transitions will be
+  first wrapped into a **Batch** object to allow efficient querying of the batch values. It will then be passed into
+  the agent specific **learn_from_batch** function, that will extract network target values from the batch and will
+  train the networks accordingly. Lastly, if there's a target network defined for the agent, it will sync the target
+  network weights with the online network.
+
+<img src="../../img/train.png" alt="Train" style="width: 700px;"/>
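
The observe, act and train functions described above can be sketched with a deliberately tiny agent. Everything here is an assumption made for illustration: `ToyAgent` is not Coach's base agent, the "network" is a plain list of per-action value estimates, and filters and the **ActionInfo** and **Batch** wrappers are left out.

```python
import random
from collections import namedtuple

Transition = namedtuple("Transition", "state action reward next_state terminal")


class ToyAgent:
    """Illustrative agent with a list of action values instead of a network."""

    def __init__(self, num_actions, batch_size=4, seed=0):
        self.num_actions = num_actions
        self.batch_size = batch_size
        self.memory = []
        self.action_values = [0.0] * num_actions
        self.current_state = None
        self.last_action = None
        self._rng = random.Random(seed)

    def observe(self, next_state, reward, terminal):
        # convert the environment response into a transition and store it
        if self.last_action is not None:
            self.memory.append(
                Transition(self.current_state, self.last_action, reward, next_state, terminal))
        self.current_state = next_state

    def act(self, epsilon=0.1):
        self.last_action = self.choose_action(epsilon)
        return self.last_action

    def choose_action(self, epsilon):
        # epsilon-greedy: explore with probability epsilon, otherwise act greedily
        if self._rng.random() < epsilon:
            return self._rng.randrange(self.num_actions)
        return max(range(self.num_actions), key=lambda a: self.action_values[a])

    def train(self, learning_rate=0.5):
        # sample a batch from the memory once enough transitions were collected
        if len(self.memory) < self.batch_size:
            return None
        batch = self._rng.sample(self.memory, self.batch_size)
        return self.learn_from_batch(batch, learning_rate)

    def learn_from_batch(self, batch, learning_rate):
        # move each action value toward the observed reward; return the mean squared error
        loss = 0.0
        for t in batch:
            error = t.reward - self.action_values[t.action]
            self.action_values[t.action] += learning_rate * error
            loss += error ** 2
        return loss / len(batch)


agent = ToyAgent(num_actions=2)
agent.observe(next_state=0, reward=0.0, terminal=False)  # initial state, nothing stored yet
for step in range(1, 201):
    action = agent.act()
    agent.observe(next_state=step, reward=1.0 if action == 1 else 0.0, terminal=False)
    agent.train()
print(agent.action_values)
```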
--- a/docs_raw/docs/design/features.md
+++ b/docs_raw/docs/design/features.md
@@ -0,0 +1,44 @@
+# Coach Features
+
+## Supported Algorithms
+
+Coach supports many state-of-the-art reinforcement learning algorithms, which are separated into two main classes -
+value optimization and policy optimization. A detailed description of those algorithms may be found in the algorithms
+section.
+
+<p style="text-align: center;">
+
+<img src="../../img/algorithms.png" alt="Supported Algorithms" style="width: 600px;"/>
+
+</p>
+
+
+## Supported Environments
+
+Coach supports a large number of environments which can be solved using reinforcement learning:
+
+* **[DeepMind Control Suite](https://github.com/deepmind/dm_control)** - a set of reinforcement learning environments
+  powered by the MuJoCo physics engine.
+
+* **[Blizzard StarCraft II](https://github.com/deepmind/pysc2)** - a popular strategy game which was wrapped with a
+  Python interface by DeepMind.
+
+* **[ViZDoom](http://vizdoom.cs.put.edu.pl/)** - a Doom-based AI research platform for reinforcement learning
+  from raw visual information.
+
+* **[CARLA](https://github.com/carla-simulator/carla)** - an open-source simulator for autonomous driving research.
+
+* **[OpenAI Gym](https://gym.openai.com/)** - a library which consists of a set of environments, from games to robotics.
+  Additionally, it can be extended using the API defined by the authors.
+
+  In Coach, we support all the native environments in Gym, along with several extensions such as:
+
+* **[Roboschool](https://github.com/openai/roboschool)** - a set of environments powered by the PyBullet engine,
+    that offer a free alternative to MuJoCo.
+
+* **[Gym Extensions](https://github.com/Breakend/gym-extensions)** - a set of environments that extends Gym for
+    auxiliary tasks (multitask learning, transfer learning, inverse reinforcement learning, etc.)
+
+* **[PyBullet](https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet)** - a physics engine that
+    includes a set of robotics environments.
+
--- a/docs_raw/docs/design/filters.md
+++ b/docs_raw/docs/design/filters.md
@@ -0,0 +1,116 @@
+# Filters
+
+Filters are a mechanism in Coach that allows doing pre-processing and post-processing of the internal agent information.
+There are two filter categories -
+
+* **Input filters** - these are filters that process the information passed **into** the agent from the environment.
+  This information includes the observation and the reward. Input filters therefore allow rescaling observations,
+  normalizing rewards, stacking observations, etc.
+
+* **Output filters** - these are filters that process the information going **out** of the agent into the environment.
+  This information includes the action the agent chooses to take. Output filters therefore allow conversion of
+  actions from one space into another. For example, the agent can take $ N $ discrete actions, that will be mapped by
+  the output filter onto $ N $ continuous actions.
+
+Filters can be stacked on top of each other in order to build complex processing flows of the inputs or outputs.
+
+<p style="text-align: center;">
+
+<img src="../../img/filters.png" alt="Filters mechanism" style="width: 350px;"/>
+
+</p>
+
+## Input Filters
+
+The input filters are separated into two categories - **observation filters** and **reward filters**.
+
+### Observation Filters
+
+* **ObservationClippingFilter** - Clips the observation values to a given range of values. For example, if the
+  observation consists of measurements in an arbitrary range, and we want to control the minimum and maximum values
+  of these observations, we can define a range and clip the values of the measurements.
+
+* **ObservationCropFilter** - Crops the size of the observation to a given crop window. For example, in Atari, the
+  observations are images with a shape of 210x160. Usually, we will want to crop the observation to a
+  square of 160x160 before rescaling it.
+
+* **ObservationMoveAxisFilter** - Reorders the axes of the observation. This can be useful when the observation is an
+  image, and we want to move the channel axis to be the last axis instead of the first axis.
+
+* **ObservationNormalizationFilter** - Normalizes the observation values with a running mean and standard deviation of
+  all the observations seen so far. The normalization is performed element-wise. Additionally, when working with
+  multiple workers, the statistics used for the normalization operation are accumulated over all the workers.
+
+* **ObservationReductionBySubPartsNameFilter** - Allows keeping only parts of the observation, by specifying their
+  name. For example, the CARLA environment extracts multiple measurements that can be used by the agent, such as
+  speed and location. If we want to only use the speed, it can be done using this filter.
+
+* **ObservationRescaleSizeByFactorFilter** - Rescales an image observation by some factor. For example, the image size
+  can be reduced by a factor of 2.
+
+* **ObservationRescaleToSizeFilter** - Rescales an image observation to a given size. The target size does not
+  necessarily keep the aspect ratio of the original observation.
+
+* **ObservationRGBToYFilter** - Converts a color image observation specified using the RGB encoding into a grayscale
+  image observation, by keeping only the luminance (Y) channel of the YUV encoding. This can be useful if the colors
+  in the original image are not relevant for solving the task at hand.
+
+* **ObservationSqueezeFilter** - Removes redundant axes from the observation, which are axes with a dimension of 1.
+
+* **ObservationStackingFilter** - Stacks several observations on top of each other. For image observations this will
+  create a 3D blob. The stacking is done in a lazy manner in order to reduce memory consumption. To achieve this,
+  a LazyStack object is used in order to wrap the observations in the stack. For this reason, the
+  ObservationStackingFilter **must** be the last filter in the input filters stack.
+
+* **ObservationUint8Filter** - Converts a floating point observation into an unsigned int 8 bit observation. This is
+  mostly useful for reducing memory consumption and is usually used for image observations. The filter will first
+  spread the observation values over the range 0-255 and then discretize them into integer values.
+
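
The running mean and standard deviation used by the normalization filters can be maintained incrementally. The sketch below uses Welford's online algorithm on scalar values; this is a swapped-in, generic technique chosen for illustration, and the class name `RunningNormalizer` and its `epsilon` parameter are hypothetical. It does not show Coach's own implementation or the sharing of statistics across workers.

```python
import math


class RunningNormalizer:
    """Keeps a running mean/std of all values seen so far (Welford's algorithm)."""

    def __init__(self, epsilon=1e-8):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0          # running sum of squared deviations from the mean
        self.epsilon = epsilon  # avoids division by zero when std is 0

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self):
        # population standard deviation of the values seen so far
        return math.sqrt(self._m2 / self.count) if self.count > 0 else 0.0

    def normalize(self, value):
        return (value - self.mean) / (self.std + self.epsilon)


normalizer = RunningNormalizer()
for observation in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    normalizer.update(observation)
print(normalizer.mean, normalizer.std, normalizer.normalize(7.0))
```

For vector or image observations the same update would be applied element-wise, one running statistic per element.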
+### Reward Filters
+
+* **RewardClippingFilter** - Clips the reward values into a given range. For example, in DQN, the Atari rewards are
+  clipped into the range between -1 and 1 in order to control the scale of the returns.
+
+* **RewardNormalizationFilter** -  Normalizes the reward values with a running mean and standard deviation of
+  all the rewards seen so far. When working with multiple workers, the statistics used for the normalization operation
+  are accumulated over all the workers.
+
+* **RewardRescaleFilter** - Rescales the reward by a given factor. Rescaling the rewards of the environment has been
+  observed to have a large effect (negative or positive) on the behavior of the learning process.
+
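
As a small illustration of reward rescaling and clipping, and of stacking filters on top of each other, consider the sketch below. The function names are hypothetical and the filters are plain functions rather than Coach's filter classes.

```python
def rescale_reward(reward, factor):
    """Rescale the reward by a given factor."""
    return reward * factor


def clip_reward(reward, low=-1.0, high=1.0):
    """Clip the reward into the range [low, high]."""
    return max(low, min(high, reward))


def apply_filters(reward, filters):
    """Apply a stack of filters in order, feeding each output into the next filter."""
    for reward_filter in filters:
        reward = reward_filter(reward)
    return reward


# rescale by 0.01 first, then clip into [-1, 1]
filter_stack = [lambda r: rescale_reward(r, 0.01), clip_reward]
for raw_reward in [250.0, -30.0, -500.0]:
    print(raw_reward, apply_filters(raw_reward, filter_stack))
```

The order of the stack matters: clipping before rescaling would first cut every reward to [-1, 1] and then shrink it further.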
+## Output Filters
+
+The output filters only process the actions.
+
+### Action Filters
+
+* **AttentionDiscretization** - Discretizes an **AttentionActionSpace**. The attention action space defines the actions
+  as choosing sub-boxes in a given box. For example, consider an image of size 100x100, where the action is choosing
+  a crop window of size 20x20 to attend to in the image. AttentionDiscretization discretizes the possible crop
+  windows into a finite number of options, and maps a discrete action space onto those crop windows.
+
+* **BoxDiscretization** - Discretizes a continuous action space into a discrete action space, allowing the usage of
+  agents such as DQN for continuous environments such as MuJoCo. Given the number of bins to discretize into, the
+  original continuous action space is uniformly separated into the given number of bins, each mapped to a discrete
+  action index. For example, if the original action space is between -1 and 1 and 5 bins were selected, the new action
+  space will consist of 5 actions mapped to -1, -0.5, 0, 0.5 and 1.
+
+* **BoxMasking** - Masks part of the action space to force the agent to work in a defined space. For example,
+  if the original action space is between -1 and 1, then this filter can be used in order to constrain the agent actions
+  to the range between 0 and 1 instead. This essentially masks the range between -1 and 0 from the agent.
+
+* **PartialDiscreteActionSpaceMap** - Partial map of two countable action spaces. For example, consider an environment
+  with a MultiSelect action space (select multiple actions at the same time, such as jump and go right), with 8 actual
+  MultiSelect actions. If we want the agent to be able to select only 5 of those actions by their index (0-4), we can
+  map a discrete action space with 5 actions into the 5 selected MultiSelect actions. This will both allow the agent to
+  use regular discrete actions, and mask 3 of the actions from the agent.
+
+* **FullDiscreteActionSpaceMap** - Full map of two countable action spaces. This works in a similar way to the
+  PartialDiscreteActionSpaceMap, but maps the entire source action space into the entire target action space, without
+  masking any actions.
+
+* **LinearBoxToBoxMap** - A linear mapping of two box action spaces. For example, if the action space of the
+  environment consists of continuous actions between 0 and 1, and we want the agent to choose actions between -1 and 1,
+  the LinearBoxToBoxMap can be used to map the range between -1 and 1 to the range between 0 and 1 in a linear way.
+  This means that the action -1 will be mapped to 0, the action 1 will be mapped to 1, and the rest of the actions
+  will be linearly mapped between those values.
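
The arithmetic behind the BoxDiscretization and LinearBoxToBoxMap examples above can be checked with a short sketch. The function names and signatures are hypothetical and operate on a single scalar action dimension; they are not Coach's filter classes.

```python
def discretize_box(low, high, num_bins):
    """Return num_bins uniformly spaced action values covering [low, high]."""
    step = (high - low) / (num_bins - 1)
    return [low + index * step for index in range(num_bins)]


def linear_box_to_box(action, source_low, source_high, target_low, target_high):
    """Linearly map an action from [source_low, source_high] to [target_low, target_high]."""
    fraction = (action - source_low) / (source_high - source_low)
    return target_low + fraction * (target_high - target_low)


# 5 bins over [-1, 1]: the discrete action index selects one of these values
print(discretize_box(-1.0, 1.0, 5))

# agent actions in [-1, 1] mapped onto environment actions in [0, 1]
for agent_action in [-1.0, 0.0, 1.0]:
    print(agent_action, linear_box_to_box(agent_action, -1.0, 1.0, 0.0, 1.0))
```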
--- a/docs_raw/docs/design/network.md
+++ b/docs_raw/docs/design/network.md
@@ -1,6 +1,4 @@
-# Coach Design
-
-## Network Design
+# Network Design

 Each agent has at least one neural network, used as the function approximator, for choosing the actions. The network is designed in a modular way to allow reusability in different agents. It is separated into three main parts:

@@ -21,7 +19,7 @@ Each agent has at least one neural network, used as the function approximator, f

 <p style="text-align: center;">

-<img src="../img/network.png" alt="Network Design" style="width: 400px;"/>
+<img src="../../img/network.png" alt="Network Design" style="width: 400px;"/>

 </p>

@@ -31,17 +29,7 @@ Most of the reinforcement learning agents include more than one copy of the neur

 <p style="text-align: center;">

-<img src="../img/distributed.png" alt="Distributed Training" style="width: 600px;"/>
-
-</p>
-
-## Supported Algorithms
-
-Coach supports many state-of-the-art reinforcement learning algorithms, which are separated into two main classes - value optimization and policy optimization. A detailed description of those algorithms may be found in the algorithms section.
-
-<p style="text-align: center;">
-
-<img src="../img/algorithms.png" alt="Supported Algorithms" style="width: 600px;"/>
+<img src="../../img/distributed.png" alt="Distributed Training" style="width: 600px;"/>

 </p>

--- a/docs_raw/docs/diagrams.xml
+++ b/docs_raw/docs/diagrams.xml
--- a/docs_raw/docs/img/act.png
+++ b/docs_raw/docs/img/act.png
--- a/docs_raw/docs/img/filters.png
+++ b/docs_raw/docs/img/filters.png
--- a/docs_raw/docs/img/graph.png
+++ b/docs_raw/docs/img/graph.png
--- a/docs_raw/docs/img/improve.png
+++ b/docs_raw/docs/img/improve.png
--- a/docs_raw/docs/img/level.png
+++ b/docs_raw/docs/img/level.png
--- a/docs_raw/docs/img/observe.png
+++ b/docs_raw/docs/img/observe.png
--- a/docs_raw/docs/img/train.png
+++ b/docs_raw/docs/img/train.png
--- a/docs_raw/docs/index.md
+++ b/docs_raw/docs/index.md
@@ -13,7 +13,7 @@ Coach collects statistics from the training process and supports advanced visual



-Blog post from the Intel® Nervana™ website can be found [here](https://www.intelnervana.com/reinforcement-learning-coach-intel). 
+Blog post from the Intel® AI website can be found [here](https://ai.intel.com/reinforcement-learning-coach-intel/).

 GitHub repository is [here](https://github.com/NervanaSystems/coach). 

--- a/docs_raw/mkdocs.yml
+++ b/docs_raw/mkdocs.yml
@@ -1,6 +1,6 @@
-site_name: Reinforcement Learning Coach Documentation
+site_name: Reinforcement Learning Coach
 theme: readthedocs
-site_description: 'Reinforcement Learning Coach Documentation by Intel Nervana.'
+site_description: 'Reinforcement Learning Coach by Intel Nervana.'
 markdown_extensions: 
 - mdx_math:
    enable_dollar_delimiter: True #for use of inline $..$
@@ -10,8 +10,13 @@ extra_css: [extra.css]

 pages:
 - Home : index.md
- Design: design.md
 - Usage: usage.md
+- Design:
+        - 'Features' : design/features.md
+        - 'Control Flow' : design/control_flow.md
+        - 'Network' : design/network.md
+        - 'Filters' : design/filters.md
+
 - Algorithms:
        - 'DQN' : algorithms/value_optimization/dqn.md
        - 'Double DQN' : algorithms/value_optimization/double_dqn.md