
update of api docstrings across coach and tutorials [WIP] (#91)

* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation
Itai Caspi
2018-11-15 15:00:13 +02:00
committed by Gal Novik
parent 524f8436a2
commit 6d40ad1650
517 changed files with 71034 additions and 12834 deletions


@@ -0,0 +1,10 @@
Algorithms
==========

Coach supports many state-of-the-art reinforcement learning algorithms, which are divided into three main classes:
value optimization, policy optimization, and imitation learning.
A detailed description of these algorithms can be found in the `agents <../components/agents/index.html>`_ section.

.. image:: /_static/img/algorithms.png
   :width: 600px
   :align: center
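
To make the mapping from algorithm class to code concrete, the sketch below shows roughly how a preset pairs one
of these agents with an environment. The module paths (``rl_coach.agents.dqn_agent``, ``BasicRLGraphManager``,
``SimpleSchedule``) are assumptions based on the package layout around this release, so treat this as an
illustrative sketch rather than a canonical preset.

.. code-block:: python

   # Illustrative sketch only -- module paths are assumptions; check your rl_coach version.
   from rl_coach.agents.dqn_agent import DQNAgentParameters            # a value-optimization agent
   from rl_coach.environments.gym_environment import GymVectorEnvironment
   from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
   from rl_coach.graph_managers.graph_manager import SimpleSchedule

   # Wire the agent, the environment, and a default training schedule together.
   graph_manager = BasicRLGraphManager(
       agent_params=DQNAgentParameters(),
       env_params=GymVectorEnvironment(level='CartPole-v0'),
       schedule_params=SimpleSchedule())

Swapping the agent parameters class is typically all it takes to move between the three algorithm families.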


@@ -0,0 +1,22 @@
Benchmarks
==========

Reinforcement learning is a developing field, and so far it has been particularly difficult to reproduce some of the
results published in the original papers. Some reasons for this are:

* Reinforcement learning algorithms are notorious for having an unstable learning process.
  The data the neural network trains on is dynamic, and depends on the random seed defined for the environment.
* Reinforcement learning algorithms have many moving parts. For some environments and agents, there are many
  "tricks" needed to reproduce the exact behavior the paper authors reported. Also, there are **a lot** of
  hyper-parameters to set.

In order for a reinforcement learning implementation to be useful for research or for data science, it must be
shown to achieve the expected behavior. For this reason, we collected a set of benchmark results for most
of the algorithms implemented in Coach. The algorithms were tested on a subset of the environments that were
used in the original papers, with multiple seeds for each environment.
Additionally, Coach uses some strict testing mechanisms to make sure the results we show for these
benchmarks stay intact as Coach continues to develop.

To see the benchmark results, please visit the
`following GitHub page <https://github.com/NervanaSystems/coach/tree/master/benchmarks>`_.
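
A benchmark-style run can be approximated locally by launching the same preset once per seed. The snippet below
is a sketch that shells out to the ``coach`` command; the ``-p``, ``--seed``, and ``-e`` flags are assumptions
about the CLI of this era, so verify them against ``coach --help`` before relying on them.

.. code-block:: python

   # Sketch: run one preset with several seeds (CLI flags are assumptions, not verified).
   import subprocess

   for seed in (0, 1, 2):
       subprocess.run(
           ['coach',
            '-p', 'CartPole_DQN',                       # preset to run
            '--seed', str(seed),                        # seed for the random number generators
            '-e', 'cartpole_dqn_seed_{}'.format(seed)   # experiment name for the results directory
           ],
           check=True)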


@@ -0,0 +1,31 @@
Environments
============

Coach supports a large number of environments which can be solved using reinforcement learning.
For detailed documentation of the environments API, see the `environments section <../components/environments/index.html>`_.

The supported environments are:

* `DeepMind Control Suite <https://github.com/deepmind/dm_control>`_ - a set of reinforcement learning environments
  powered by the MuJoCo physics engine.
* `Blizzard Starcraft II <https://github.com/deepmind/pysc2>`_ - a popular strategy game which was wrapped with a
  Python interface by DeepMind.
* `ViZDoom <http://vizdoom.cs.put.edu.pl/>`_ - a Doom-based AI research platform for reinforcement learning
  from raw visual information.
* `CARLA <https://github.com/carla-simulator/carla>`_ - an open-source simulator for autonomous driving research.
* `OpenAI Gym <https://gym.openai.com/>`_ - a library which consists of a set of environments, from games to robotics.
  Additionally, it can be extended using the API defined by the authors; a minimal sketch of such an extension
  follows this list.

  In Coach, we support all the native environments in Gym, along with several extensions such as:

  * `Roboschool <https://github.com/openai/roboschool>`_ - a set of environments powered by the PyBullet engine,
    that offer a free alternative to MuJoCo.
  * `Gym Extensions <https://github.com/Breakend/gym-extensions>`_ - a set of environments that extends Gym for
    auxiliary tasks (multitask learning, transfer learning, inverse reinforcement learning, etc.).
  * `PyBullet <https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet>`_ - a physics engine that
    includes a set of robotics environments.
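
As a rough illustration of that extension point, the classic ``gym.Env`` interface of this period (where ``step``
returns a four-tuple) lets a custom environment be defined in a few lines. The environment below is hypothetical
and exists only to show the required methods and spaces.

.. code-block:: python

   # Hypothetical toy environment implementing the classic gym.Env interface.
   import gym
   import numpy as np
   from gym import spaces

   class RandomWalkEnv(gym.Env):
       """The agent walks left/right on a line; the episode ends at either edge."""

       def __init__(self, size=10):
           self.size = size
           self.pos = size // 2
           self.action_space = spaces.Discrete(2)       # 0 = step left, 1 = step right
           self.observation_space = spaces.Box(
               low=0, high=size, shape=(1,), dtype=np.float32)

       def reset(self):
           self.pos = self.size // 2
           return np.array([self.pos], dtype=np.float32)

       def step(self, action):
           self.pos += 1 if action == 1 else -1
           done = self.pos <= 0 or self.pos >= self.size
           reward = 1.0 if self.pos >= self.size else 0.0   # reward only at the right edge
           return np.array([self.pos], dtype=np.float32), reward, done, {}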


@@ -0,0 +1,10 @@
Features
========

.. toctree::
   :maxdepth: 1
   :caption: Features

   algorithms
   environments
   benchmarks