
update of api docstrings across coach and tutorials [WIP] (#91)

* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation
This commit is contained in:
Itai Caspi
2018-11-15 15:00:13 +02:00
committed by Gal Novik
parent 524f8436a2
commit 6d40ad1650
517 changed files with 71034 additions and 12834 deletions

@@ -0,0 +1,29 @@
Behavioral Cloning
==================
**Action space:** Discrete | Continuous
Network Structure
-----------------
.. image:: /_static/img/design_imgs/pg.png
:align: center
Algorithm Description
---------------------
Training the network
++++++++++++++++++++
The replay buffer contains the expert demonstrations for the task.
These demonstrations are given as (state, action) tuples, with no reward signal.
The training goal is to minimize the difference between the actions predicted by the network and the actions taken by
the expert for each state.
1. Sample a batch of transitions from the replay buffer.
2. Use the current states as input to the network, and the expert actions as the targets of the network.
3. For the network head, we use a policy head, which applies the cross-entropy loss between the predicted action distribution and the expert actions (see the sketch below).
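The following is a minimal sketch of the Behavioral Cloning loss for a discrete action space, written in plain NumPy rather than the Coach API; ``bc_loss`` and its arguments are illustrative placeholders for the policy head's output and the expert actions sampled from the replay buffer.

.. code-block:: python

    import numpy as np

    def softmax(logits):
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        return exp / exp.sum(axis=1, keepdims=True)

    def bc_loss(policy_logits, expert_actions):
        """Cross-entropy between the policy head output and the expert actions.

        policy_logits:  (batch, num_actions) raw scores predicted for the sampled states
        expert_actions: (batch,) integer action taken by the expert in each state
        """
        probs = softmax(policy_logits)
        batch_idx = np.arange(len(expert_actions))
        # negative log-likelihood of the expert action under the current policy;
        # minimizing it pushes the predicted distribution towards the expert's choices
        return -np.log(probs[batch_idx, expert_actions] + 1e-8).mean()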
.. autoclass:: rl_coach.agents.bc_agent.BCAlgorithmParameters

@@ -0,0 +1,36 @@
Conditional Imitation Learning
==============================
**Action space:** Discrete | Continuous
**References:** `End-to-end Driving via Conditional Imitation Learning <https://arxiv.org/abs/1710.02410>`_
Network Structure
-----------------
.. image:: /_static/img/design_imgs/cil.png
:align: center
Algorithm Description
---------------------
Training the network
++++++++++++++++++++
The replay buffer contains the expert demonstrations for the task.
These demonstrations are given as (state, action) tuples, with no reward signal.
The training goal is to minimize the difference between the actions predicted by the network and the actions taken by
the expert for each state.
In conditional imitation learning, each transition is assigned a class, which identifies the goal that was pursued
in that transition. For example, three possible classes could be: turn right, turn left, and follow lane.
1. Sample a batch of transitions from the replay buffer, where the batch is balanced, meaning that an equal number
of transitions will be sampled from each class index.
2. Use the current states as input to the network, and assign the expert actions as the targets of the network heads
   corresponding to each transition's class. For the other heads, set the targets to the currently predicted values,
   so that their loss is zeroed out.
3. Each head is a regression head that minimizes the MSE loss between the network's predicted values and the target values (see the sketch below).
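Below is a minimal sketch, in plain NumPy rather than the Coach API, of the balanced sampling and the per-class masked MSE loss described above; the function names and the ``(states, actions, classes)`` buffer layout are illustrative assumptions.

.. code-block:: python

    import numpy as np

    def sample_balanced_batch(states, actions, classes, num_classes, per_class):
        """Draw an equal number of transitions from each class index."""
        idx = np.concatenate([
            np.random.choice(np.flatnonzero(classes == c), size=per_class)
            for c in range(num_classes)
        ])
        return states[idx], actions[idx], classes[idx]

    def cil_loss(head_outputs, expert_actions, classes):
        """MSE over the head matching each transition's class; other heads contribute no loss.

        head_outputs:   (batch, num_heads, action_dim) predictions of all heads
        expert_actions: (batch, action_dim) actions demonstrated by the expert
        classes:        (batch,) class index assigned to each transition
        """
        batch_idx = np.arange(len(classes))
        selected = head_outputs[batch_idx, classes]  # predictions of the matching head only
        # Setting the targets of the remaining heads to their own predictions (as in step 2)
        # zeroes their loss, which is equivalent to masking them out as done here.
        return np.mean((selected - expert_actions) ** 2)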
.. autoclass:: rl_coach.agents.cil_agent.CILAlgorithmParameters