coach/docs_raw/source/components/agents/imitation/bc.rst at 138ced23ba57efd34237b816a25b83fb9f04417d

Itai Caspi 6d40ad1650 update of api docstrings across coach and tutorials [WIP] (#91 )

2018-11-15 15:00:13 +02:00

Action space: Discrete | Continuous

Network Structure

Algorithm Description

Training the network

The replay buffer contains the expert demonstrations for the task. These demonstrations are given as (state, action) tuples, with no reward. The training goal is to minimize the difference between the actions predicted by the network and the actions taken by the expert in each state.
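To make the data format concrete, the sketch below stores expert demonstrations as (state, action) pairs with no reward field and samples uniform batches from them. It is a minimal illustration, assuming a discrete action index. The `Demonstration` and `DemonstrationBuffer` names are hypothetical and are not part of Coach's replay-buffer API.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


# Hypothetical record for one expert demonstration step. There is no reward
# field: behavioral cloning only needs the state and the expert's action.
@dataclass(frozen=True)
class Demonstration:
    state: Tuple[float, ...]
    action: int  # assumed: index of the discrete action the expert took


class DemonstrationBuffer:
    """Hypothetical, minimal replay buffer holding expert demonstrations."""

    def __init__(self, seed: int = 0) -> None:
        self._data: List[Demonstration] = []
        self._rng = random.Random(seed)

    def add(self, state: Sequence[float], action: int) -> None:
        self._data.append(Demonstration(tuple(state), action))

    def __len__(self) -> int:
        return len(self._data)

    def sample(self, batch_size: int) -> List[Demonstration]:
        # Uniform sampling without replacement within a single batch.
        return self._rng.sample(self._data, min(batch_size, len(self._data)))


buffer = DemonstrationBuffer(seed=42)
buffer.add([0.1, 0.2], action=1)
buffer.add([0.5, -0.3], action=0)
buffer.add([0.9, 0.4], action=1)

batch = buffer.sample(2)
states = [d.state for d in batch]    # network inputs
targets = [d.action for d in batch]  # expert actions, used as training targets
```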

1. Sample a batch of transitions from the replay buffer.
2. Use the current states as the input to the network, and the expert actions as the network's targets.
3. The network uses a policy head, which is trained with the cross-entropy loss.
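The three training steps above can be sketched in NumPy. This is not Coach's implementation. It assumes a discrete action space, replaces the full network with a single hypothetical linear layer followed by a softmax (standing in for the policy head), and uses plain gradient descent. The function `bc_train_step` and the toy "expert" rule are hypothetical and exist only for illustration.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row-wise max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def bc_train_step(W, b, states, expert_actions, lr=0.1):
    """One behavioral-cloning update on a hypothetical linear softmax policy.

    states:         (batch, state_dim) network inputs
    expert_actions: (batch,) integer indices of the expert's actions (targets)
    Returns the updated (W, b) and the mean cross-entropy loss before the update.
    """
    n = states.shape[0]
    probs = softmax(states @ W + b)               # pi(a | s) for every action
    picked = probs[np.arange(n), expert_actions]  # pi(a_expert | s)
    loss = -np.mean(np.log(picked + 1e-12))       # cross-entropy vs. one-hot targets

    # Gradient of the mean cross-entropy w.r.t. the logits is (probs - one_hot) / n.
    grad_logits = probs.copy()
    grad_logits[np.arange(n), expert_actions] -= 1.0
    grad_logits /= n

    W = W - lr * states.T @ grad_logits
    b = b - lr * grad_logits.sum(axis=0)
    return W, b, loss


rng = np.random.default_rng(0)
states = rng.normal(size=(64, 4))
expert_actions = (states[:, 0] > 0).astype(int)  # toy "expert": a simple rule
W, b = np.zeros((4, 2)), np.zeros(2)

losses = []
for _ in range(200):
    # For brevity the same toy batch is reused; in practice each step would
    # use a freshly sampled batch of demonstrations from the replay buffer.
    W, b, loss = bc_train_step(W, b, states, expert_actions)
    losses.append(loss)
```

Because the targets are the expert's action indices, minimizing this cross-entropy pushes the policy to put high probability on the expert's action in each sampled state. With zero-initialized weights the first loss equals log 2 for two actions, and it decreases as the policy learns to imitate the expert.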

.. autoclass:: rl_coach.agents.bc_agent.BCAlgorithmParameters