# Behavioral Cloning

Action space: Discrete | Continuous

## Network Structure

## Algorithm Description

### Training the network

The replay buffer contains the expert demonstrations for the task. The demonstrations are given as (state, action) tuples, with no reward signal. The goal of training is to reduce the difference between the actions predicted by the network and the actions taken by the expert for each state.

  1. Sample a batch of transitions from the replay buffer.
  2. Feed the current states into the network as its input, and set the expert actions as its targets.
  3. The loss function is the mean squared error (MSE) between the predicted and expert actions, so the Q head is used to minimize it, as sketched in the example below.
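
To make these steps concrete, here is a minimal sketch of a single training step. Coach itself is built on TensorFlow; the sketch below uses PyTorch for brevity, assumes a continuous action space, and the state/action dimensions and the `replay_buffer.sample` API are hypothetical placeholders rather than Coach's actual interfaces.

```python
import torch
import torch.nn as nn

# Hypothetical policy network mapping states to predicted actions.
# The state dimension (8) and action dimension (2) are assumptions.
policy = nn.Sequential(
    nn.Linear(8, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(replay_buffer, batch_size=32):
    # 1. Sample a batch of (state, action) expert transitions.
    #    `replay_buffer.sample` is a hypothetical API returning tensors.
    states, expert_actions = replay_buffer.sample(batch_size)

    # 2. The current states are the network input; the expert actions
    #    are the regression targets.
    predicted_actions = policy(states)

    # 3. Minimize the MSE between the predicted and expert actions.
    loss = loss_fn(predicted_actions, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that no reward signal is used anywhere in this step: behavioral cloning is pure supervised regression from states to the expert's actions.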