Behavioral Cloning
Action space: Discrete | Continuous
Network Structure
Algorithm Description
Training the network
The replay buffer contains the expert demonstrations for the task. These demonstrations are given as (state, action) tuples, with no reward. The training goal is to minimize the difference between the actions predicted by the network and the actions taken by the expert for each state, as sketched in the example after the steps below.
- Sample a batch of transitions from the replay buffer.
- Use the current states as input to the network, and the expert actions as the targets of the network.
- The loss function is the mean squared error (MSE) between the predicted and expert actions, so the network's Q head is used to minimize this loss.
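
For concreteness, the sketch below shows one way such an update could look. This is an illustrative assumption written in plain PyTorch, not Coach's actual implementation; the names `policy_net` and the stand-in expert buffer are hypothetical placeholders.

```python
# Minimal behavioral cloning sketch (assumption, not Coach's implementation).
import torch
import torch.nn as nn

state_dim, action_dim, batch_size = 8, 2, 64

# Hypothetical policy network mapping states to continuous actions.
policy_net = nn.Sequential(
    nn.Linear(state_dim, 64),
    nn.ReLU(),
    nn.Linear(64, action_dim),
)
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
mse = nn.MSELoss()

# Stand-in for the expert replay buffer: (state, action) pairs, no rewards.
expert_states = torch.randn(1000, state_dim)
expert_actions = torch.randn(1000, action_dim)

for step in range(1000):
    # 1. Sample a batch of expert transitions from the replay buffer.
    idx = torch.randint(0, expert_states.size(0), (batch_size,))
    states, actions = expert_states[idx], expert_actions[idx]

    # 2. Use the sampled states as input; the expert actions are the targets.
    predicted_actions = policy_net(states)

    # 3. Minimize the MSE between predicted and expert actions.
    loss = mse(predicted_actions, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

For a discrete action space, the same loop applies with the network outputting one value per action and the expert action serving as the regression target for the corresponding head.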
