
moving the docs to github

This commit is contained in:
itaicaspi-intel
2018-04-23 09:14:20 +03:00
parent cafa152382
commit 5d5562bf62
118 changed files with 10792 additions and 3 deletions


@@ -0,0 +1,25 @@
# Behavioral Cloning
**Action space:** Discrete | Continuous
## Network Structure
<p style="text-align: center;">
<img src="../../design_imgs/dqn.png">
</p>
## Algorithm Description
### Training the network
The replay buffer contains the expert demonstrations for the task.
These demonstrations are given as state-action tuples, with no reward.
The training goal is to reduce the difference between the actions predicted by the network and the actions taken by the expert for each state.
1. Sample a batch of transitions from the replay buffer.
2. Use the current states as input to the network, and the expert actions as the targets of the network.
3. The loss function of the network is MSE, so the Q head is used to minimize this loss (a minimal sketch of this training step is shown below).
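
The snippet below is a minimal sketch of such a training step, written in plain PyTorch rather than with Coach's own classes. The `BCNetwork` model, the `bc_training_step` helper, and the `replay_buffer.sample` interface are illustrative assumptions for the example, not part of Coach's API.

```python
# Minimal behavioral cloning training step (illustrative, not Coach's implementation).
import torch
import torch.nn as nn

class BCNetwork(nn.Module):
    """Simple MLP mapping a state to one output per action (the 'Q head')."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, states):
        return self.net(states)

def bc_training_step(network, optimizer, replay_buffer, batch_size=32):
    # 1. Sample a batch of expert transitions (states and actions, no rewards).
    states, expert_actions = replay_buffer.sample(batch_size)

    # 2. Use the current states as input to the network.
    predictions = network(states)

    # 3. MSE loss between the network output and a one-hot encoding of the
    #    expert action (discrete case); for continuous actions the targets
    #    would simply be the raw expert actions.
    targets = nn.functional.one_hot(
        expert_actions, num_classes=predictions.shape[-1]).float()
    loss = nn.functional.mse_loss(predictions, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```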