
moving the docs to github

This commit is contained in:
itaicaspi-intel
2018-04-23 09:14:20 +03:00
parent cafa152382
commit 5d5562bf62
118 changed files with 10792 additions and 3 deletions


@@ -0,0 +1,25 @@
# Behavioral Cloning
**Action space:** Discrete | Continuous
## Network Structure
<p style="text-align: center;">
<img src="../../design_imgs/dqn.png">
</p>
## Algorithm Description
### Training the network
The replay buffer contains the expert demonstrations for the task.
These demonstrations are given as state-action tuples, with no reward.
The training goal is to reduce the difference between the actions predicted by the network and the actions taken by the expert for each state.
1. Sample a batch of transitions from the replay buffer.
2. Use the current states as input to the network, and the expert actions as the targets of the network.
3. The loss function of the network is MSE, so the Q head is used to minimize this loss (a minimal sketch of this training step is shown below).
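
The snippet below is a minimal sketch of such a training step, written in plain PyTorch rather than with Coach's own classes. The `BCNetwork` model, the `bc_training_step` helper, and the `replay_buffer.sample` interface are illustrative assumptions for the example, not part of Coach's API.

```python
# Minimal behavioral cloning training step (illustrative, not Coach's implementation).
import torch
import torch.nn as nn

class BCNetwork(nn.Module):
    """Simple MLP mapping a state to one output per action (the 'Q head')."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, states):
        return self.net(states)

def bc_training_step(network, optimizer, replay_buffer, batch_size=32):
    # 1. Sample a batch of expert transitions (states and actions, no rewards).
    states, expert_actions = replay_buffer.sample(batch_size)

    # 2. Use the current states as input to the network.
    predictions = network(states)

    # 3. MSE loss between the network output and a one-hot encoding of the
    #    expert action (discrete case); for continuous actions the targets
    #    would simply be the raw expert actions.
    targets = nn.functional.one_hot(
        expert_actions, num_classes=predictions.shape[-1]).float()
    loss = nn.functional.mse_loss(predictions, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```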