coach/docs_raw/source/components/agents/policy_optimization/hac.rst at 138ced23ba57efd34237b816a25b83fb9f04417d

gryf/coach

mirror of https://github.com/gryf/coach.git synced 2025-12-18 03:30:19 +01:00

Itai Caspi 6d40ad1650 update of api docstrings across coach and tutorials [WIP] (#91 )

* updating the documentation website
* adding the built docs
* update of api docstrings across coach and tutorials 0-2
* added some missing api documentation
* New Sphinx based documentation

2018-11-15 15:00:13 +02:00

Actions space: Continuous

References: Hierarchical Reinforcement Learning with Hindsight

Network Structure

Algorithm Description

Choosing an action

Pass the current states through the actor network to obtain an action mean vector μ. During training, use a continuous exploration policy, such as the Ornstein-Uhlenbeck process, to add exploration noise to the action. During testing, use the mean vector μ as-is.
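The action selection rule above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not Coach's implementation: the `OrnsteinUhlenbeckNoise` class, the `choose_action` helper, the noise parameters (`theta`, `sigma`, `dt`), the toy `actor` function, and the clipping of noisy actions to `[low, high]` are all assumptions introduced here for the example.

```python
import numpy as np


class OrnsteinUhlenbeckNoise:
    """Hypothetical helper: temporally correlated noise following
    x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.mu = np.full(size, mu, dtype=np.float64)
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean (e.g. at episode start).
        self.state = self.mu.copy()

    def sample(self):
        drift = self.theta * (self.mu - self.state) * self.dt
        diffusion = self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.state.shape)
        self.state = self.state + drift + diffusion
        return self.state


def choose_action(actor, state, noise, training, low=-1.0, high=1.0):
    """Return the actor's mean action, plus exploration noise while training."""
    mu = np.asarray(actor(state), dtype=np.float64)
    if not training:
        # Testing: use the mean vector as-is.
        return mu
    # Training: add exploration noise. Clipping to the action bounds is an
    # assumption of this sketch, not something stated in the text above.
    return np.clip(mu + noise.sample(), low, high)


# Toy stand-in for the actor network (assumption): a fixed linear map + tanh.
weights = np.array([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]])
actor = lambda s: np.tanh(weights @ s)

noise = OrnsteinUhlenbeckNoise(size=2, seed=0)
state = np.array([1.0, 0.0, -1.0])
train_action = choose_action(actor, state, noise, training=True)
test_action = choose_action(actor, state, noise, training=False)
```

Because the Ornstein-Uhlenbeck process keeps internal state, consecutive noise samples are correlated, which tends to produce smoother exploration in continuous action spaces than independent Gaussian noise would.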

Training the network
