Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* reordering of the episode reset operation and allowing to store episodes only when they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining trace tests ignored presets + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file
2026-03-06 17:35:53 +01:00 · 2018-09-04 15:07:54 +03:00
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions
--- a/rl_coach/architectures/tensorflow_components/heads/head.py
+++ b/rl_coach/architectures/tensorflow_components/heads/head.py
@@ -59,7 +59,10 @@ class Head(object):
        self.loss = []
        self.loss_type = []
        self.regularizations = []
-        self.loss_weight = force_list(loss_weight)
+        # self.loss_weight = force_list(loss_weight)
+        self.loss_weight = tf.Variable(force_list(loss_weight), trainable=False, collections=[tf.GraphKeys.LOCAL_VARIABLES])
+        self.loss_weight_placeholder = tf.placeholder("float")
+        self.set_loss_weight = tf.assign(self.loss_weight, self.loss_weight_placeholder)
        self.target = []
        self.importance_weight = []
        self.input = []
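The hunk above replaces the plain Python list `self.loss_weight` with a non-trainable `tf.Variable` in the `LOCAL_VARIABLES` collection, plus a placeholder and a `tf.assign` op. This lets the loss weights be changed while training runs, without rebuilding the graph; presumably a caller does something like `sess.run(head.set_loss_weight, feed_dict={head.loss_weight_placeholder: new_weights})`. Note that in TensorFlow 1.x, variables placed only in `LOCAL_VARIABLES` are initialized by `tf.local_variables_initializer()`, not by `tf.global_variables_initializer()`. The sketch below is a framework-free analogy of the same idea, not Coach's code: the names `LossWeights` and `build_total_loss` are hypothetical, and a mutable container stands in for the TensorFlow variable. The loss function is built once and reads the current weights every time it is evaluated.

```python
from typing import Callable, List, Sequence


class LossWeights:
    """Hypothetical stand-in for the non-trainable tf.Variable in the diff:
    a mutable container that is read each time the loss is evaluated."""

    def __init__(self, initial: Sequence[float]):
        self._values: List[float] = list(initial)

    def assign(self, new_values: Sequence[float]) -> None:
        # Plays the role of running `set_loss_weight` with a fed placeholder value.
        if len(new_values) != len(self._values):
            raise ValueError(
                "expected %d weights, got %d" % (len(self._values), len(new_values))
            )
        self._values = list(new_values)

    def read(self) -> List[float]:
        return list(self._values)


def build_total_loss(weights: LossWeights) -> Callable[[Sequence[float]], float]:
    """Build the weighted-sum loss once; like a TF1 graph, it is not rebuilt
    when the weights change, because it reads them at evaluation time."""

    def total_loss(per_head_losses: Sequence[float]) -> float:
        w = weights.read()
        if len(w) != len(per_head_losses):
            raise ValueError("one loss value is needed per loss weight")
        return sum(wi * li for wi, li in zip(w, per_head_losses))

    return total_loss


if __name__ == "__main__":
    weights = LossWeights([1.0, 0.5])
    loss_fn = build_total_loss(weights)
    print(loss_fn([2.0, 4.0]))  # 1.0*2.0 + 0.5*4.0 = 4.0
    weights.assign([0.25, 1.0])  # update the weights without rebuilding loss_fn
    print(loss_fn([2.0, 4.0]))  # 0.25*2.0 + 1.0*4.0 = 4.5
```

The design point mirrored here is that the weights live in state that the loss reads, not in constants baked into it, so an update takes effect on the next evaluation.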