Bug fix: when enabling 'heatup_using_network_decisions', we should add the configured noise (#162)

During heatup we may want to add agent-generated noise (i.e. not "simple" random noise). This is enabled by setting 'heatup_using_network_decisions' to True. For example:

    agent_params = DDPGAgentParameters()
    agent_params.algorithm.heatup_using_network_decisions = True

The fix ensures that the configured noise is added not only in the TRAINING phase but also in the HEATUP phase.

No one had enabled 'heatup_using_network_decisions' until now, which explains why the problem surfaced only now (my configuration does enable it).
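To illustrate why the old condition skipped heatup, here is a minimal, self-contained sketch of the two phase checks. The `RunPhase` enum below is a stand-in for the one in rl_coach (its member values are assumptions), and the two helper functions are hypothetical names that only mirror the `if` conditions changed in this commit, not the library's actual methods.

```python
from enum import Enum


class RunPhase(Enum):
    # Stand-in for rl_coach's RunPhase; the string values are assumptions.
    HEATUP = "Heatup"
    TRAIN = "Training"
    TEST = "Testing"


def enters_noise_branch_before_fix(phase: RunPhase) -> bool:
    # Old condition: only the TRAIN phase enters the noise branch.
    return phase == RunPhase.TRAIN


def enters_noise_branch_after_fix(phase: RunPhase) -> bool:
    # New condition: every phase except TEST enters the noise branch,
    # so HEATUP is now included.
    return phase is not RunPhase.TEST


if __name__ == "__main__":
    for phase in RunPhase:
        print(
            phase.name,
            enters_noise_branch_before_fix(phase),
            enters_noise_branch_after_fix(phase),
        )
```

The only phase whose behavior differs between the two checks is HEATUP, which matches the intent described above: TEST remains noise-free, and TRAIN is unaffected.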
2026-02-26 12:15:50 +01:00 · 2018-12-17 10:08:54 +02:00
parent f9ee526536
commit b4bc8a476c
2 changed files with 2 additions and 2 deletions
--- a/rl_coach/exploration_policies/additive_noise.py
+++ b/rl_coach/exploration_policies/additive_noise.py
@@ -88,7 +88,7 @@ class AdditiveNoise(ExplorationPolicy):
            action_values_mean = action_values.squeeze()

        # step the noise schedule
-        if self.phase == RunPhase.TRAIN:
+        if self.phase is not RunPhase.TEST:
            self.noise_percentage_schedule.step()
            # the second element of the list is assumed to be the standard deviation
            if isinstance(action_values, list) and len(action_values) > 1:
--- a/rl_coach/exploration_policies/truncated_normal.py
+++ b/rl_coach/exploration_policies/truncated_normal.py
@@ -92,7 +92,7 @@ class TruncatedNormal(ExplorationPolicy):
            action_values_mean = action_values.squeeze()

        # step the noise schedule
-        if self.phase == RunPhase.TRAIN:
+        if self.phase is not RunPhase.TEST:
            self.noise_percentage_schedule.step()
            # the second element of the list is assumed to be the standard deviation
            if isinstance(action_values, list) and len(action_values) > 1: