Enabling-more-agents-for-Batch-RL-and-cleanup (#258)

allowing for the last training batch drawn to be smaller than batch_size + adding support for more agents in BatchRL by adding softmax with temperature to the corresponding heads + adding a CartPole_QR_DQN preset with a golden test + cleanups
2026-03-01 14:15:46 +01:00 · 2019-03-21 16:10:29 +02:00
parent abec59f367
commit 6e08c55ad5
24 changed files with 152 additions and 69 deletions
--- a/rl_coach/agents/clipped_ppo_agent.py
+++ b/rl_coach/agents/clipped_ppo_agent.py
@@ -207,6 +207,8 @@ class ClippedPPOAgent(ActorCriticAgent):
                       self.networks['main'].online_network.output_heads[1].likelihood_ratio,
                       self.networks['main'].online_network.output_heads[1].clipped_likelihood_ratio]

+            # TODO-fixme if batch.size / self.ap.network_wrappers['main'].batch_size is not an integer, we do not train on
+            #  some of the data
            for i in range(int(batch.size / self.ap.network_wrappers['main'].batch_size)):
                start = i * self.ap.network_wrappers['main'].batch_size
                end = (i + 1) * self.ap.network_wrappers['main'].batch_size