Release 0.9

Main changes are detailed below:

New features -
* CARLA 0.7 simulator integration
* Human control of the game play
* Recording of human game play and storing / loading the replay buffer
* Behavioral cloning agent and presets
* Golden tests for several presets
* Selecting between deep / shallow image embedders
* Rendering through pygame (with some boost in performance)

API changes -
* Improved environment wrapper API
* Added an evaluate flag to allow convenient evaluation of existing checkpoints
* Improve frameskip definition in Gym

Bug fixes -
* Fixed loading of checkpoints for agents with more than one network
* Fixed the N Step Q learning agent python3 compatibility
2026-04-26 19:01:28 +02:00 · 2017-12-19 19:27:16 +02:00
parent 11faf19649
commit 125c7ee38d
41 changed files with 1713 additions and 260 deletions
@@ -0,0 +1,40 @@
+#
+# Copyright (c) 2017 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from agents.imitation_agent import *
+
+
+# Behavioral Cloning Agent
+class BCAgent(ImitationAgent):
+    def __init__(self, env, tuning_parameters, replicated_device=None, thread_id=0):
+        ImitationAgent.__init__(self, env, tuning_parameters, replicated_device, thread_id)
+
+    def learn_from_batch(self, batch):
+        current_states, _, actions, _, _, _ = self.extract_batch(batch)
+
+        # create the inputs for the network
+        input = current_states
+
+        # the targets for the network are the actions since this is supervised learning
+        if self.env.discrete_controls:
+            targets = np.eye(self.env.action_space_size)[actions]
+        else:
+            targets = actions
+
+        result = self.main_network.train_and_sync_networks(input, targets)
+        total_loss = result[0]
+
+        return total_loss
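
The discrete branch of `learn_from_batch` turns the recorded actions into one-hot rows, so the network is trained as a plain classifier over actions. The following is a minimal standalone sketch of that target construction, not code from this commit: `one_hot_targets` and `cross_entropy` are hypothetical helper names, and the loss shown is an assumption about how such targets are typically consumed. The sketch indexes the identity matrix with a flat integer array, because wrapping the index array in an extra list yields an extra leading axis on recent NumPy releases.

```python
import numpy as np


def one_hot_targets(actions, num_actions):
    """Turn a batch of integer action indices into one-hot rows.

    Hypothetical helper for illustration only.
    """
    # Flatten to a 1-D integer array so the result is always (batch, num_actions).
    actions = np.asarray(actions, dtype=np.int64).reshape(-1)
    # Indexing the identity matrix with an index array picks one row per action.
    return np.eye(num_actions)[actions]


def cross_entropy(probs, targets, eps=1e-12):
    """Mean cross-entropy between predicted action probabilities and one-hot targets.

    Assumed loss for illustration; the commit does not state which loss is used.
    """
    return float(-np.mean(np.sum(targets * np.log(probs + eps), axis=1)))


# Three recorded actions in an environment with three discrete actions.
targets = one_hot_targets([2, 0, 1], num_actions=3)
```

For continuous controls no encoding is needed: the recorded action vectors are used directly as regression targets, which is what the `else` branch does.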