mirror of https://github.com/gryf/coach.git synced 2026-02-15 21:45:46 +01:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
This commit is contained in:
Itai Caspi
2018-09-04 15:07:54 +03:00
committed by GitHub
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions

60
rl_coach/tests/README.md Normal file
View File

@@ -0,0 +1,60 @@
# Coach - Tests
Coach is a complex framework consisting of various features and running schemes.
On top of that, reinforcement learning adds stochasticity in many places along an experiment, which makes getting
the same results run after run almost impossible.
To address these issues, and to ensure that Coach keeps working as expected, we have separated our testing mechanism
into several parts, each testing the framework in a different area and with a different level of strictness.
* **Docker** -
The Docker image we supply exercises Coach's installation process and verifies that all the components
are installed correctly. To build and run the Docker image, use the following commands:
```
docker build . -t coach
docker run -it coach /bin/bash
```
* **Unit tests** -
The unit tests exercise sub-components of Coach with different parameters and verify that they work as expected.
There are currently tens of tests, and we keep adding new ones (a sketch of how such a test is marked is shown
after this list). We use pytest to run the tests, using the following command:
```
python3 -m pytest rl_coach/tests -m unit_test
```
* **Integration tests** -
The integration tests make sure that all the presets are runnable. These are static tests that do not check
performance at all; they only check that each preset can start running without import errors or other bugs.
To run the integration tests, use the following command:
```
python3 -m pytest rl_coach/tests -m integration_test
```
* **Golden tests** -
The golden tests run a subset of the presets available in Coach and verify that they reach a known score after
a known number of steps. The threshold for each test is defined as part of the corresponding preset. The presets
that are tested are ones that can be run in a short amount of time, and the requirements for passing are fairly
lenient. The golden tests can be run using the following command:
```
python3 rl_coach/tests/golden_tests.py
```
* **Trace tests** -
The trace tests run all the presets available in Coach and compare their csv output to traces we extracted after
verifying that each preset works correctly. The requirements for passing these tests are strict - all the values
in the csv file must match the golden csv file exactly. The trace tests can be run in parallel to shorten the
testing time. To run the tests in parallel, use the following command:
```
python3 rl_coach/tests/trace_tests.py -prl
```
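
For illustration, here is a minimal sketch of a test that the `-m unit_test` selection above would pick up, assuming the `unit_test` marker is registered with pytest; the module and test names are hypothetical and do not correspond to actual files in the repository:

```python
# Hypothetical module, e.g. rl_coach/tests/test_example.py (an assumption for
# illustration only).
import pytest


@pytest.mark.unit_test
def test_one_step_discounted_return():
    # Toy check of a one-step discounted return: r + gamma * v
    gamma = 0.99
    reward, next_value = 1.0, 2.0
    assert reward + gamma * next_value == pytest.approx(2.98)
```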

View File

@@ -67,24 +67,21 @@ def perform_reward_based_tests(args, preset_validation_params, preset_name):
 # run the experiment in a separate thread
 screen.log_title("Running test {}".format(preset_name))
 log_file_name = 'test_log_{preset_name}.txt'.format(preset_name=preset_name)
-cmd = (
-    'python3 rl_coach/coach.py '
-    '-p {preset_name} '
-    '-e {test_name} '
-    '-n {num_workers} '
-    '--seed 0 '
-    '-c '
-    '{level} '
-    '&> {log_file_name} '
-).format(
-    preset_name=preset_name,
-    test_name=test_name,
-    num_workers=preset_validation_params.num_workers,
-    log_file_name=log_file_name,
-    level='-lvl ' + preset_validation_params.reward_test_level if preset_validation_params.reward_test_level else ''
-)
+cmd = [
+    'python3',
+    'rl_coach/coach.py',
+    '-p', '{preset_name}'.format(preset_name=preset_name),
+    '-e', '{test_name}'.format(test_name=test_name),
+    '-n', '{num_workers}'.format(num_workers=preset_validation_params.num_workers),
+    '--seed', '0',
+    '-c'
+]
+if preset_validation_params.reward_test_level:
+    cmd += ['-lvl', '{level}'.format(level=preset_validation_params.reward_test_level)]
-p = subprocess.Popen(cmd, shell=True, executable="/bin/bash", preexec_fn=os.setsid)
+stdout = open(log_file_name, 'w')
+p = subprocess.Popen(cmd, stdout=stdout, stderr=stdout)
 start_time = time.time()
@@ -148,7 +145,8 @@ def perform_reward_based_tests(args, preset_validation_params, preset_name):
 time.sleep(1)
 # kill test and print result
-os.killpg(os.getpgid(p.pid), signal.SIGTERM)
+# os.killpg(os.getpgid(p.pid), signal.SIGKILL)
+p.kill()
 screen.log('')
 if test_passed:
     screen.success("Passed successfully")
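
The hunks above build the command as an argument list, redirect the child's output to a log file handle, and terminate the child process directly rather than killing its process group. A minimal, self-contained sketch of that pattern, using a placeholder command and log file name rather than the test runner's actual values:

```python
import subprocess

# Placeholder command and log file name, for illustration only.
log_file_name = 'example_test_log.txt'
cmd = ['python3', '-c', 'print("child process output")']

stdout = open(log_file_name, 'w')
p = subprocess.Popen(cmd, stdout=stdout, stderr=stdout)
try:
    p.wait(timeout=60)
finally:
    # With an argument list there is no intermediate shell, so the child can be
    # terminated directly instead of killing a whole process group.
    if p.poll() is None:
        p.kill()
    stdout.close()
```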

View File

@@ -131,7 +131,10 @@ def wait_and_check(args, processes, force=False):
 os.makedirs(trace_path)
 df = pd.read_csv(csv_paths[0])
 df = clean_df(df)
-df.to_csv(os.path.join(trace_path, 'trace.csv'), index=False)
+try:
+    df.to_csv(os.path.join(trace_path, 'trace.csv'), index=False)
+except:
+    pass
 screen.success("Successfully created new trace.")
 test_passed = True
 else:
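
The trace tests described in the README above require every value in the produced csv to match the stored trace exactly. As a rough sketch of such an exact comparison, assuming two csv files produced by the same preset and using placeholder paths rather than the project's actual layout:

```python
import pandas as pd


def traces_match(new_trace_path: str, golden_trace_path: str) -> bool:
    """Return True only if every value in the new trace equals the golden one."""
    new_trace = pd.read_csv(new_trace_path)
    golden_trace = pd.read_csv(golden_trace_path)
    # DataFrame.equals requires identical shape, dtypes and values, mirroring
    # the exact-match requirement of the trace tests.
    return new_trace.equals(golden_trace)


# Example usage (placeholder paths; the real trace tests locate these files per preset):
#     traces_match('experiment_output/trace.csv', 'stored_traces/trace.csv')
```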