mirror of https://github.com/gryf/coach.git synced 2026-02-15 21:45:46 +01:00

Itaicaspi/episode reset refactoring (#105)

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* reordering of the episode reset operation and allowing to store episodes only when they are terminated

* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()

* tests readme file and refactoring of policy optimization agent train function

* Update README.md

* Update README.md

* additional policy optimization train function simplifications

* Updated the traces after the reordering of the environment reset

* docker and jenkins files

* updated the traces to the ones from within the docker container

* updated traces and added control suite to the docker

* updated jenkins file with the intel proxy + updated doom basic a3c test params

* updated line breaks in jenkins file

* added a missing line break in jenkins file

* refining trace tests ignored presets + adding a configurable beta entropy value

* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue

* updated benchmarks for dueling ddqn breakout and pong

* allowing dynamic updates to the loss weights + bug fix in episode.update_returns

* remove docker and jenkins file
This commit is contained in:
Itai Caspi
2018-09-04 15:07:54 +03:00
committed by GitHub
parent 7086492127
commit 72a1d9d426
92 changed files with 9803 additions and 9740 deletions

60
rl_coach/tests/README.md Normal file
View File

@@ -0,0 +1,60 @@
# Coach - Tests
Coach is a complex framework consisting of various features and running schemes.
On top of that, reinforcement learning adds stochasticity in many places along an experiment, which makes getting
the same results run after run almost impossible.
To address these issues, and to ensure that Coach keeps working as expected, we have separated our testing mechanism
into several parts, each testing the framework in a different area and with a different level of strictness.
* **Docker** -
The Docker image we supply exercises Coach's installation process and verifies that all the components
are installed correctly. To build and run the Docker image, use the following commands:
```
docker build . -t coach
docker run -it coach /bin/bash
```
* **Unit tests** -
The unit tests exercise sub-components of Coach with different parameters and verify that they work as expected.
There are currently tens of tests, and we keep adding new ones (a sketch of how such a test is marked is shown
after this list). We use pytest to run the tests, using the following command:
```
python3 -m pytest rl_coach/tests -m unit_test
```
* **Integration tests** -
The integration tests make sure that all the presets are runnable. These are static tests that do not check
performance at all; they only check that each preset can start running without import errors or other bugs.
To run the integration tests, use the following command:
```
python3 -m pytest rl_coach/tests -m integration_test
```
* **Golden tests** -
The golden tests run a subset of the presets available in Coach and verify that they reach a known score after
a known number of steps. The threshold for each test is defined as part of the corresponding preset. The presets
that are tested are ones that can be run in a short amount of time, and the requirements for passing are fairly
lenient. The golden tests can be run using the following command:
```
python3 rl_coach/tests/golden_tests.py
```
* **Trace tests** -
The trace tests run all the presets available in Coach and compare their csv output to traces we extracted after
verifying that each preset works correctly. The requirements for passing these tests are strict - all the values
in the csv file must match the golden csv file exactly. The trace tests can be run in parallel to shorten the
testing time. To run the tests in parallel, use the following command:
```
python3 rl_coach/tests/trace_tests.py -prl
```
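
For illustration, here is a minimal sketch of a test that the `-m unit_test` selection above would pick up, assuming the `unit_test` marker is registered with pytest; the module and test names are hypothetical and do not correspond to actual files in the repository:

```python
# Hypothetical module, e.g. rl_coach/tests/test_example.py (an assumption for
# illustration only).
import pytest


@pytest.mark.unit_test
def test_one_step_discounted_return():
    # Toy check of a one-step discounted return: r + gamma * v
    gamma = 0.99
    reward, next_value = 1.0, 2.0
    assert reward + gamma * next_value == pytest.approx(2.98)
```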

View File

@@ -67,24 +67,21 @@ def perform_reward_based_tests(args, preset_validation_params, preset_name):
 # run the experiment in a separate thread
 screen.log_title("Running test {}".format(preset_name))
 log_file_name = 'test_log_{preset_name}.txt'.format(preset_name=preset_name)
-cmd = (
-    'python3 rl_coach/coach.py '
-    '-p {preset_name} '
-    '-e {test_name} '
-    '-n {num_workers} '
-    '--seed 0 '
-    '-c '
-    '{level} '
-    '&> {log_file_name} '
-).format(
-    preset_name=preset_name,
-    test_name=test_name,
-    num_workers=preset_validation_params.num_workers,
-    log_file_name=log_file_name,
-    level='-lvl ' + preset_validation_params.reward_test_level if preset_validation_params.reward_test_level else ''
-)
+cmd = [
+    'python3',
+    'rl_coach/coach.py',
+    '-p', '{preset_name}'.format(preset_name=preset_name),
+    '-e', '{test_name}'.format(test_name=test_name),
+    '-n', '{num_workers}'.format(num_workers=preset_validation_params.num_workers),
+    '--seed', '0',
+    '-c'
+]
+if preset_validation_params.reward_test_level:
+    cmd += ['-lvl', '{level}'.format(level=preset_validation_params.reward_test_level)]
-p = subprocess.Popen(cmd, shell=True, executable="/bin/bash", preexec_fn=os.setsid)
+stdout = open(log_file_name, 'w')
+p = subprocess.Popen(cmd, stdout=stdout, stderr=stdout)
 start_time = time.time()
@@ -148,7 +145,8 @@ def perform_reward_based_tests(args, preset_validation_params, preset_name):
 time.sleep(1)
 # kill test and print result
-os.killpg(os.getpgid(p.pid), signal.SIGTERM)
+# os.killpg(os.getpgid(p.pid), signal.SIGKILL)
+p.kill()
 screen.log('')
 if test_passed:
     screen.success("Passed successfully")
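
The hunks above build the command as an argument list, redirect the child's output to a log file handle, and terminate the child process directly rather than killing its process group. A minimal, self-contained sketch of that pattern, using a placeholder command and log file name rather than the test runner's actual values:

```python
import subprocess

# Placeholder command and log file name, for illustration only.
log_file_name = 'example_test_log.txt'
cmd = ['python3', '-c', 'print("child process output")']

stdout = open(log_file_name, 'w')
p = subprocess.Popen(cmd, stdout=stdout, stderr=stdout)
try:
    p.wait(timeout=60)
finally:
    # With an argument list there is no intermediate shell, so the child can be
    # terminated directly instead of killing a whole process group.
    if p.poll() is None:
        p.kill()
    stdout.close()
```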

View File

@@ -131,7 +131,10 @@ def wait_and_check(args, processes, force=False):
 os.makedirs(trace_path)
 df = pd.read_csv(csv_paths[0])
 df = clean_df(df)
-df.to_csv(os.path.join(trace_path, 'trace.csv'), index=False)
+try:
+    df.to_csv(os.path.join(trace_path, 'trace.csv'), index=False)
+except:
+    pass
 screen.success("Successfully created new trace.")
 test_passed = True
 else:
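
The trace tests described in the README above require every value in the produced csv to match the stored trace exactly. As a rough sketch of such an exact comparison, assuming two csv files produced by the same preset and using placeholder paths rather than the project's actual layout:

```python
import pandas as pd


def traces_match(new_trace_path: str, golden_trace_path: str) -> bool:
    """Return True only if every value in the new trace equals the golden one."""
    new_trace = pd.read_csv(new_trace_path)
    golden_trace = pd.read_csv(golden_trace_path)
    # DataFrame.equals requires identical shape, dtypes and values, mirroring
    # the exact-match requirement of the trace tests.
    return new_trace.equals(golden_trace)


# Example usage (placeholder paths; the real trace tests locate these files per preset):
#     traces_match('experiment_output/trace.csv', 'stored_traces/trace.csv')
```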