* SAC algorithm
* SAC - updates to the agent (learn_from_batch), sac_head and sac_q_head to fix a problem in the gradient calculation (sketched below). SAC agents are now able to train.
gym_environment - fixing an error in accessing gym.spaces
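Not the actual change, just a minimal sketch of where such a gradient problem typically sits in SAC: the policy loss has to be built with the reparameterization trick so gradients flow through the sampled action into the policy head. The names (`mu`, `log_std`, `q_fn`, `alpha`) and the TF 1.x style are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

def sac_policy_loss(mu, log_std, q_fn, alpha=0.2):
    """Reparameterized SAC policy loss; q_fn maps actions to Q(s, a)."""
    std = tf.exp(log_std)
    # Reparameterization trick: the sampled action is a deterministic
    # function of (mu, log_std) plus independent noise, so gradients can
    # flow back into the policy parameters instead of being cut by sampling.
    noise = tf.random_normal(tf.shape(mu))
    pre_tanh = mu + std * noise
    action = tf.tanh(pre_tanh)
    # Gaussian log-probability plus the tanh-squashing correction.
    log_prob = tf.reduce_sum(
        -0.5 * (((pre_tanh - mu) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi)),
        axis=-1)
    log_prob -= tf.reduce_sum(tf.log(1.0 - action ** 2 + 1e-6), axis=-1)
    # Minimizing alpha * log_pi - Q pushes the policy toward high-value,
    # high-entropy actions.
    return tf.reduce_mean(alpha * log_prob - q_fn(action))
```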
* Soft Actor Critic - code cleanup
* code cleanup
* V-head initialization fix
* SAC benchmarks
* SAC Documentation
* typo fix
* documentation fixes
* documentation and version update
* README typo
* tests: stabilizing CI
* tests: fix failed tests - stabilizing CI
* fix getting csv files.
- fixed seed test
* fix clres in conftest - paths can now be modified during a test run.
- this fixed the mxnet checkpoint test
* tests: fix comments
* fix for Fetch rendering - removing code that was once required with older gym versions. Images are now rendered correctly by default with the latest gym.
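A quick way to see this behavior (illustrative only; the environment name is an assumption and requires gym's robotics envs and mujoco_py to be installed):

```python
import gym

env = gym.make('FetchReach-v1')  # any Fetch robotics env works here
env.reset()
# With recent gym versions this returns an RGB numpy array directly,
# without the viewer workarounds that older versions needed.
frame = env.render(mode='rgb_array')
print(frame.shape)
```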
* fixing mujoco camera id failure
* tests: new mxnet test + fix utils
- new test added: test_restore_checkpoint[tensorflow, mxnet]
- fix failed tests in CI
- improve utils
* tests: fix comments for mxnet checkpoint test and utils
* introduce dockerfiles.
* ensure golden tests are run, not just collected.
* Skip CI download of dockerfiles.
* add StarCraft environment and tests.
* add minimaps starcraft validation parameters.
* Add functional test running (from Ayoob)
* pin mujoco_py version to a 1.5 compatible release.
* fix config syntax issue.
* pin remaining mujoco_py install calls.
* Relax pin of gym version in gym Dockerfile.
* update makefile based on functional test filtering.
* integration test changes to override heatup to 1000 steps + run each preset for 30 sec (to make sure we reach the training phase); see the sketch below
* fixes to failing presets uncovered by this change + changes to the golden tests to properly test BatchRL
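A sketch of the kind of heatup override described above, assuming Coach's usual preset conventions (`ScheduleParameters` / `EnvironmentSteps`); the actual integration-test harness may differ:

```python
from rl_coach.core_types import EnvironmentSteps
from rl_coach.graph_managers.graph_manager import ScheduleParameters

schedule_params = ScheduleParameters()
# Force a short, fixed heatup so each preset reaches the training phase
# within the 30-second budget of the integration run.
schedule_params.heatup_steps = EnvironmentSteps(1000)
```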
* fix for Rainbow DQN
* fix to gym_environment (due to a change in Gym 0.12.1) + fix for Rainbow DQN + a bug fix in utils.squeeze_list
* fix for NEC agent
- allowing the last training batch drawn to be smaller than batch_size
- adding support for more agents in BatchRL by adding a softmax with temperature to the corresponding heads (sketched below)
- adding a CartPole_QR_DQN preset with a golden test
- cleanups
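A minimal sketch of softmax with temperature as it would apply to a discrete head's logits (illustrative, not the exact head code):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Higher temperatures flatten the distribution (more exploration);
    # temperature -> 0 approaches a greedy argmax policy.
    scaled = logits / temperature
    scaled -= np.max(scaled, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / np.sum(exp, axis=-1, keepdims=True)
```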
* initial ACER commit
* Code cleanup + several fixes
* Q-retrace bug fix + small clean-ups
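For reference, a hedged sketch of the Q-retrace recursion from the ACER paper that this fix concerns, computed backwards over a stored trajectory; variable names are illustrative and this is not the agent's exact code:

```python
import numpy as np

def q_retrace(rewards, q_values, values, rho, bootstrap_value, gamma=0.99):
    """rewards[t], q_values[t] = Q(s_t, a_t), values[t] = V(s_t) and
    rho[t] = pi(a_t|s_t) / mu(a_t|s_t) for the taken actions;
    bootstrap_value = V(s_T), or 0 if the trajectory ended in a terminal state."""
    horizon = len(rewards)
    q_ret = np.zeros(horizon)
    carry = bootstrap_value
    for t in reversed(range(horizon)):
        q_ret[t] = rewards[t] + gamma * carry
        # Truncated importance weight c_t = min(1, rho_t) limits how much
        # off-policy correction is propagated backwards.
        carry = min(1.0, rho[t]) * (q_ret[t] - q_values[t]) + values[t]
    return q_ret
```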
* added documentation for ACER
* ACER benchmarks
* update benchmarks table
* Add nightly running of golden and trace tests. (#202)
Resolves #200
* comment out nightly trace tests until values reset.
* remove redundant observe ignore (#168)
* ensure nightly test env containers exist. (#205)
Also bump integration test timeout
* wxPython removal (#207)
Replacing wxPython with Python's Tkinter.
Also removing the option to choose multiple files as it is unused and causes errors, and fixing the load file/directory spinner.
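Illustrative only (not the dashboard's exact code): the kind of Tkinter calls that replace the wx dialogs, limited to single-file selection:

```python
import tkinter as tk
from tkinter import filedialog

root = tk.Tk()
root.withdraw()  # hide the empty root window; only the dialogs are needed

# Single file / single directory selection (multi-file selection removed).
file_path = filedialog.askopenfilename(title='Select an experiment file')
dir_path = filedialog.askdirectory(title='Select an experiment directory')
```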
* Create CONTRIBUTING.md (#210)
* Create CONTRIBUTING.md. Resolves #188
* run nightly golden tests sequentially. (#217)
Should reduce resource requirements and potential CPU contention but increases
overall execution time.
* tests: added new setup configuration + test args (#211)
- added utils for future tests and conftest
- added test args
* new docs build
* golden test update
* add additional info when EKS runs raise exceptions.
* ensure we refresh k8s config after long calls.
The Kubernetes client on EKS has a 10-minute token time-to-live, so long jobs will hit unauthorized errors if tokens are not refreshed.
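A hedged sketch of the refresh pattern, assuming the official `kubernetes` Python client (the helper name is illustrative):

```python
from kubernetes import client, config

def refreshed_batch_api():
    # Reload the kube config so a fresh EKS token is generated, then
    # rebuild the API client with the new credentials before the next call.
    config.load_kube_config()
    return client.BatchV1Api()
```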
ISSUE: When we restore checkpoints, we create new nodes in the
TensorFlow graph. This happens when we assign a new value (op node) to a
RefVariable in GlobalVariableSaver. With every restore the size of the TF
graph increases, as new nodes are created and old unused nodes are not
removed from the graph. This causes a memory leak in the
restore_checkpoint codepath.
FIX: We use a TF placeholder to update the variables, which avoids the
memory leak.
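A minimal sketch of the placeholder pattern described above, assuming TF 1.x variables; the class and its names are illustrative, not the GlobalVariableSaver code itself:

```python
import tensorflow as tf

class VariableRestorer(object):
    def __init__(self, var):
        # Build the placeholder and assign op once, at graph-construction
        # time, so repeated restores reuse the same graph nodes.
        self.placeholder = tf.placeholder(var.dtype.base_dtype, shape=var.get_shape())
        self.assign_op = var.assign(self.placeholder)

    def restore(self, sess, value):
        # Feeding the value through the placeholder avoids calling
        # var.assign(new_value) on every restore, which would keep adding
        # nodes to the graph and leak memory.
        sess.run(self.assign_op, feed_dict={self.placeholder: value})
```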