* initial ACER commit
* Code cleanup + several fixes
* Q-retrace bug fix + small clean-ups
* added documentation for acer
* ACER benchmarks
* update benchmarks table
* Add nightly running of golden and trace tests. (#202)
Resolves #200
* comment out nightly trace tests until values are reset.
* remove redundant observe ignore (#168)
* ensure nightly test env containers exist. (#205)
Also bump integration test timeout
* wxPython removal (#207)
Replacing wxPython with Python's Tkinter.
Also removing the option to choose multiple files as it is unused and causes errors, and fixing the load file/directory spinner.
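A minimal sketch of the kind of Tkinter-based picker that replaces the wxPython dialog; the function name below is illustrative and not the dashboard's actual API:

```python
# Illustrative sketch only: single-file / directory selection via Tkinter.
import tkinter as tk
from tkinter import filedialog

def choose_file_or_directory(select_directory=False):
    root = tk.Tk()
    root.withdraw()  # hide the empty root window behind the dialog
    if select_directory:
        path = filedialog.askdirectory(title="Select an experiment directory")
    else:
        # single-file selection only; multi-file selection was removed
        path = filedialog.askopenfilename(title="Select an experiment file")
    root.destroy()
    return path or None
```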
* Create CONTRIBUTING.md (#210)
* Create CONTRIBUTING.md. Resolves #188
* run nightly golden tests sequentially. (#217)
This should reduce resource requirements and potential CPU contention, but increases
overall execution time.
* tests: added new setup configuration + test args (#211)
- added utils for future tests and conftest
- added test args
* new docs build
* golden test update
* add additional info when EKS runs raise exceptions.
* ensure we refresh k8s config after long calls.
The Kubernetes client on EKS has a 10-minute token time to live, so long jobs
will hit unauthorized errors if the token is not refreshed.
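A rough sketch of the refresh pattern using the kubernetes Python client; the function and the sleep stand-in are illustrative, not the orchestrator's actual code:

```python
# Sketch: reload kube config after a long-running step so the next API call
# uses a fresh EKS token instead of the expired one.
import time
from kubernetes import client, config

def list_pods_after_long_wait(namespace, wait_seconds):
    config.load_kube_config()        # initial EKS token, valid ~10 minutes
    time.sleep(wait_seconds)         # stand-in for a long-running job step
    config.load_kube_config()        # refresh credentials before reusing the API
    api = client.CoreV1Api()
    return api.list_namespaced_pod(namespace=namespace)
```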
ISSUE: When we restore checkpoints, we create new nodes in the
TensorFlow graph. This happens when we assign a new value (op node) to a
RefVariable in GlobalVariableSaver. With every restore the TF graph grows,
because new nodes are created and old unused nodes are never removed
from the graph. This causes a memory leak in the
restore_checkpoint code path.
FIX: Use a TF placeholder to update the variables, which avoids the
memory leak.
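A minimal sketch of the placeholder pattern (TF1-style) for a single variable; the actual GlobalVariableSaver change is more involved:

```python
# Build the assign op once and feed new values through a placeholder,
# so repeated restores add no new nodes to the graph.
import numpy as np
import tensorflow as tf

var = tf.Variable(np.zeros((4, 4), dtype=np.float32))
new_value_ph = tf.placeholder(tf.float32, shape=(4, 4))
assign_op = var.assign(new_value_ph)   # created once, reused on every restore

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for restore_step in range(100):
        checkpoint_value = np.random.rand(4, 4).astype(np.float32)
        # Feeding through the placeholder reuses the same assign op,
        # so the graph size stays constant across restores.
        sess.run(assign_op, feed_dict={new_value_ph: checkpoint_value})
```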
ISSUE: When we restore checkpoints, we create new nodes in the
TensorFlow graph. This happens when we assign a new value (op node) to a
RefVariable in GlobalVariableSaver. With every restore the TF graph grows,
because new nodes are created and old unused nodes are never removed
from the graph. This causes a memory leak in the
restore_checkpoint code path.
FIX: Reset the TensorFlow graph and recreate the Global, Online and
Target networks on every restore. This ensures that the old unused nodes
in the TF graph are dropped.
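A sketch of the reset-and-rebuild approach (TF1-style); build_networks() below is a hypothetical callback standing in for recreating the Global, Online and Target networks:

```python
# Sketch: drop the old graph entirely, rebuild the networks, then restore.
import tensorflow as tf

def restore_with_fresh_graph(checkpoint_path, build_networks):
    tf.reset_default_graph()              # discard nodes left from previous restores
    build_networks()                      # recreate Global/Online/Target networks
    saver = tf.train.Saver()
    sess = tf.Session()
    saver.restore(sess, checkpoint_path)  # load weights into the fresh graph
    return sess
```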
During heatup we may want to add agent-generated noise (i.e. not "simple" random noise).
This is enabled by setting 'heatup_using_network_decisions' to True. For example:
agent_params = DDPGAgentParameters()
agent_params.algorithm.heatup_using_network_decisions = True
The fix ensures that the correct noise is added not only in the TRAINING phase, but
also during the HEATUP phase.
No one had enabled 'heatup_using_network_decisions' before, which explains why this
problem only surfaced now (my configuration does enable it).
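An illustrative sketch of the phase check this implies; it uses plain string phase names rather than the agent's actual phase enum and is not the exact agent code:

```python
# Exploration noise is added during HEATUP as well as TRAIN when the
# agent's own decisions drive heatup.
def should_add_exploration_noise(phase, heatup_using_network_decisions):
    if phase == "TRAIN":
        return True
    # Previously noise was skipped here; with the fix, agent-driven heatup
    # also gets exploration noise applied to the network's action.
    if phase == "HEATUP" and heatup_using_network_decisions:
        return True
    return False
```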
Currently in the rollout worker, we call restore_checkpoint repeatedly to load the latest model into memory. The restore_checkpoint function calls the checkpoint saver. The checkpoint saver uses GlobalVariablesSaver, which does not release references to the previous model's variables. As a result, memory keeps growing until the rollout worker crashes.
This change avoids using the checkpoint saver in the rollout worker, as I believe it is not needed in this code path.
Also added a test that easily reproduces the issue using the CartPole example. We were also seeing this issue with the AWS DeepRacer implementation, and the current implementation avoids the memory leak there as well.
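A sketch of the general pattern (not the actual rollout-worker code): build the restore machinery once and reuse it for every checkpoint poll, so the graph does not grow over time. The class name is hypothetical:

```python
# Reuse one Saver for all restores instead of rebuilding save/restore
# machinery on each poll of the checkpoint directory.
import tensorflow as tf

class CheckpointLoader(object):
    def __init__(self, sess):
        self.sess = sess
        self.saver = tf.train.Saver()   # built once against the existing graph

    def load_latest(self, checkpoint_dir):
        latest = tf.train.latest_checkpoint(checkpoint_dir)
        if latest is not None:
            self.saver.restore(self.sess, latest)  # no new graph nodes per restore
        return latest
```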