mirror of https://github.com/gryf/coach.git synced 2025-12-18 03:30:19 +01:00

298 Commits

Author SHA1 Message Date
Neta Zmora
b4bc8a476c Bug fix: when enabling 'heatup_using_network_decisions', we should add the configured noise (#162)
During heatup we may want to add agent-generated-noise (i.e. not "simple" random noise).
This is enabled by setting 'heatup_using_network_decisions' to True.  For example:
	agent_params = DDPGAgentParameters()
	agent_params.algorithm.heatup_using_network_decisions = True

The fix ensures that the correct noise is added not just while in the TRAINING phase, but
also during the HEATUP phase.

No one had enabled 'heatup_using_network_decisions' before, which explains why this problem
only surfaced now (in my configuration I do enable 'heatup_using_network_decisions').
2018-12-17 10:08:54 +02:00
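The phase check described in the fix above can be sketched as follows. `RunPhase` and `pick_noise` are illustrative stand-ins under stated assumptions, not Coach's actual internals:

```python
from enum import Enum

class RunPhase(Enum):
    HEATUP = 0
    TRAIN = 1

def pick_noise(phase, heatup_using_network_decisions, network_noise, random_noise):
    # Before the fix, agent-generated noise was only added during TRAIN;
    # the fix also applies it during HEATUP when the flag is enabled.
    if phase == RunPhase.TRAIN or (
        phase == RunPhase.HEATUP and heatup_using_network_decisions
    ):
        return network_noise
    return random_noise
```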
gouravr
b8d21c73bf comment out the part of the test in 'test_basic_rl_graph_manager_with_cartpole_dqn_and_repeated_checkpoint_restore' that runs in an infinite loop 2018-12-16 10:56:40 -08:00
x77a1
1f0980c448 Merge branch 'master' into master 2018-12-16 09:37:00 -08:00
Gal Leibovich
f9ee526536 Fix for issue #128 - circular DQN import (#130) 2018-12-16 16:06:44 +02:00
gouravr
801aed5e10 Changes to avoid memory leak in rollout worker
Currently in the rollout worker, we call restore_checkpoint repeatedly to load the latest model into memory. The restore_checkpoint function calls the checkpoint saver. The checkpoint saver uses GlobalVariablesSaver, which does not release the references to the previous model's variables. This leads to a situation where memory keeps growing until the rollout worker crashes.

This change avoids using the checkpoint saver in the rollout worker, as I believe it is not needed in this code path.

Also added a test to easily reproduce the issue using the CartPole example. We were also seeing this issue with the AWS DeepRacer implementation, and the current change avoids the memory leak there as well.
2018-12-15 12:26:31 -08:00
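The reference-retention pattern this commit removes can be sketched in isolation. `LeakySaver` is a hypothetical stand-in for the saver behavior described above, not GlobalVariablesSaver itself:

```python
class LeakySaver:
    """Illustrative saver that keeps references to the variables of every
    model it has ever restored, so old models are never garbage-collected."""

    def __init__(self):
        self._saved_vars = []  # references accumulate here across restores

    def restore(self, model_vars):
        self._saved_vars.extend(model_vars)  # old variables are never released
        return list(model_vars)

saver = LeakySaver()
for step in range(3):
    # each "latest checkpoint" restore adds another model's variables
    saver.restore([f"var_{step}_{i}" for i in range(2)])

# saver._saved_vars now holds all 6 variables: memory grows with every restore,
# which is why the fix drops the saver from this code path entirely.
```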
zach dwiel
e08accdc22 allow case insensitive selected level name matching 2018-12-11 12:35:30 -05:00
Zach Dwiel
d0248e03c6 add meaningful error message in the event that the action space is not one that can be used (#151) 2018-12-11 09:09:24 +02:00
Gal Leibovich
f12857a8c7 Docs changes - fixing blogpost links, removing importing all exploration policies (#139)
* updated docs

* removing imports for all exploration policies in __init__ + setting the right blog-post link

* small cleanups
2018-12-05 16:16:16 -05:00
Sina Afrooze
155b78b995 Fix warning on import TF or MxNet, when only one of the frameworks is installed (#140) 2018-12-05 11:52:24 +02:00
Ryan Peach
9e66bb653e Enable creating custom tensorflow heads, embedders, and middleware. (#135)
Allowing components to have a path property.
2018-12-05 11:40:06 +02:00
Ryan Peach
3c58ed740b 'CompositeAgent' object has no attribute 'handle_episode_ended' (#136) 2018-12-05 11:28:16 +02:00
Ryan Peach
436b16016e Added num_transitions to Memory interface (#137) 2018-12-05 10:33:25 +02:00
Ryan Peach
28e5b8b612 Minor bugfix on RewardFilter in Readme (#133) 2018-11-30 16:02:08 -08:00
Gal Novik
fc6604c09c added missing license headers 2018-11-27 22:43:40 +02:00
Balaji Subramaniam
d06197f663 Add documentation on distributed Coach. (#158)
* Added documentation on distributed Coach.
2018-11-27 12:26:15 +02:00
Gal Leibovich
5674749ed5 workaround for resolving the issue of restoring a multi-node training checkpoint to single worker (#156) 2018-11-26 00:08:43 +02:00
Gal Leibovich
ab10852ad9 hacky way to resolve the checkpointing issue (#154) 2018-11-25 16:14:15 +02:00
Gal Leibovich
11170d5ba3 fix dist. tf (#153) 2018-11-25 14:02:24 +02:00
Sina Afrooze
19a68812f6 Added ONNX compatible broadcast_like function (#152)
- Also simplified the hybrid_clip implementation.
2018-11-25 11:23:18 +02:00
Balaji Subramaniam
8df425b6e1 Update how save checkpoint secs arg is handled in distributed Coach. (#151) 2018-11-25 00:05:24 -08:00
Thom Lane
de9b707fe1 Changed run_multiple_seeds to support mxnet. And fix other bugs. (#122) 2018-11-25 08:33:09 +02:00
Sina Afrooze
77fb561668 Added code to fall back to CPU if GPU not available. (#150)
- Code will also prune the GPU list if more GPUs are requested than are available.
2018-11-25 08:32:26 +02:00
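The fallback-and-prune logic in the commit above can be sketched like this; the function and device names are illustrative, not Coach's actual API:

```python
def select_devices(num_requested_gpus, available_gpus):
    """Fall back to CPU when no GPU is present; otherwise prune the
    request down to the GPUs that actually exist."""
    if not available_gpus:
        return ["cpu"]
    return available_gpus[:num_requested_gpus]
```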
Sina Afrooze
7d25477942 Add observation_space_type to GymEnvironmentParameters so that it is possible to explicitly state that in presets. (#145) 2018-11-25 07:11:48 +02:00
Balaji Subramaniam
bf2036b284 S3 optimization - save only the latest checkpoint. (#148) 2018-11-23 22:17:36 -08:00
Balaji Subramaniam
13d2679af4 Sync experiment dir, videos, gifs to S3. (#147) 2018-11-23 20:52:12 -08:00
Sina Afrooze
5332013bd1 Implement framework-agnostic rollout and training workers (#137)
* Added checkpoint state file to coach checkpointing.

* Removed TF specific code from rollout_worker, training_worker, and s3_data_store
2018-11-23 18:05:44 -08:00
Ajay Deshpande
4a6c404070 Adding worker logs and plumbed task_parameters to distributed coach (#130) 2018-11-23 15:35:11 -08:00
Gal Leibovich
2b4c9c6774 Removing graph_manager param (#141) 2018-11-23 11:42:54 -08:00
Gal Leibovich
a1c56edd98 Fixes for having NumpySharedRunningStats syncing on multi-node (#139)
1. Use the standard checkpoint prefix so that the data store can grab the checkpoint and sync it to S3.
2. Remove the reference to Redis so that it won't be pickled in.
3. Enable restoring a checkpoint into a single-worker run when it was saved by a single-node, multiple-worker run.
2018-11-23 16:11:47 +02:00
Sina Afrooze
87a7848b0a Moved tf.variable_scope and tf.device calls to framework-specific architecture (#136) 2018-11-22 22:52:21 +02:00
shadiendrawis
559969d3dd disabled loading for target weights (#138)
* Update savers.py

* disabled loading for target weights
2018-11-22 18:15:52 +02:00
Thom Lane
949d91321a Added explicit environment closing (#129) 2018-11-22 14:25:03 +02:00
Sina Afrooze
16cdd9a9c1 Tf checkpointing using saver mechanism (#134) 2018-11-22 14:08:10 +02:00
Cody Hsieh
dd18959e53 Don't download when checkpoint files are already present (#109)
* add check if checkpoint file present
2018-11-21 15:32:53 -08:00
shadiendrawis
b94239234a Removed TF warning when training in a distributed setting (#133)
* removed TF warning when training in a distributed setting and changed package version

* revert version back to 0.11.0
2018-11-21 16:09:04 +02:00
Gal Leibovich
a112ee69f6 Save filters' internal state (#127)
* save filters internal state

* moving the restore to be made from within NumpyRunningStats
2018-11-20 17:21:48 +02:00
Sina Afrooze
67eb9e4c28 Adding checkpointing framework (#74)
* Adding checkpointing framework as well as mxnet checkpointing implementation.

- MXNet checkpoint for each network is saved in a separate file.

* Adding checkpoint restore for mxnet to graph-manager

* Add unit-test for get_checkpoint_state()

* Added match.group() to fix unit-test failing on CI

* Added ONNX export support for MXNet
2018-11-19 19:45:49 +02:00
x77a1
4da56b1ff2 Enable setting the data store factory in Graph manager (#110)
* Enable setting the data store factory in Graph manager

This change enables us to use a custom data store for storing and retrieving models.
We currently need this to use a data store that loads temporary AWS credentials
from disk before calling store or load operations.

* Removed the data store factory and introduced the data store as an attribute
2018-11-19 08:35:03 -08:00
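The attribute-based wiring described above can be sketched as follows. Every name here (`CredentialRefreshingStore`, the `calls` log, the `GraphManager` stub) is hypothetical, illustrating the pattern rather than Coach's actual classes:

```python
class CredentialRefreshingStore:
    """Illustrative data store that re-reads temporary credentials from
    disk before every store/load, as the commit above describes."""

    def __init__(self, credentials_path):
        self.credentials_path = credentials_path
        self.calls = []  # records operations so the sketch is checkable

    def _refresh_credentials(self):
        # a real implementation would re-read temporary AWS credentials here
        self.calls.append("refresh")

    def store(self, checkpoint_dir):
        self._refresh_credentials()
        self.calls.append(("store", checkpoint_dir))

    def load(self, checkpoint_dir):
        self._refresh_credentials()
        self.calls.append(("load", checkpoint_dir))

class GraphManager:
    data_store = None  # set directly as an attribute instead of via a factory

gm = GraphManager()
gm.data_store = CredentialRefreshingStore("/tmp/aws_creds")
gm.data_store.store("./checkpoint")
```

Setting the store as a plain attribute (rather than registering a factory) keeps the graph manager unaware of how credentials are obtained.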
Sina Afrooze
67a90ee87e Add tensor input type for arbitrary dimensional observation (#125)
* Allow arbitrary-dimensional observations (neither vector nor image)
* Added creation of a PlanarMapsObservationSpace in GymEnvironment when the number of channels is not 1 or 3
2018-11-19 16:41:12 +02:00
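The channel-count rule described in this commit can be sketched as a small dispatch function; the function name and returned labels are illustrative, not Coach's actual space classes:

```python
def observation_space_kind(shape):
    # 3-D observations with 1 or 3 channels are treated as images; other
    # channel counts map to a planar-maps space; any other dimensionality
    # falls through to the new arbitrary-dimensional tensor input type.
    if len(shape) == 3 and shape[-1] in (1, 3):
        return "image"
    if len(shape) == 3:
        return "planar_maps"
    return "tensor"
```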
Thom Lane
7ba1a4393f Channel order transpose, for image embedder. Updated unit test. (#87) 2018-11-19 15:39:03 +02:00
Thom Lane
9210909050 Added MXNet to arg docs. (#121) 2018-11-19 11:31:28 +02:00
Gal Leibovich
d4d06aaea6 remove kubernetes dependency (#117) 2018-11-18 18:10:22 +02:00
Gal Leibovich
430e286c56 muting pygame's hello message (#116) 2018-11-18 18:02:55 +02:00
Gal Leibovich
ce85c8e8c3 Removing Egreedy from CartPole_ClippedPPO. ClippedPPO's default exploration policy is to be used instead. (#115) 2018-11-18 16:36:34 +02:00
Gal Leibovich
6caf721d1c Numpy shared running stats (#97) 2018-11-18 14:46:40 +02:00
Gal Novik
e1fa6e9681 roboschool: updating envs to v1, fixing rendering (#112) 2018-11-18 13:38:10 +02:00
Gal Leibovich
9fd4d55623 Making stop condition optional by using a flag (#113)
* apply stop condition flag (default: ignore the stop condition)
2018-11-18 13:37:39 +02:00
Gal Leibovich
449bcfb4e1 summing head losses instead of taking the mean (#98) 2018-11-18 12:20:00 +02:00
Balaji Subramaniam
dea1826658 Re-enable NFS data store. (#101) 2018-11-16 13:55:33 -08:00
Thom Lane
a0f25034c3 Added average total reward to logging after evaluation phase completes. (#93) 2018-11-16 08:22:00 -08:00