gryf/coach

mirror of https://github.com/gryf/coach.git synced 2026-07-07 18:06:31 +02:00

Author	SHA1	Message	Date
Guy Jacob	235a259223	Add Flatten layer to architectures + make flatten optional in embedders (#483) Flatten layer required for embedders that mix conv and dense (Cherry picking from #478)	2021-05-12 11:11:10 +03:00
Gal Leibovich	138ced23ba	RL in Large Discrete Action Spaces - Wolpertinger Agent (#394) * Currently this is specific to the case of discretizing a continuous action space. Can easily be adapted to other case by feeding the kNN otherwise, and removing the usage of a discretizing output action filter	2019-09-08 12:53:49 +03:00
Zach Dwiel	7b0fccb041	Add RedisDataStore (#295) * GraphManager.set_session also sets self.sess * make sure that GraphManager.fetch_from_worker uses training phase * remove unnecessary phase setting in training worker * reorganize rollout worker * provide default name to GlobalVariableSaver.__init__ since it isn't really used anyway * allow dividing TrainingSteps and EnvironmentSteps * add timestamps to the log * added redis data store * conflict merge fix	2019-08-28 21:15:58 +03:00
Gal Leibovich	19ad2d60a7	Batch RL Tutorial (#372)	2019-07-14 18:43:48 +03:00
Gal Leibovich	d6795bd524	batchnorm fixes + disabling batchnorm in DDPG (#353) Co-authored-by: James Casbon <casbon+gh@gmail.com>	2019-06-23 11:28:22 +03:00
Gal Leibovich	7eb884c5b2	TD3 (#338)	2019-06-16 11:11:21 +03:00
Gal Leibovich	a1bb8eef89	DDPG Critic Head Bug Fix (#344) * A bug fix for DDPG, where the update to the policy network was based on the sum of the critic's Q predictions on the batch instead of their mean	2019-06-05 17:47:56 +03:00
James Casbon	2b7d536da4	Add head regularization costs to tf.losses (#292)	2019-05-26 17:15:42 +03:00
Gal Novik	aa9f3cefaf	Printing input size as part of network summary (#310)	2019-05-12 15:40:02 +03:00
guyk1971	74db141d5e	SAC algorithm (#282) * SAC algorithm * SAC - updates to agent (learn_from_batch), sac_head and sac_q_head to fix problem in gradient calculation. Now SAC agents is able to train. gym_environment - fixing an error in access to gym.spaces * Soft Actor Critic - code cleanup * code cleanup * V-head initialization fix * SAC benchmarks * SAC Documentation * typo fix * documentation fixes * documentation and version update * README typo	2019-05-01 18:37:49 +03:00
Gal Leibovich	4741b0b916	BCQ variant on top of DDQN (#276) * kNN based model for predicting which actions to drop * fix for seeds with batch rl	2019-04-16 17:06:23 +03:00
Federico Andres Lois	bdb9b224a8	Include missing RegressionHead. (#263)	2019-04-16 15:24:06 +03:00
Zach Dwiel	2291cee2c6	allow serializing from/to arrays/str from GlobalVariableSaver (#285)	2019-04-04 11:09:19 -04:00
Gal Leibovich	6e08c55ad5	Enabling-more-agents-for-Batch-RL-and-cleanup (#258) allowing for the last training batch drawn to be smaller than batch_size + adding support for more agents in BatchRL by adding softmax with temperature to the corresponding heads + adding a CartPole_QR_DQN preset with a golden test + cleanups	2019-03-21 16:10:29 +02:00
Gal Leibovich	abec59f367	fixes to rainbow dqn + a cartpole based golden test (#253)	2019-03-21 12:57:56 +02:00
Gal Leibovich	e3c7e526c7	Batch RL (#238)	2019-03-19 18:07:09 +02:00
Gal Leibovich	d6158a5cfc	restoring from a checkpoint file (#247)	2019-03-17 16:28:09 +02:00
Gal Leibovich	9a895a1ac7	bug-fix for l2_regularization not in use (#230) * bug-fix for l2_regularization not in use * removing not in use TF REGULARIZATION_LOSSES collection	2019-03-03 15:11:06 +02:00
shadiendrawis	2b5d1dabe6	ACER algorithm (#184) * initial ACER commit * Code cleanup + several fixes * Q-retrace bug fix + small clean-ups * added documentation for acer * ACER benchmarks * update benchmarks table * Add nightly running of golden and trace tests. (#202) Resolves #200 * comment out nightly trace tests until values reset. * remove redundant observe ignore (#168) * ensure nightly test env containers exist. (#205) Also bump integration test timeout * wxPython removal (#207) Replacing wxPython with Python's Tkinter. Also removing the option to choose multiple files as it is unused and causes errors, and fixing the load file/directory spinner. * Create CONTRIBUTING.md (#210) * Create CONTRIBUTING.md. Resolves #188 * run nightly golden tests sequentially. (#217) Should reduce resource requirements and potential CPU contention but increases overall execution time. * tests: added new setup configuration + test args (#211) - added utils for future tests and conftest - added test args * new docs build * golden test update	2019-02-20 23:52:34 +02:00
Zach Dwiel	fedb4cbd7c	Cleanup and refactoring (#171)	2019-01-15 10:04:53 +02:00
Gourav Roy	b1e9ea48d8	Refactored GlobalVariableSaver	2019-01-03 15:08:34 -08:00
Gourav Roy	619ea0944e	Avoid Memory Leak in Rollout worker ISSUE: When we restore checkpoints, we create new nodes in the Tensorflow graph. This happens when we assign new value (op node) to RefVariable in GlobalVariableSaver. With every restore the size of TF graph increases as new nodes are created and old unused nodes are not removed from the graph. This causes the memory leak in restore_checkpoint codepath. FIX: We use TF placeholder to update the variables which avoids the memory leak.	2019-01-02 23:09:09 -08:00
Zach Dwiel	d0248e03c6	add meaningful error message in the event that the action space is not one that can be used (#151)	2018-12-11 09:09:24 +02:00
Ryan Peach	9e66bb653e	Enable creating custom tensorflow heads, embedders, and middleware. (#135) Allowing components to have a path property.	2018-12-05 11:40:06 +02:00
Gal Novik	fc6604c09c	added missing license headers	2018-11-27 22:43:40 +02:00
Gal Leibovich	11170d5ba3	fix dist. tf (#153)	2018-11-25 14:02:24 +02:00
Gal Leibovich	a1c56edd98	Fixes for having NumpySharedRunningStats syncing on multi-node (#139) 1. Having the standard checkpoint prefix in order for the data store to grab it, and sync it to S3. 2. Removing the reference to Redis so that it won't try to pickle that in. 3. Enable restoring a checkpoint into a single-worker run, which was saved by a single-node-multiple-worker run.	2018-11-23 16:11:47 +02:00
Sina Afrooze	87a7848b0a	Moved tf.variable_scope and tf.device calls to framework-specific architecture (#136)	2018-11-22 22:52:21 +02:00
shadiendrawis	559969d3dd	disabled loading for target weights (#138) * Update savers.py * disabled loading for target weights	2018-11-22 18:15:52 +02:00
Sina Afrooze	16cdd9a9c1	Tf checkpointing using saver mechanism (#134)	2018-11-22 14:08:10 +02:00
shadiendrawis	b94239234a	Removed TF warning when training in a distributed setting (#133) * removed TF warning when training in a distributed setting and changed package version * revert version back to 0.11.0	2018-11-21 16:09:04 +02:00
Gal Leibovich	a112ee69f6	Save filters' internal state (#127) * save filters internal state * moving the restore to be made from within NumpyRunningStats	2018-11-20 17:21:48 +02:00
Sina Afrooze	67eb9e4c28	Adding checkpointing framework (#74) * Adding checkpointing framework as well as mxnet checkpointing implementation. - MXNet checkpoint for each network is saved in a separate file. * Adding checkpoint restore for mxnet to graph-manager * Add unit-test for get_checkpoint_state() * Added match.group() to fix unit-test failing on CI * Added ONNX export support for MXNet	2018-11-19 19:45:49 +02:00
Sina Afrooze	67a90ee87e	Add tensor input type for arbitrary dimensional observation (#125) * Allow arbitrary dimensional observation (non vector or image) * Added creating PlanarMapsObservationSpace to GymEnvironment when number of channels is not 1 or 3	2018-11-19 16:41:12 +02:00
Gal Leibovich	6caf721d1c	Numpy shared running stats (#97)	2018-11-18 14:46:40 +02:00
Gal Leibovich	449bcfb4e1	summing head losses instead of taking the mean (#98)	2018-11-18 12:20:00 +02:00
Itai Caspi	6d40ad1650	update of api docstrings across coach and tutorials [WIP] (#91) * updating the documentation website * adding the built docs * update of api docstrings across coach and tutorials 0-2 * added some missing api documentation * New Sphinx based documentation	2018-11-15 15:00:13 +02:00
Balaji Subramaniam	a849c17e46	Enable distributed SharedRunningStats (#81) - Use Redis pub/sub for updating SharedRunningStats.	2018-11-13 19:17:38 +02:00
Itai Caspi	3fd433ffab	fix ddpg head (#78)	2018-11-09 08:17:04 -08:00
Itai Caspi	3a0a1159e9	fixing the dropout rate code (#72) addresses issue #53	2018-11-08 16:53:47 +02:00
Itai Caspi	83e0b09a6a	adding the missing export_onnx_graph parameter to task parameters (#73)	2018-11-08 12:52:42 +02:00
Sina Afrooze	5fadb9c18e	Adding mxnet components to rl_coach/architectures (#60) Adding mxnet components to rl_coach architectures. - Supports PPO and DQN - Tested with CartPole_PPO and CarPole_DQN - Normalizing filters don't work right now (see #49) and are disabled in CartPole_PPO preset - Checkpointing is disabled for MXNet	2018-11-07 17:07:15 +02:00
Itai Caspi	e7a91b4dc3	Fix cmd line arguments handling (#68) * refactoring the merging of the task parameters and the command line parameters * removing some unused command line arguments * fix for saving checkpoints when not passing through coach.py	2018-11-07 15:47:02 +02:00
Sina Afrooze	93571306c3	Removed tensorflow specific code in presets (#59) * Add generic layer specification for using in presets * Modify presets to use the generic scheme	2018-11-06 17:39:29 +02:00
Itai Caspi	811152126c	Export graph to ONNX (#61) Implements the ONNX graph exporting feature. Currently does not work for NAF, C51 and A3C_LSTM due to unsupported TF layers in the tf2onnx library.	2018-11-06 10:55:21 +02:00
Sina Afrooze	2046358ab0	Add docstring for architecture (#47) - Removed get_model() from architecture because it is only implementation detail of architecture.	2018-10-30 11:02:37 +02:00
Sina Afrooze	a888226641	Move embedder, middleware, and head parameters to framework agnostic modules. (#45) Part of #28	2018-10-29 14:46:40 -07:00
Zach Dwiel	700a175902	rename save_checkpoint_secs -> checkpoint_save_secs	2018-10-23 17:10:58 -04:00
Shadi Endrawis	51726a5b80	network_imporvements branch merge	2018-10-02 13:43:36 +03:00
Gal Leibovich	72ea933384	bug-fix for clipped_ppo not logging several signals + small cleanup	2018-10-02 14:22:37 +03:00

61 Commits