gryf/coach

mirror of https://github.com/gryf/coach.git synced 2026-02-11 19:25:53 +01:00

Author	SHA1	Message	Date
Gal Leibovich	6e08c55ad5	Enabling-more-agents-for-Batch-RL-and-cleanup (#258) allowing for the last training batch drawn to be smaller than batch_size + adding support for more agents in BatchRL by adding softmax with temperature to the corresponding heads + adding a CartPole_QR_DQN preset with a golden test + cleanups	2019-03-21 16:10:29 +02:00
Gal Leibovich	abec59f367	fixes to rainbow dqn + a cartpole based golden test (#253)	2019-03-21 12:57:56 +02:00
Gal Leibovich	e3c7e526c7	Batch RL (#238)	2019-03-19 18:07:09 +02:00
Gal Leibovich	d6158a5cfc	restoring from a checkpoint file (#247)	2019-03-17 16:28:09 +02:00
Gal Leibovich	9a895a1ac7	bug-fix for l2_regularization not in use (#230) * bug-fix for l2_regularization not in use * removing not in use TF REGULARIZATION_LOSSES collection	2019-03-03 15:11:06 +02:00
shadiendrawis	2b5d1dabe6	ACER algorithm (#184) * initial ACER commit * Code cleanup + several fixes * Q-retrace bug fix + small clean-ups * added documentation for acer * ACER benchmarks * update benchmarks table * Add nightly running of golden and trace tests. (#202) Resolves #200 * comment out nightly trace tests until values reset. * remove redundant observe ignore (#168) * ensure nightly test env containers exist. (#205) Also bump integration test timeout * wxPython removal (#207) Replacing wxPython with Python's Tkinter. Also removing the option to choose multiple files as it is unused and causes errors, and fixing the load file/directory spinner. * Create CONTRIBUTING.md (#210) * Create CONTRIBUTING.md. Resolves #188 * run nightly golden tests sequentially. (#217) Should reduce resource requirements and potential CPU contention but increases overall execution time. * tests: added new setup configuration + test args (#211) - added utils for future tests and conftest - added test args * new docs build * golden test update	2019-02-20 23:52:34 +02:00
Zach Dwiel	fedb4cbd7c	Cleanup and refactoring (#171)	2019-01-15 10:04:53 +02:00
Gourav Roy	b1e9ea48d8	Refactored GlobalVariableSaver	2019-01-03 15:08:34 -08:00
Gourav Roy	619ea0944e	Avoid Memory Leak in Rollout worker ISSUE: When we restore checkpoints, we create new nodes in the Tensorflow graph. This happens when we assign new value (op node) to RefVariable in GlobalVariableSaver. With every restore the size of TF graph increases as new nodes are created and old unused nodes are not removed from the graph. This causes the memory leak in restore_checkpoint codepath. FIX: We use TF placeholder to update the variables which avoids the memory leak.	2019-01-02 23:09:09 -08:00
Zach Dwiel	d0248e03c6	add meaningful error message in the event that the action space is not one that can be used (#151)	2018-12-11 09:09:24 +02:00
Ryan Peach	9e66bb653e	Enable creating custom tensorflow heads, embedders, and middleware. (#135) Allowing components to have a path property.	2018-12-05 11:40:06 +02:00
Gal Novik	fc6604c09c	added missing license headers	2018-11-27 22:43:40 +02:00
Gal Leibovich	11170d5ba3	fix dist. tf (#153)	2018-11-25 14:02:24 +02:00
Gal Leibovich	a1c56edd98	Fixes for having NumpySharedRunningStats syncing on multi-node (#139) 1. Having the standard checkpoint prefix in order for the data store to grab it, and sync it to S3. 2. Removing the reference to Redis so that it won't try to pickle that in. 3. Enable restoring a checkpoint into a single-worker run, which was saved by a single-node-multiple-worker run.	2018-11-23 16:11:47 +02:00
Sina Afrooze	87a7848b0a	Moved tf.variable_scope and tf.device calls to framework-specific architecture (#136)	2018-11-22 22:52:21 +02:00
shadiendrawis	559969d3dd	disabled loading for target weights (#138) * Update savers.py * disabled loading for target weights	2018-11-22 18:15:52 +02:00
Sina Afrooze	16cdd9a9c1	Tf checkpointing using saver mechanism (#134)	2018-11-22 14:08:10 +02:00
shadiendrawis	b94239234a	Removed TF warning when training in a distributed setting (#133) * removed TF warning when training in a distributed setting and changed package version * revert version back to 0.11.0	2018-11-21 16:09:04 +02:00
Gal Leibovich	a112ee69f6	Save filters' internal state (#127) * save filters internal state * moving the restore to be made from within NumpyRunningStats	2018-11-20 17:21:48 +02:00
Sina Afrooze	67eb9e4c28	Adding checkpointing framework (#74) * Adding checkpointing framework as well as mxnet checkpointing implementation. - MXNet checkpoint for each network is saved in a separate file. * Adding checkpoint restore for mxnet to graph-manager * Add unit-test for get_checkpoint_state() * Added match.group() to fix unit-test failing on CI * Added ONNX export support for MXNet	2018-11-19 19:45:49 +02:00
Sina Afrooze	67a90ee87e	Add tensor input type for arbitrary dimensional observation (#125) * Allow arbitrary dimensional observation (non vector or image) * Added creating PlanarMapsObservationSpace to GymEnvironment when number of channels is not 1 or 3	2018-11-19 16:41:12 +02:00
Gal Leibovich	6caf721d1c	Numpy shared running stats (#97)	2018-11-18 14:46:40 +02:00
Gal Leibovich	449bcfb4e1	summing head losses instead of taking the mean (#98)	2018-11-18 12:20:00 +02:00
Itai Caspi	6d40ad1650	update of api docstrings across coach and tutorials [WIP] (#91) * updating the documentation website * adding the built docs * update of api docstrings across coach and tutorials 0-2 * added some missing api documentation * New Sphinx based documentation	2018-11-15 15:00:13 +02:00
Balaji Subramaniam	a849c17e46	Enable distributed SharedRunningStats (#81) - Use Redis pub/sub for updating SharedRunningStats.	2018-11-13 19:17:38 +02:00
Itai Caspi	3fd433ffab	fix ddpg head (#78)	2018-11-09 08:17:04 -08:00
Itai Caspi	3a0a1159e9	fixing the dropout rate code (#72) addresses issue #53	2018-11-08 16:53:47 +02:00
Itai Caspi	83e0b09a6a	adding the missing export_onnx_graph parameter to task parameters (#73)	2018-11-08 12:52:42 +02:00
Sina Afrooze	5fadb9c18e	Adding mxnet components to rl_coach/architectures (#60) Adding mxnet components to rl_coach architectures. - Supports PPO and DQN - Tested with CartPole_PPO and CarPole_DQN - Normalizing filters don't work right now (see #49) and are disabled in CartPole_PPO preset - Checkpointing is disabled for MXNet	2018-11-07 17:07:15 +02:00
Itai Caspi	e7a91b4dc3	Fix cmd line arguments handling (#68) * refactoring the merging of the task parameters and the command line parameters * removing some unused command line arguments * fix for saving checkpoints when not passing through coach.py	2018-11-07 15:47:02 +02:00
Sina Afrooze	93571306c3	Removed tensorflow specific code in presets (#59) * Add generic layer specification for using in presets * Modify presets to use the generic scheme	2018-11-06 17:39:29 +02:00
Itai Caspi	811152126c	Export graph to ONNX (#61) Implements the ONNX graph exporting feature. Currently does not work for NAF, C51 and A3C_LSTM due to unsupported TF layers in the tf2onnx library.	2018-11-06 10:55:21 +02:00
Sina Afrooze	2046358ab0	Add docstring for architecture (#47) - Removed get_model() from architecture because it is only implementation detail of architecture.	2018-10-30 11:02:37 +02:00
Sina Afrooze	a888226641	Move embedder, middleware, and head parameters to framework agnostic modules. (#45) Part of #28	2018-10-29 14:46:40 -07:00
Zach Dwiel	700a175902	rename save_checkpoint_secs -> checkpoint_save_secs	2018-10-23 17:10:58 -04:00
Shadi Endrawis	51726a5b80	network_imporvements branch merge	2018-10-02 13:43:36 +03:00
Gal Leibovich	72ea933384	bug-fix for clipped_ppo not logging several signals + small cleanup	2018-10-02 14:22:37 +03:00
itaicaspi-intel	d3f97cd93b	initial CIL implementation (WIP)	2018-09-13 15:29:29 +03:00
itaicaspi-intel	a9bd1047c4	load and save function for non-episodic replay buffers + carla improvements + network bug fixes	2018-09-12 15:26:16 +03:00
Itai Caspi	72a1d9d426	Itaicaspi/episode reset refactoring (#105) * reordering of the episode reset operation and allowing to store episodes only when they are terminated * reordering of the episode reset operation and allowing to store episodes only when they are terminated * revert tensorflow-gpu to 1.9.0 + bug fix in should_train() * tests readme file and refactoring of policy optimization agent train function * Update README.md * Update README.md * additional policy optimization train function simplifications * Updated the traces after the reordering of the environment reset * docker and jenkins files * updated the traces to the ones from within the docker container * updated traces and added control suite to the docker * updated jenkins file with the intel proxy + updated doom basic a3c test params * updated line breaks in jenkins file * added a missing line break in jenkins file * refining trace tests ignored presets + adding a configurable beta entropy value * switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue * updated benchmarks for dueling ddqn breakout and pong * allowing dynamic updates to the loss weights + bug fix in episode.update_returns * remove docker and jenkins file	2018-09-04 15:07:54 +03:00
itaicaspi-intel	2c62a40466	bug fix in dueling network + revert to TF 1.6 for CPU due to requirements compatibility issues	2018-09-02 13:38:16 +03:00
Gal Leibovich	ebe574e463	add missing hidden layer in rainbow_q_head	2018-08-30 19:34:27 +03:00
Gal Leibovich	ea294de7fd	adding dueling support for rainbow dqn (now only missing n-step)	2018-08-30 18:15:59 +03:00
Gal Leibovich	d2623c0eee	bug-fix in dueling dqn	2018-08-30 18:14:53 +03:00
Gal Leibovich	bbe7ac3338	Rainbow DQN agent (WIP - still missing dueling and n-step) + adding support for Prioritized ER for C51	2018-08-30 18:14:53 +03:00
Gal Leibovich	1aa2ab0590	parameter noise exploration - using Noisy Nets	2018-08-27 18:19:01 +03:00
itaicaspi-intel	658b437079	removing datasets + imports optimization	2018-08-27 10:54:11 +03:00
Gal Novik	19ca5c24b1	pre-release 0.10.0	2018-08-13 17:11:34 +03:00

48 Commits