coach

gryf/coach

mirror of https://github.com/gryf/coach.git synced 2026-07-08 18:36:32 +02:00

Author	SHA1	Message	Date
Gal Leibovich	310d31c227	integration test changes to reach the train part (#254 ) * integration test changes to override heatup to 1000 steps + run each preset for 30 sec (to make sure we reach the train part) * fixes to failing presets uncovered with this change + changes in the golden testing to properly test BatchRL * fix for rainbow dqn * fix to gym_environment (due to a change in Gym 0.12.1) + fix for rainbow DQN + some bug-fix in utils.squeeze_list * fix for NEC agent	2019-03-27 21:14:19 +02:00
Gal Leibovich	6e08c55ad5	Enabling-more-agents-for-Batch-RL-and-cleanup (#258 ) allowing for the last training batch drawn to be smaller than batch_size + adding support for more agents in BatchRL by adding softmax with temperature to the corresponding heads + adding a CartPole_QR_DQN preset with a golden test + cleanups	2019-03-21 16:10:29 +02:00
Gal Leibovich	abec59f367	fixes to rainbow dqn + a cartpole based golden test (#253 )	2019-03-21 12:57:56 +02:00
Gal Leibovich	e3c7e526c7	Batch RL (#238 )	2019-03-19 18:07:09 +02:00
shadiendrawis	f03bd7ad93	benchmark update (#250 )	2019-03-17 15:33:28 +02:00
Gal Novik	10220be9be	Adding support for evaluation only mode with predefined number of steps (#225 )	2019-03-03 10:03:45 +02:00
shadiendrawis	2b5d1dabe6	ACER algorithm (#184 ) * initial ACER commit * Code cleanup + several fixes * Q-retrace bug fix + small clean-ups * added documentation for acer * ACER benchmarks * update benchmarks table * Add nightly running of golden and trace tests. (#202) Resolves #200 * comment out nightly trace tests until values reset. * remove redundant observe ignore (#168) * ensure nightly test env containers exist. (#205) Also bump integration test timeout * wxPython removal (#207) Replacing wxPython with Python's Tkinter. Also removing the option to choose multiple files as it is unused and causes errors, and fixing the load file/directory spinner. * Create CONTRIBUTING.md (#210) * Create CONTRIBUTING.md. Resolves #188 * run nightly golden tests sequentially. (#217) Should reduce resource requirements and potential CPU contention but increases overall execution time. * tests: added new setup configuration + test args (#211) - added utils for future tests and conftest - added test args * new docs build * golden test update	2019-02-20 23:52:34 +02:00
Cody Hsieh	bf0a65eefd	remove redundant observe ignore (#168 )	2019-01-17 14:08:05 -08:00
Zach Dwiel	fedb4cbd7c	Cleanup and refactoring (#171 )	2019-01-15 10:04:53 +02:00
Gal Leibovich	4c914c057c	fix for finding the right filter checkpoint to restore + do not update internal filter state when evaluating + fix SharedRunningStats checkpoint filenames (#147 )	2018-12-17 21:36:27 +02:00
Gal Leibovich	f9ee526536	Fix for issue #128 - circular DQN import (#130 )	2018-12-16 16:06:44 +02:00
Gal Leibovich	f12857a8c7	Docs changes - fixing blogpost links, removing importing all exploration policies (#139 ) * updated docs * removing imports for all exploration policies in __init__ + setting the right blog-post link * small cleanups	2018-12-05 16:16:16 -05:00
Ryan Peach	3c58ed740b	'CompositeAgent' object has no attribute 'handle_episode_ended' (#136 )	2018-12-05 11:28:16 +02:00
Gal Leibovich	a1c56edd98	Fixes for having NumpySharedRunningStats syncing on multi-node (#139 ) 1. Having the standard checkpoint prefix in order for the data store to grab it, and sync it to S3. 2. Removing the reference to Redis so that it won't try to pickle that in. 3. Enable restoring a checkpoint into a single-worker run, which was saved by a single-node-multiple-worker run.	2018-11-23 16:11:47 +02:00
Sina Afrooze	87a7848b0a	Moved tf.variable_scope and tf.device calls to framework-specific architecture (#136 )	2018-11-22 22:52:21 +02:00
Gal Leibovich	a112ee69f6	Save filters' internal state (#127 ) * save filters internal state * moving the restore to be made from within NumpyRunningStats	2018-11-20 17:21:48 +02:00
Sina Afrooze	67eb9e4c28	Adding checkpointing framework (#74 ) * Adding checkpointing framework as well as mxnet checkpointing implementation. - MXNet checkpoint for each network is saved in a separate file. * Adding checkpoint restore for mxnet to graph-manager * Add unit-test for get_checkpoint_state() * Added match.group() to fix unit-test failing on CI * Added ONNX export support for MXNet	2018-11-19 19:45:49 +02:00
Gal Leibovich	430e286c56	muting pygame's hello message (#116 )	2018-11-18 18:02:55 +02:00
Gal Leibovich	6caf721d1c	Numpy shared running stats (#97 )	2018-11-18 14:46:40 +02:00
Thom Lane	a0f25034c3	Added average total reward to logging after evaluation phase completes. (#93 )	2018-11-16 08:22:00 -08:00
Ajay Deshpande	fde73ced13	Simulating the act on the trainer. (#65 ) * Remove the use of daemon threads for Redis subscribe. * Emulate act and observe on trainer side to update internal vars.	2018-11-15 08:38:58 -08:00
Itai Caspi	6d40ad1650	update of api docstrings across coach and tutorials [WIP] (#91 ) * updating the documentation website * adding the built docs * update of api docstrings across coach and tutorials 0-2 * added some missing api documentation * New Sphinx based documentation	2018-11-15 15:00:13 +02:00
Scott Leishman	524f8436a2	create per environment Dockerfiles. (#70 ) * create per environment Dockerfiles. Adjust CI setup to better parallelize runs. Fix a couple of issues in golden and trace tests. Update a few of the docs. * bugfix in mmc agent. Also install kubectl for CI, update badge branch. * remove integration test parallelism.	2018-11-14 07:40:22 -08:00
Balaji Subramaniam	a849c17e46	Enable distributed SharedRunningStats (#81 ) - Use Redis pub/sub for updating SharedRunningStats.	2018-11-13 19:17:38 +02:00
Ajay Deshpande	875d6ef017	Adding target reward and target success (#58 ) * Adding target reward * Adding target success * Addressing comments * Using custom_reward_threshold and target_success_rate * Adding exit message * Moving success rate to environment * Making target_success_rate optional	2018-11-12 15:03:43 -08:00
Gal Leibovich	49dea39d34	N-step returns for rainbow (#67 ) * n_step returns for rainbow * Rename CartPole_PPO -> CartPole_ClippedPPO	2018-11-07 18:33:08 +02:00
Sina Afrooze	a888226641	Move embedder, middleware, and head parameters to framework agnostic modules. (#45 ) Part of #28	2018-10-29 14:46:40 -07:00
Ajay Deshpande	9a30c26469	Adding improvements	2018-10-23 19:59:02 -04:00
Zach Dwiel	9804b033a2	rename save_checkpoint_dir -> checkpoint_save_dir	2018-10-23 17:10:58 -04:00
Ajay Deshpande	b285a02023	Adding parameters, checking transitions before training	2018-10-23 16:55:37 -04:00
Ajay Deshpande	7f00235ed5	waiting for a new checkpoint if it's available	2018-10-23 16:54:43 -04:00
Ajay Deshpande	a7f5442015	Adding should_train helper and should_train in graph_manager	2018-10-23 16:54:43 -04:00
Ajay Deshpande	6b2de6ba6d	Adding initial interface for backend and redis pubsub (#19 ) * Adding initial interface for backend and redis pubsub * Addressing comments, adding super in all memories * Removing distributed experience replay	2018-10-23 16:51:48 -04:00
Ajay Deshpande	ce9838a7d6	Adding kubernetes orchestrator for rollouts, adding requirements for incremental docker builds	2018-10-23 16:46:04 -04:00
Shadi Endrawis	51726a5b80	network_imporvements branch merge	2018-10-02 13:43:36 +03:00
Gal Leibovich	72ea933384	bug-fix for clipped_ppo not logging several signals + small cleanup	2018-10-02 14:22:37 +03:00
itaicaspi-intel	73cc6e39d0	bug fix for clipped ppo for discrete controls	2018-09-18 10:40:53 +03:00
itaicaspi-intel	e8a2b679d1	using the CoRL2017 experiment suite for CARLA_CIL	2018-09-13 16:59:22 +03:00
itaicaspi-intel	d3f97cd93b	initial CIL implementation (WIP)	2018-09-13 15:29:29 +03:00
itaicaspi-intel	171fe97a3a	imitation related bug fixes	2018-09-12 15:26:16 +03:00
itaicaspi-intel	a9bd1047c4	load and save function for non-episodic replay buffers + carla improvements + network bug fixes	2018-09-12 15:26:16 +03:00
Itai Caspi	72a1d9d426	Itaicaspi/episode reset refactoring (#105 ) * reordering of the episode reset operation and allowing to store episodes only when they are terminated * reordering of the episode reset operation and allowing to store episodes only when they are terminated * revert tensorflow-gpu to 1.9.0 + bug fix in should_train() * tests readme file and refactoring of policy optimization agent train function * Update README.md * Update README.md * additional policy optimization train function simplifications * Updated the traces after the reordering of the environment reset * docker and jenkins files * updated the traces to the ones from within the docker container * updated traces and added control suite to the docker * updated jenkins file with the intel proxy + updated doom basic a3c test params * updated line breaks in jenkins file * added a missing line break in jenkins file * refining trace tests ignored presets + adding a configurable beta entropy value * switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue * updated benchmarks for dueling ddqn breakout and pong * allowing dynamic updates to the loss weights + bug fix in episode.update_returns * remove docker and jenkins file	2018-09-04 15:07:54 +03:00
Gal Leibovich	d862a3be83	rainbow dqn hyper-parameter updates	2018-08-30 20:41:38 +03:00
Gal Leibovich	ea294de7fd	adding dueling support for rainbow dqn (now only missing n-step)	2018-08-30 18:15:59 +03:00
Gal Leibovich	bbe7ac3338	Rainbow DQN agent (WIP - still missing dueling and n-step) + adding support for Prioritized ER for C51	2018-08-30 18:14:53 +03:00
Gal Leibovich	1aa2ab0590	parameter noise exploration - using Noisy Nets	2018-08-27 18:19:01 +03:00
itaicaspi-intel	658b437079	removing datasets + imports optimization	2018-08-27 10:54:11 +03:00
Shadi Endrawis	3abb6cd415	Trace tests update	2018-08-20 13:01:30 +03:00
Gal Novik	19ca5c24b1	pre-release 0.10.0	2018-08-13 17:11:34 +03:00

49 Commits