* SAC algorithm
* SAC - updates to the agent (learn_from_batch), sac_head and sac_q_head to fix a problem in the gradient calculation (sketched below). SAC agents are now able to train.
gym_environment - fixing an error in accessing gym.spaces
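Not the actual change, just a minimal sketch of where such a gradient problem typically sits in SAC: the policy loss has to be built with the reparameterization trick so gradients flow through the sampled action into the policy head. The names (`mu`, `log_std`, `q_fn`, `alpha`) and the TF 1.x style are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

def sac_policy_loss(mu, log_std, q_fn, alpha=0.2):
    """Reparameterized SAC policy loss; q_fn maps actions to Q(s, a)."""
    std = tf.exp(log_std)
    # Reparameterization trick: the sampled action is a deterministic
    # function of (mu, log_std) plus independent noise, so gradients can
    # flow back into the policy parameters instead of being cut by sampling.
    noise = tf.random_normal(tf.shape(mu))
    pre_tanh = mu + std * noise
    action = tf.tanh(pre_tanh)
    # Gaussian log-probability plus the tanh-squashing correction.
    log_prob = tf.reduce_sum(
        -0.5 * (((pre_tanh - mu) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi)),
        axis=-1)
    log_prob -= tf.reduce_sum(tf.log(1.0 - action ** 2 + 1e-6), axis=-1)
    # Minimizing alpha * log_pi - Q pushes the policy toward high-value,
    # high-entropy actions.
    return tf.reduce_mean(alpha * log_prob - q_fn(action))
```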
* Soft Actor Critic - code cleanup
* code cleanup
* V-head initialization fix
* SAC benchmarks
* SAC Documentation
* typo fix
* documentation fixes
* documentation and version update
* README typo
* tests: stabilizing CI
* tests: fix failed tests - stabilizing CI
* fix getting csv files.
- fixed seed test
* fix clres in conftest - paths can now be modified during a test run.
- this fixed the mxnet checkpoint test
* tests: fix comments
* fix for Fetch rendering - removing code that was once required with older gym versions. Images are now rendered correctly by default with the latest gym.
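A quick way to see this behavior (illustrative only; the environment name is an assumption and requires gym's robotics envs and mujoco_py to be installed):

```python
import gym

env = gym.make('FetchReach-v1')  # any Fetch robotics env works here
env.reset()
# With recent gym versions this returns an RGB numpy array directly,
# without the viewer workarounds that older versions needed.
frame = env.render(mode='rgb_array')
print(frame.shape)
```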
* fixing mujoco camera id failure
* tests: new mxnet test + fix utils
- new test added: test_restore_checkpoint[tensorflow, mxnet]
- fix failed tests in CI
- improve utils
* tests: fix comments for mxnet checkpoint test and utils
* introduce dockerfiles.
* ensure golden tests are run, not just collected.
* Skip CI download of dockerfiles.
* add StarCraft environment and tests.
* add minimaps starcraft validation parameters.
* Add functional test running (from Ayoob)
* pin mujoco_py version to a 1.5 compatible release.
* fix config syntax issue.
* pin remaining mujoco_py install calls.
* Relax pin of gym version in gym Dockerfile.
* update makefile based on functional test filtering.
* integration test changes to override heatup to 1000 steps + run each preset for 30 sec (to make sure we reach the training phase); see the sketch below
* fixes to failing presets uncovered by this change + changes to the golden tests to properly test BatchRL
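A sketch of the kind of heatup override described above, assuming Coach's usual preset conventions (`ScheduleParameters` / `EnvironmentSteps`); the actual integration-test harness may differ:

```python
from rl_coach.core_types import EnvironmentSteps
from rl_coach.graph_managers.graph_manager import ScheduleParameters

schedule_params = ScheduleParameters()
# Force a short, fixed heatup so each preset reaches the training phase
# within the 30-second budget of the integration run.
schedule_params.heatup_steps = EnvironmentSteps(1000)
```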
* fix for Rainbow DQN
* fix to gym_environment (due to a change in Gym 0.12.1) + fix for Rainbow DQN + a bug fix in utils.squeeze_list
* fix for NEC agent
- allowing the last training batch drawn to be smaller than batch_size
- adding support for more agents in BatchRL by adding a softmax with temperature to the corresponding heads (sketched below)
- adding a CartPole_QR_DQN preset with a golden test
- cleanups
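A minimal sketch of softmax with temperature as it would apply to a discrete head's logits (illustrative, not the exact head code):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Higher temperatures flatten the distribution (more exploration);
    # temperature -> 0 approaches a greedy argmax policy.
    scaled = logits / temperature
    scaled -= np.max(scaled, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / np.sum(exp, axis=-1, keepdims=True)
```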
* initial ACER commit
* Code cleanup + several fixes
* Q-retrace bug fix + small clean-ups
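For reference, a hedged sketch of the Q-retrace recursion from the ACER paper that this fix concerns, computed backwards over a stored trajectory; variable names are illustrative and this is not the agent's exact code:

```python
import numpy as np

def q_retrace(rewards, q_values, values, rho, bootstrap_value, gamma=0.99):
    """rewards[t], q_values[t] = Q(s_t, a_t), values[t] = V(s_t) and
    rho[t] = pi(a_t|s_t) / mu(a_t|s_t) for the taken actions;
    bootstrap_value = V(s_T), or 0 if the trajectory ended in a terminal state."""
    horizon = len(rewards)
    q_ret = np.zeros(horizon)
    carry = bootstrap_value
    for t in reversed(range(horizon)):
        q_ret[t] = rewards[t] + gamma * carry
        # Truncated importance weight c_t = min(1, rho_t) limits how much
        # off-policy correction is propagated backwards.
        carry = min(1.0, rho[t]) * (q_ret[t] - q_values[t]) + values[t]
    return q_ret
```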
* added documentation for ACER
* ACER benchmarks
* update benchmarks table
* Add nightly running of golden and trace tests. (#202)
Resolves #200
* comment out nightly trace tests until values reset.
* remove redundant observe ignore (#168)
* ensure nightly test env containers exist. (#205)
Also bump integration test timeout
* wxPython removal (#207)
Replacing wxPython with Python's Tkinter.
Also removing the option to choose multiple files as it is unused and causes errors, and fixing the load file/directory spinner.
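Illustrative only (not the dashboard's exact code): the kind of Tkinter calls that replace the wx dialogs, limited to single-file selection:

```python
import tkinter as tk
from tkinter import filedialog

root = tk.Tk()
root.withdraw()  # hide the empty root window; only the dialogs are needed

# Single file / single directory selection (multi-file selection removed).
file_path = filedialog.askopenfilename(title='Select an experiment file')
dir_path = filedialog.askdirectory(title='Select an experiment directory')
```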
* Create CONTRIBUTING.md (#210)
* Create CONTRIBUTING.md. Resolves #188
* run nightly golden tests sequentially. (#217)
Should reduce resource requirements and potential CPU contention but increases
overall execution time.
* tests: added new setup configuration + test args (#211)
- added utils for future tests and conftest
- added test args
* new docs build
* golden test update
* add additional info when EKS runs raise exceptions.
* ensure we refresh k8s config after long calls.
The Kubernetes client on EKS has a 10-minute token time-to-live, so long jobs will hit unauthorized errors if tokens are not refreshed.
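A hedged sketch of the refresh pattern, assuming the official `kubernetes` Python client (the helper name is illustrative):

```python
from kubernetes import client, config

def refreshed_batch_api():
    # Reload the kube config so a fresh EKS token is generated, then
    # rebuild the API client with the new credentials before the next call.
    config.load_kube_config()
    return client.BatchV1Api()
```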
ISSUE: When we restore checkpoints, we create new nodes in the
TensorFlow graph. This happens when we assign a new value (op node) to a
RefVariable in GlobalVariableSaver. With every restore the size of the TF
graph increases, as new nodes are created and old unused nodes are not
removed from the graph. This causes a memory leak in the
restore_checkpoint codepath.
FIX: We use a TF placeholder to update the variables, which avoids the
memory leak.
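A minimal sketch of the placeholder pattern described above, assuming TF 1.x variables; the class and its names are illustrative, not the GlobalVariableSaver code itself:

```python
import tensorflow as tf

class VariableRestorer(object):
    def __init__(self, var):
        # Build the placeholder and assign op once, at graph-construction
        # time, so repeated restores reuse the same graph nodes.
        self.placeholder = tf.placeholder(var.dtype.base_dtype, shape=var.get_shape())
        self.assign_op = var.assign(self.placeholder)

    def restore(self, sess, value):
        # Feeding the value through the placeholder avoids calling
        # var.assign(new_value) on every restore, which would keep adding
        # nodes to the graph and leak memory.
        sess.run(self.assign_op, feed_dict={self.placeholder: value})
```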