gryf/coach

mirror of https://github.com/gryf/coach.git synced 2026-07-06 17:26:31 +02:00

Author	SHA1	Message	Date
Guy Jacob	0633c32805	Disable nightly tests	2021-06-03 09:07:39 +03:00
shadiendrawis	0896f43097	Robosuite exploration (#478) * Add Robosuite parameters for all env types + initialize env flow * Init flow done * Rest of Environment API complete for RobosuiteEnvironment * RobosuiteEnvironment changes * Observation stacking filter * Add proper frame_skip in addition to control_freq * Hardcode Coach rendering to 'frontview' camera * Robosuite_Lift_DDPG preset + Robosuite env updates * Move observation stacking filter from env to preset * Pre-process observation - concatenate depth map (if exists) to image and object state (if exists) to robot state * Preset parameters based on Surreal DDPG parameters, taken from: https://github.com/SurrealAI/surreal/blob/master/surreal/main/ddpg_configs.py * RobosuiteEnvironment fixes - working now with PyGame rendering * Preset minor modifications * ObservationStackingFilter - option to concat non-vector observations * Consider frame skip when setting horizon in robosuite env * Robosuite lift preset - update heatup length and training interval * Robosuite env - change control_freq to 10 to match Surreal usage * Robosuite clipped PPO preset * Distribute multiple workers (-n #) over multiple GPUs * Clipped PPO memory optimization from @shadiendrawis * Fixes to evaluation only workers * RoboSuite_ClippedPPO: Update training interval * Undo last commit (update training interval) * Fix "doube-negative" if conditions * multi-agent single-trainer clipped ppo training with cartpole * cleanups (not done yet) + ~tuned hyper-params for mast * Switch to Robosuite v1 APIs * Change presets to IK controller * more cleanups + enabling evaluation worker + better logging * RoboSuite_Lift_ClippedPPO updates * Fix major bug in obs normalization filter setup * Reduce coupling between Robosuite API and Coach environment * Now only non task-specific parameters are explicitly defined in Coach * Removed a bunch of enums of Robosuite elements, using simple strings instead * With this change new environments/robots/controllers in Robosuite can be used immediately in Coach * MAST: better logging of actor-trainer interaction + bug fixes + performance improvements. Still missing: fixed pubsub for obs normalization running stats + logging for trainer signals * lstm support for ppo * setting JOINT VELOCITY action space by default + fix for EveryNEpisodes video dump filter + new TaskIDDumpFilter + allowing or between video dump filters * Separate Robosuite clipped PPO preset for the non-MAST case * Add flatten layer to architectures and use it in Robosuite presets This is required for embedders that mix conv and dense TODO: Add MXNet implementation * publishing running_stats together with the published policy + hyper-param for when to publish a policy + cleanups * bug-fix for memory leak in MAST * Bugfix: Return value in TF BatchnormActivationDropout.to_tf_instance * Explicit activations in embedder scheme so there's no ReLU after flatten * Add clipped PPO heads with configurable dense layers at the beginning * This is a workaround needed to mimic Surreal-PPO, where the CNN and LSTM are shared between actor and critic but the FC layers are not shared * Added a "SchemeBuilder" class, currently only used for the new heads but we can change Middleware and Embedder implementations to use it as well * Video dump setting fix in basic preset * logging screen output to file * coach to start the redis-server for a MAST run * trainer drops off-policy data + old policy in ClippedPPO updates only after policy was published + logging free memory stats + actors check for a new policy only at the beginning of a new episode + fixed a bug where the trainer was logging "Training Reward = 0", causing dashboard to incorrectly display the signal * Add missing set_internal_state function in TFSharedRunningStats * Robosuite preset - use SingleLevelSelect instead of hard-coded level * policy ID published directly on Redis * Small fix when writing to log file * Major bugfix in Robosuite presets - pass dense sizes to heads * RoboSuite_Lift_ClippedPPO hyper-params update * add horizon and value bootstrap to GAE calculation, fix A3C with LSTM * adam hyper-params from mujoco * updated MAST preset with IK_POSE_POS controller * configurable initialization for policy stdev + custom extra noise per actor + logging of policy stdev to dashboard * values loss weighting of 0.5 * minor fixes + presets * bug-fix for MAST where the old policy in the trainer had kept updating every training iter while it should only update after every policy publish * bug-fix: reset_internal_state was not called by the trainer * bug-fixes in the lstm flow + some hyper-param adjustments for CartPole_ClippedPPO_LSTM -> training and sometimes reaches 200 * adding back the horizon hyper-param - a messy commit * another bug-fix missing from prev commit * set control_freq=2 to match action_scale 0.125 * ClippedPPO with MAST cleanups and some preps for TD3 with MAST * TD3 presets. RoboSuite_Lift_TD3 seems to work well with multi-process runs (-n 8) * setting termination on collision to be on by default * bug-fix following prev-prev commit * initial cube exploration environment with TD3 commit * bug fix + minor refactoring * several parameter changes and RND debugging * Robosuite Gym wrapper + Rename TD3_Random* -> Random* * algorithm update * Add RoboSuite v1 env + presets (to eventually replace non-v1 ones) * Remove grasping presets, keep only V1 exp. presets (w/o V1 tag) * Keep just robosuite V1 env as the 'robosuite_environment' module * Exclude Robosuite and MAST presets from integration tests * Exclude LSTM and MAST presets from golden tests * Fix mistakenly removed import * Revert debug changes in ReaderWriterLock * Try another way to exclude LSTM/MAST golden tests * Remove debug prints * Remove PreDense heads, unused in the end * Missed removing an instance of PreDense head * Remove MAST, not required for this PR * Undo unused concat option in ObservationStackingFilter * Remove LSTM updates, not required in this PR * Update README.md * code changes for the exploration flow to work with robosuite master branch * code cleanup + documentation * jupyter tutorial for the goal-based exploration + scatter plot * typo fix * Update README.md * seprate parameter for the obs-goal observation + small fixes * code clarity fixes * adjustment in tutorial 5 * Update tutorial * Update tutorial Co-authored-by: Guy Jacob <guy.jacob@intel.com> Co-authored-by: Gal Leibovich <gal.leibovich@intel.com> Co-authored-by: shadi.endrawis <sendrawi@aipg-ra-skx-03.ra.intel.com>	2021-06-01 00:34:19 +03:00
Guy Jacob	235a259223	Add Flatten layer to architectures + make flatten optional in embedders (#483) Flatten layer required for embedders that mix conv and dense (Cherry picking from #478)	2021-05-12 11:11:10 +03:00
Guy Jacob	c369984c2e	Update setuptools version in Dockerfile.base (#482) Solves vizdoom installation failure	2021-05-09 09:34:33 +03:00
Guy Jacob	ba20396f63	Update Pillow version (#481)	2021-05-09 09:29:48 +03:00
Guy Jacob	a1a2e67fbd	logging screen output to file (#479) Co-authored-by: Gal Leibovich <gal.leibovich@intel.com>	2021-05-06 18:02:27 +03:00
Guy Jacob	9106b69227	Add is_on_policy property to agents (#480)	2021-05-06 18:02:02 +03:00
Guy Jacob	06bacd9de0	Fix Rust compiler build error (Kubernetes dependency) (#471) Update pip version during CircleCI setup stage to resolve Rust compiler build error (as suggested in https://cryptography.io/en/latest/faq.html#installing-cryptography-fails-with-error-can-not-find-rust-compiler)	2021-02-09 15:54:44 +02:00
Guy Jacob	f52ff1784d	Fix breaking change from minio update (#469) `ResponseError` replaced by `S3Error` in new minio version	2020-12-15 10:02:16 +02:00
Gal Novik	59e08034c6	Update README.md	2020-11-09 10:25:05 +02:00
Gal Novik	57e809c094	Docs updates following github repo change	2020-11-08 11:54:38 +02:00
Guy Jacob	bc65f1f5fb	Pin Vizdoom version - one more location (#468)	2020-11-04 11:37:35 +02:00
Gal Novik	4318fea436	Update requirements.txt (#466)	2020-11-04 09:44:30 +02:00
Guy Jacob	fd765e7e38	Pin Vizdoom version (#467)	2020-11-03 21:28:25 +02:00
Guy Jacob	103d4477eb	Disable NumPy and TF2 related warnings (#463)	2020-09-24 15:11:45 +03:00
Gal Novik	c9738280fd	Require Python 3.6 + Changes to CI configuration (#452) * Change build__env jobs to pull base image of current "tag" instead of "master" image Change nightly flow so build__env jobs now gated by build_base (so change in previous bullet works in nightly) Bugfix in CheckpointDataStore: Call to object.__init__ with parameters * Disabling unstable Doom A3C and ACER golden tests	2020-07-26 16:11:22 +03:00
Guy Jacob	a6689b6036	Update cluster name in .circleci/config.yml (now all locations)	2020-06-24 16:18:49 +03:00
Guy Jacob	6658bfa429	Update cluster name in .circleci/config.yml	2020-06-24 15:24:41 +03:00
Gal Novik	f3ce685cb1	Upgrading Pillow version due to security vulnerability (#444)	2020-04-22 20:52:24 +03:00
Gal Novik	79b05a8105	Wolpertinger preset failure fix (#434) Numpy 1.18 fails to cast float to int as part of the wolpertinger preset run	2020-01-14 16:26:38 +02:00
Dan Elbaz	525a22cb5b	Roll-back bokeh to version 1.0.4 (#431) Roll back bokeh to version 1.0.4	2019-12-23 09:33:53 +02:00
Brian Broll	0867d8d0fb	Fixed typo: Nerual -> Neural (#425)	2019-11-16 21:13:24 +02:00
shadiendrawis	188b86369a	fix e-greedy in case action values were equal (#423)	2019-11-10 17:20:44 +02:00
shadiendrawis	6ca91b9090	add reset internal state to rollout worker (#421)	2019-11-03 14:42:51 +02:00
Gal Leibovich	e288a552dd	Update requirements.txt (#422)	2019-10-28 18:30:48 +02:00
Gal Leibovich	66fada7f78	Remove assertion from BatchRLGraphManager	2019-10-22 11:54:14 +03:00
shadiendrawis	6db695ad8a	freeze tensorflow version to <= 1.14.0 (#416)	2019-10-10 17:47:25 +03:00
shadiendrawis	5ad5a58350	fix atari stack overflow (#412)	2019-10-06 18:14:21 +03:00
shadiendrawis	0a712ecc94	Fix numpy shared running stats to support images (#411)	2019-10-06 12:16:38 +03:00
Gal Leibovich	79a4161eca	Workaround for dumping gifs through the Python API (#405)	2019-09-26 12:21:25 +03:00
Pi Esposito	9e82c06be3	importing heads parameters from the correct file on tutorial #1 (#403)	2019-09-24 20:44:49 +03:00
Gal Novik	34bc292e60	Limiting intel-tensorflow version to 1.13.1 to re-enable CI; Updating nightly schedule to run on Saturdays as well	2019-09-23 12:52:00 +03:00
Gal Novik	0704260b5d	Updating EKS cluster name	2019-09-20 16:12:35 +03:00
Gal Novik	b5d66c0942	Removing CARLA docker file from README (#402)	2019-09-16 07:17:58 +03:00
Gal Leibovich	c7949d7011	Fix Atari Schedule Heatup	2019-09-08 16:57:38 +03:00
Gal Novik	13a4a09f72	removing weekly tests (#398)	2019-09-08 14:04:24 +03:00
Gal Leibovich	138ced23ba	RL in Large Discrete Action Spaces - Wolpertinger Agent (#394) * Currently this is specific to the case of discretizing a continuous action space. Can easily be adapted to other case by feeding the kNN otherwise, and removing the usage of a discretizing output action filter	2019-09-08 12:53:49 +03:00
shadiendrawis	fc50398544	typo fix (#396)	2019-09-04 12:40:23 +03:00
Zach Dwiel	7b0fccb041	Add RedisDataStore (#295) * GraphManager.set_session also sets self.sess * make sure that GraphManager.fetch_from_worker uses training phase * remove unnecessary phase setting in training worker * reorganize rollout worker * provide default name to GlobalVariableSaver.__init__ since it isn't really used anyway * allow dividing TrainingSteps and EnvironmentSteps * add timestamps to the log * added redis data store * conflict merge fix	2019-08-28 21:15:58 +03:00
Scott Leishman	34e1c04f29	further CI cluster name updates. (#387)	2019-08-06 10:18:07 +03:00
Gal Novik	92460736bc	Updated tutorial and docs (#386) Improved getting started tutorial, and updated docs to point to version 1.0.0	2019-08-05 16:46:15 +03:00
Gal Leibovich	c1d1fae342	Distiller's AMC induced changes (#359) * override episode rewards with the last transition reward * EWMA normalization filter * allowing control over when the pre_network filter runs	2019-08-05 10:24:58 +03:00
Scott Leishman	7df67dafa3	update to point at new CI cluster. (#385)	2019-08-04 13:55:04 +03:00
Gal Novik	2697142d5a	Release 1.0.0 (#382) * Updating README * Shortening test cycles	2019-07-24 16:10:58 +03:00
Gal Leibovich	718597ce9a	Fixes to Batch RL tutorial (#378)	2019-07-16 11:22:42 +03:00
Gal Novik	0a4cc7e081	Additional cmd line examples (#377) Adding command line examples to the Quick Start Guide tutorial	2019-07-15 12:32:59 +03:00
Gal Leibovich	19ad2d60a7	Batch RL Tutorial (#372)	2019-07-14 18:43:48 +03:00
Gal Novik	b82414138d	Workaround the OSError due to bad address failure on the CI runs (#370) workaround the OSError due to bad address failure on the CI runs	2019-07-07 17:11:19 +03:00
Gal Leibovich	587b74e04a	Remove double call to reset_internal_state() on gym environments (#364)	2019-07-02 13:43:23 +03:00
anabwan	a576ab5659	tests: Removed mxnet from functional tests + minor fix on rewards (#362) * ci: change workflow * changed timeout * fix function reach reward * print logs * removing mxnet * res'	2019-06-27 18:52:29 +03:00

520 Commits