gryf/coach

mirror of https://github.com/gryf/coach.git synced 2026-07-07 18:06:31 +02:00

Author	SHA1	Message	Date
shadiendrawis	0896f43097	Robosuite exploration (#478) * Add Robosuite parameters for all env types + initialize env flow * Init flow done * Rest of Environment API complete for RobosuiteEnvironment * RobosuiteEnvironment changes * Observation stacking filter * Add proper frame_skip in addition to control_freq * Hardcode Coach rendering to 'frontview' camera * Robosuite_Lift_DDPG preset + Robosuite env updates * Move observation stacking filter from env to preset * Pre-process observation - concatenate depth map (if exists) to image and object state (if exists) to robot state * Preset parameters based on Surreal DDPG parameters, taken from: https://github.com/SurrealAI/surreal/blob/master/surreal/main/ddpg_configs.py * RobosuiteEnvironment fixes - working now with PyGame rendering * Preset minor modifications * ObservationStackingFilter - option to concat non-vector observations * Consider frame skip when setting horizon in robosuite env * Robosuite lift preset - update heatup length and training interval * Robosuite env - change control_freq to 10 to match Surreal usage * Robosuite clipped PPO preset * Distribute multiple workers (-n #) over multiple GPUs * Clipped PPO memory optimization from @shadiendrawis * Fixes to evaluation only workers * RoboSuite_ClippedPPO: Update training interval * Undo last commit (update training interval) * Fix "doube-negative" if conditions * multi-agent single-trainer clipped ppo training with cartpole * cleanups (not done yet) + ~tuned hyper-params for mast * Switch to Robosuite v1 APIs * Change presets to IK controller * more cleanups + enabling evaluation worker + better logging * RoboSuite_Lift_ClippedPPO updates * Fix major bug in obs normalization filter setup * Reduce coupling between Robosuite API and Coach environment * Now only non task-specific parameters are explicitly defined in Coach * Removed a bunch of enums of Robosuite elements, using simple strings instead * With this change new environments/robots/controllers in Robosuite can be used immediately in Coach * MAST: better logging of actor-trainer interaction + bug fixes + performance improvements. Still missing: fixed pubsub for obs normalization running stats + logging for trainer signals * lstm support for ppo * setting JOINT VELOCITY action space by default + fix for EveryNEpisodes video dump filter + new TaskIDDumpFilter + allowing or between video dump filters * Separate Robosuite clipped PPO preset for the non-MAST case * Add flatten layer to architectures and use it in Robosuite presets This is required for embedders that mix conv and dense TODO: Add MXNet implementation * publishing running_stats together with the published policy + hyper-param for when to publish a policy + cleanups * bug-fix for memory leak in MAST * Bugfix: Return value in TF BatchnormActivationDropout.to_tf_instance * Explicit activations in embedder scheme so there's no ReLU after flatten * Add clipped PPO heads with configurable dense layers at the beginning * This is a workaround needed to mimic Surreal-PPO, where the CNN and LSTM are shared between actor and critic but the FC layers are not shared * Added a "SchemeBuilder" class, currently only used for the new heads but we can change Middleware and Embedder implementations to use it as well * Video dump setting fix in basic preset * logging screen output to file * coach to start the redis-server for a MAST run * trainer drops off-policy data + old policy in ClippedPPO updates only after policy was published + logging free memory stats + actors check for a new policy only at the beginning of a new episode + fixed a bug where the trainer was logging "Training Reward = 0", causing dashboard to incorrectly display the signal * Add missing set_internal_state function in TFSharedRunningStats * Robosuite preset - use SingleLevelSelect instead of hard-coded level * policy ID published directly on Redis * Small fix when writing to log file * Major bugfix in Robosuite presets - pass dense sizes to heads * RoboSuite_Lift_ClippedPPO hyper-params update * add horizon and value bootstrap to GAE calculation, fix A3C with LSTM * adam hyper-params from mujoco * updated MAST preset with IK_POSE_POS controller * configurable initialization for policy stdev + custom extra noise per actor + logging of policy stdev to dashboard * values loss weighting of 0.5 * minor fixes + presets * bug-fix for MAST where the old policy in the trainer had kept updating every training iter while it should only update after every policy publish * bug-fix: reset_internal_state was not called by the trainer * bug-fixes in the lstm flow + some hyper-param adjustments for CartPole_ClippedPPO_LSTM -> training and sometimes reaches 200 * adding back the horizon hyper-param - a messy commit * another bug-fix missing from prev commit * set control_freq=2 to match action_scale 0.125 * ClippedPPO with MAST cleanups and some preps for TD3 with MAST * TD3 presets. RoboSuite_Lift_TD3 seems to work well with multi-process runs (-n 8) * setting termination on collision to be on by default * bug-fix following prev-prev commit * initial cube exploration environment with TD3 commit * bug fix + minor refactoring * several parameter changes and RND debugging * Robosuite Gym wrapper + Rename TD3_Random* -> Random* * algorithm update * Add RoboSuite v1 env + presets (to eventually replace non-v1 ones) * Remove grasping presets, keep only V1 exp. presets (w/o V1 tag) * Keep just robosuite V1 env as the 'robosuite_environment' module * Exclude Robosuite and MAST presets from integration tests * Exclude LSTM and MAST presets from golden tests * Fix mistakenly removed import * Revert debug changes in ReaderWriterLock * Try another way to exclude LSTM/MAST golden tests * Remove debug prints * Remove PreDense heads, unused in the end * Missed removing an instance of PreDense head * Remove MAST, not required for this PR * Undo unused concat option in ObservationStackingFilter * Remove LSTM updates, not required in this PR * Update README.md * code changes for the exploration flow to work with robosuite master branch * code cleanup + documentation * jupyter tutorial for the goal-based exploration + scatter plot * typo fix * Update README.md * seprate parameter for the obs-goal observation + small fixes * code clarity fixes * adjustment in tutorial 5 * Update tutorial * Update tutorial Co-authored-by: Guy Jacob <guy.jacob@intel.com> Co-authored-by: Gal Leibovich <gal.leibovich@intel.com> Co-authored-by: shadi.endrawis <sendrawi@aipg-ra-skx-03.ra.intel.com>	2021-06-01 00:34:19 +03:00
Guy Jacob	235a259223	Add Flatten layer to architectures + make flatten optional in embedders (#483) Flatten layer required for embedders that mix conv and dense (Cherry picking from #478)	2021-05-12 11:11:10 +03:00
Guy Jacob	a1a2e67fbd	logging screen output to file (#479) Co-authored-by: Gal Leibovich <gal.leibovich@intel.com>	2021-05-06 18:02:27 +03:00
Guy Jacob	9106b69227	Add is_on_policy property to agents (#480)	2021-05-06 18:02:02 +03:00
Guy Jacob	f52ff1784d	Fix breaking change from minio update (#469) `ResponseError` replaced by `S3Error` in new minio version	2020-12-15 10:02:16 +02:00
Guy Jacob	103d4477eb	Disable NumPy and TF2 related warnings (#463)	2020-09-24 15:11:45 +03:00
Gal Novik	c9738280fd	Require Python 3.6 + Changes to CI configuration (#452) * Change build__env jobs to pull base image of current "tag" instead of "master" image Change nightly flow so build__env jobs now gated by build_base (so change in previous bullet works in nightly) Bugfix in CheckpointDataStore: Call to object.__init__ with parameters * Disabling unstable Doom A3C and ACER golden tests	2020-07-26 16:11:22 +03:00
Gal Novik	79b05a8105	Wolpertinger preset failure fix (#434) Numpy 1.18 fails to cast float to int as part of the wolpertinger preset run	2020-01-14 16:26:38 +02:00
shadiendrawis	188b86369a	fix e-greedy in case action values were equal (#423)	2019-11-10 17:20:44 +02:00
shadiendrawis	6ca91b9090	add reset internal state to rollout worker (#421)	2019-11-03 14:42:51 +02:00
Gal Leibovich	66fada7f78	Remove assertion from BatchRLGraphManager	2019-10-22 11:54:14 +03:00
shadiendrawis	5ad5a58350	fix atari stack overflow (#412)	2019-10-06 18:14:21 +03:00
shadiendrawis	0a712ecc94	Fix numpy shared running stats to support images (#411)	2019-10-06 12:16:38 +03:00
Gal Leibovich	79a4161eca	Workaround for dumping gifs through the Python API (#405)	2019-09-26 12:21:25 +03:00
Gal Leibovich	c7949d7011	Fix Atari Schedule Heatup	2019-09-08 16:57:38 +03:00
Gal Leibovich	138ced23ba	RL in Large Discrete Action Spaces - Wolpertinger Agent (#394) * Currently this is specific to the case of discretizing a continuous action space. Can easily be adapted to other case by feeding the kNN otherwise, and removing the usage of a discretizing output action filter	2019-09-08 12:53:49 +03:00
Zach Dwiel	7b0fccb041	Add RedisDataStore (#295) * GraphManager.set_session also sets self.sess * make sure that GraphManager.fetch_from_worker uses training phase * remove unnecessary phase setting in training worker * reorganize rollout worker * provide default name to GlobalVariableSaver.__init__ since it isn't really used anyway * allow dividing TrainingSteps and EnvironmentSteps * add timestamps to the log * added redis data store * conflict merge fix	2019-08-28 21:15:58 +03:00
Gal Leibovich	c1d1fae342	Distiller's AMC induced changes (#359) * override episode rewards with the last transition reward * EWMA normalization filter * allowing control over when the pre_network filter runs	2019-08-05 10:24:58 +03:00
Gal Novik	2697142d5a	Release 1.0.0 (#382) * Updating README * Shortening test cycles	2019-07-24 16:10:58 +03:00
Gal Leibovich	19ad2d60a7	Batch RL Tutorial (#372)	2019-07-14 18:43:48 +03:00
Gal Novik	b82414138d	Workaround the OSError due to bad address failure on the CI runs (#370) workaround the OSError due to bad address failure on the CI runs	2019-07-07 17:11:19 +03:00
Gal Leibovich	587b74e04a	Remove double call to reset_internal_state() on gym environments (#364)	2019-07-02 13:43:23 +03:00
anabwan	a576ab5659	tests: Removed mxnet from functional tests + minor fix on rewards (#362) * ci: change workflow * changed timeout * fix function reach reward * print logs * removing mxnet * res'	2019-06-27 18:52:29 +03:00
Gal Leibovich	d6795bd524	batchnorm fixes + disabling batchnorm in DDPG (#353) Co-authored-by: James Casbon <casbon+gh@gmail.com>	2019-06-23 11:28:22 +03:00
anabwan	7b5d6a3f03	tests: stabling functional tests (#355) * tests: stabling functional tests * functional removed	2019-06-20 15:30:47 +03:00
shadiendrawis	8e812ef82f	Coach as a library (#348) * CoachInterface + tutorial * Some improvements and typo fixes * merge tutorial 0 and 4 * typo fix + additional tutorial changes * tutorial changes * added reading signals and experiment path argument	2019-06-19 18:05:03 +03:00
Gal Leibovich	7eb884c5b2	TD3 (#338)	2019-06-16 11:11:21 +03:00
Timo Kaufmann	8df3c46756	Do not hardcode path to bash (#332)	2019-06-10 20:10:28 +03:00
Gal Leibovich	a1bb8eef89	DDPG Critic Head Bug Fix (#344) * A bug fix for DDPG, where the update to the policy network was based on the sum of the critic's Q predictions on the batch instead of their mean	2019-06-05 17:47:56 +03:00
anabwan	0aa5359d63	tests: added assert for cp param and changing test args order (#342)	2019-06-05 00:16:50 +03:00
Gal Leibovich	4c996e147e	applying filters for a csv loaded dataset + some bug-fixes in data loading (#319)	2019-05-28 15:44:55 +03:00
anabwan	f5ba14575c	tests: print logs on failure + fix -cp param (#327) * tests: pring logs on failure * fix import * added job to circleci * fix functional * removed debug job	2019-05-28 13:45:43 +03:00
Gal Leibovich	251dc9ccc0	Preset dependent number of csv read attempts in golden testing (#334)	2019-05-28 12:19:57 +03:00
Gal Leibovich	9e9c4fd332	Create a dataset using an agent (#306) Generate a dataset using an agent (allowing to select between this and a random dataset)	2019-05-28 09:34:49 +03:00
anabwan	342b7184bc	Enabling Coach Documentation to be run even when environments are not installed (#326)	2019-05-27 10:46:07 +03:00
James Casbon	2b7d536da4	Add head regularization costs to tf.losses (#292)	2019-05-26 17:15:42 +03:00
anabwan	3b6e413532	tests: fix traces and changing workflow jobs (#316) * tests: fix traces export presets * tests: increase time for traces * tests * remove approval * fix approval * fix ap * change worflow jobs * fix path * fix repo path * change run traces * adding assert * fix assert	2019-05-26 15:27:36 +03:00
anabwan	b567091d2e	removed timestep_limit due to gym version upgrade (#325) * removed timestep_limit due to gym version update * removed _past_limit wrapper	2019-05-26 13:58:16 +03:00
Gal Leibovich	30c2b2fc45	moving to skimage.transform.resize (#321)	2019-05-23 13:38:01 +03:00
Gal Leibovich	acceb03ac0	bug fixes for OPE (#311)	2019-05-21 16:39:11 +03:00
Gal Leibovich	deb0251367	bug fix following PR #191 (#313)	2019-05-12 13:42:45 -07:00
Gal Novik	aa9f3cefaf	Printing input size as part of network summary (#310)	2019-05-12 15:40:02 +03:00
anabwan	ffb55b4142	tests: update traces (#302) * Traces folder removed from repo and moved to S3 * Traces jobs and update will use directly the S3 files	2019-05-07 10:04:05 +03:00
anabwan	740359587d	tests: fixed nightly (#301) * tests: fixed nightly * tests: temp testing functional tests * tests: temp testing functional tests * tests: add seed to -cp * test: last fix	2019-05-05 08:28:57 +03:00
Gal Leibovich	582921ffe3	OPE: Weighted Importance Sampling (#299)	2019-05-02 19:25:42 +03:00
guyk1971	74db141d5e	SAC algorithm (#282) * SAC algorithm * SAC - updates to agent (learn_from_batch), sac_head and sac_q_head to fix problem in gradient calculation. Now SAC agents is able to train. gym_environment - fixing an error in access to gym.spaces * Soft Actor Critic - code cleanup * code cleanup * V-head initialization fix * SAC benchmarks * SAC Documentation * typo fix * documentation fixes * documentation and version update * README typo	2019-05-01 18:37:49 +03:00
Ajay Deshpande	33dc29ee99	Uploading checkpoint if crd provided (#191) * Uploading checkpoint if crd provided * Changing the calculation of total steps because of a recent change in core_types Fixes #195	2019-04-26 12:27:33 -07:00
anabwan	b3db9ce77d	tests: fixed failed tests - stabling CI (#298) * tests: stabling CI * tests: fix failed tests - stabling CI * fix get csv files. - fixed seed test * fix clres on conftest - now can modify paths during test run. - this fixed the mxnet checkpoint test * tests: fix comments	2019-04-23 15:12:11 +03:00
Gal Leibovich	9f625c197b	fix for fetch rendering (#297) * fix for fetch rendering - removing code which was once required with older gym versions. images are now rendered correctly by default with the latest gym. * fixing mujoco camera id failure	2019-04-21 17:37:14 +03:00
Gal Leibovich	4741b0b916	BCQ variant on top of DDQN (#276) * kNN based model for predicting which actions to drop * fix for seeds with batch rl	2019-04-16 17:06:23 +03:00

298 Commits