* Add Robosuite parameters for all env types + initialize env flow
* Init flow done
* Rest of Environment API complete for RobosuiteEnvironment
* RobosuiteEnvironment changes
* Observation stacking filter
* Add proper frame_skip in addition to control_freq
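A rough sketch of the idea (hypothetical wrapper code, not the actual Coach environment implementation): on top of Robosuite's control_freq, the agent's action is repeated for frame_skip low-level steps and the rewards are summed.

```python
def step_with_frame_skip(env, action, frame_skip=2):
    # Repeat the same action for `frame_skip` low-level steps, accumulating reward.
    total_reward, done, info = 0.0, False, {}
    obs = None
    for _ in range(frame_skip):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info
```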
* Hardcode Coach rendering to 'frontview' camera
* Robosuite_Lift_DDPG preset + Robosuite env updates
* Move observation stacking filter from env to preset
* Pre-process observation - concatenate depth map (if present)
to the image, and object state (if present) to the robot state
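A minimal sketch of that preprocessing, assuming NumPy arrays and illustrative observation keys (the real Robosuite keys may differ):

```python
import numpy as np

def preprocess(obs):
    image = obs["image"]                                   # (H, W, 3)
    if "depth" in obs:                                     # optional depth map
        depth = obs["depth"][..., np.newaxis]              # (H, W, 1)
        image = np.concatenate([image, depth], axis=-1)    # (H, W, 4)

    robot_state = obs["robot_state"]                       # 1-D proprioception
    if "object_state" in obs:                              # optional object state
        robot_state = np.concatenate([robot_state, obs["object_state"]])

    return {"camera": image, "measurements": robot_state}
```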
* Preset parameters based on Surreal DDPG parameters, taken from:
https://github.com/SurrealAI/surreal/blob/master/surreal/main/ddpg_configs.py
* RobosuiteEnvironment fixes - working now with PyGame rendering
* Preset minor modifications
* ObservationStackingFilter - option to concat non-vector observations
* Consider frame skip when setting horizon in robosuite env
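Illustrative relation only (the exact direction and values used in the env are an assumption here): the agent-facing horizon and the Robosuite horizon differ by a factor of frame_skip.

```python
# Hypothetical numbers: keep the simulated episode length fixed by scaling
# the Robosuite horizon with the frame skip.
frame_skip = 2
agent_horizon = 500                              # agent decisions per episode
robosuite_horizon = agent_horizon * frame_skip   # low-level control steps
```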
* Robosuite lift preset - update heatup length and training interval
* Robosuite env - change control_freq to 10 to match Surreal usage
* Robosuite clipped PPO preset
* Distribute multiple workers (-n #) over multiple GPUs
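One common way to do this (a sketch, not necessarily how Coach implements it) is round-robin assignment of workers to GPUs before the framework is imported:

```python
import os

def assign_gpu(worker_index, num_gpus):
    # Each worker process only sees its own device; must run before importing
    # TensorFlow so the visibility setting takes effect.
    if num_gpus > 0:
        os.environ["CUDA_VISIBLE_DEVICES"] = str(worker_index % num_gpus)
    else:
        os.environ["CUDA_VISIBLE_DEVICES"] = ""   # CPU-only fallback
```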
* Clipped PPO memory optimization from @shadiendrawis
* Fixes to evaluation-only workers
* RoboSuite_ClippedPPO: Update training interval
* Undo last commit (update training interval)
* Fix "doube-negative" if conditions
* Multi-agent single-trainer Clipped PPO training with CartPole
* Cleanups (not done yet) + roughly tuned hyper-params for MAST
* Switch to Robosuite v1 APIs
* Change presets to IK controller
* more cleanups + enabling evaluation worker + better logging
* RoboSuite_Lift_ClippedPPO updates
* Fix major bug in obs normalization filter setup
* Reduce coupling between Robosuite API and Coach environment
* Now only non-task-specific parameters are explicitly defined
in Coach
* Removed a bunch of enums of Robosuite elements, using simple
strings instead
* With this change new environments/robots/controllers in Robosuite
can be used immediately in Coach
* MAST: better logging of actor-trainer interaction + bug fixes + performance improvements.
Still missing: fixed pubsub for obs normalization running stats + logging for trainer signals
* LSTM support for PPO
* setting JOINT VELOCITY action space by default + fix for EveryNEpisodes video dump filter + new TaskIDDumpFilter + allowing OR between video dump filters
* Separate Robosuite clipped PPO preset for the non-MAST case
* Add flatten layer to architectures and use it in Robosuite presets
This is required for embedders that mix conv and dense
TODO: Add MXNet implementation
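The flatten bridges the 4-D conv output (batch, H, W, C) and the 2-D input (batch, features) expected by dense layers. A minimal TensorFlow sketch of such a mixed embedder (illustrative, not Coach's actual embedder code):

```python
import tensorflow as tf

embedder = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu",
                           input_shape=(84, 84, 3)),
    tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
    tf.keras.layers.Flatten(),          # (batch, H, W, C) -> (batch, features)
    tf.keras.layers.Dense(256, activation="relu"),
])
```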
* publishing running_stats together with the published policy + hyper-param for when to publish a policy + cleanups
* bug-fix for memory leak in MAST
* Bugfix: Return value in TF BatchnormActivationDropout.to_tf_instance
* Explicit activations in embedder scheme so there's no ReLU after flatten
* Add clipped PPO heads with configurable dense layers at the beginning
* This is a workaround needed to mimic Surreal-PPO, where the CNN and
LSTM are shared between actor and critic but the FC layers are not
shared
* Added a "SchemeBuilder" class, currently only used for the new heads
but we can change Middleware and Embedder implementations to use it
as well
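Conceptually, a shared trunk feeds both heads, and each head starts with its own private dense stack. A hypothetical Keras sketch of this Surreal-PPO-style split (not the actual Coach head classes):

```python
import tensorflow as tf

def build_actor_critic(input_shape=(84, 84, 3), action_dim=7):
    inputs = tf.keras.Input(shape=input_shape)
    # Shared trunk (a CNN here; an LSTM could follow it in the recurrent variant).
    x = tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)
    x = tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu")(x)
    shared = tf.keras.layers.Flatten()(x)

    # Un-shared FC stacks: each head begins with its own dense layers.
    pi = tf.keras.layers.Dense(256, activation="relu")(shared)
    pi = tf.keras.layers.Dense(action_dim, activation="tanh", name="policy")(pi)

    v = tf.keras.layers.Dense(256, activation="relu")(shared)
    v = tf.keras.layers.Dense(1, name="value")(v)

    return tf.keras.Model(inputs, [pi, v])
```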
* Video dump setting fix in basic preset
* logging screen output to file
* Coach now starts the redis-server for a MAST run
* trainer drops off-policy data + old policy in ClippedPPO updates only after the policy was published + logging free memory stats + actors check for a new policy only at the beginning of a new episode + fixed a bug where the trainer was logging "Training Reward = 0", causing the dashboard to incorrectly display the signal
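A rough sketch of that trainer-side flow (all names here are hypothetical, illustration only):

```python
import itertools

def trainer_loop(episode_queue, agent, publish_every=10):
    for iteration in itertools.count():
        # Drop episodes collected with a stale policy (off-policy data).
        batch = [ep for ep in episode_queue.drain()
                 if ep.policy_id == agent.current_policy_id]
        agent.train_on(batch)
        if iteration % publish_every == 0:
            agent.publish_policy()      # push weights (and running stats) to actors
            agent.update_old_policy()   # old policy refreshed only after publishing
```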
* Add missing set_internal_state function in TFSharedRunningStats
* Robosuite preset - use SingleLevelSelect instead of hard-coded level
* policy ID published directly on Redis
* Small fix when writing to log file
* Major bugfix in Robosuite presets - pass dense sizes to heads
* RoboSuite_Lift_ClippedPPO hyper-params update
* add horizon and value bootstrap to GAE calculation, fix A3C with LSTM
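For reference, a small NumPy sketch of GAE with value bootstrapping when the episode is cut by the horizon rather than terminated (hypothetical helper, not the exact Coach implementation):

```python
import numpy as np

def gae(rewards, values, last_value, terminated, gamma=0.99, lam=0.95):
    # values: V(s_0..s_{T-1}); last_value: V(s_T), used to bootstrap when the
    # episode ended due to the horizon (terminated=False) rather than a real end.
    values = np.append(values, 0.0 if terminated else last_value)
    advantages = np.zeros(len(rewards))
    gae_t = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae_t = delta + gamma * lam * gae_t
        advantages[t] = gae_t
    return advantages
```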
* Adam hyper-params from MuJoCo
* updated MAST preset with IK_POSE_POS controller
* configurable initialization for policy stdev + custom extra noise per actor + logging of policy stdev to dashboard
* Value loss weighting of 0.5
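In other words, roughly (names illustrative, not the exact Coach loss signals):

```python
def ppo_total_loss(policy_loss, value_loss, value_loss_weight=0.5):
    # The value (critic) loss is down-weighted relative to the surrogate policy loss.
    return policy_loss + value_loss_weight * value_loss
```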
* minor fixes + presets
* bug-fix for MAST where the old policy in the trainer kept updating every training iteration, while it should only update after each policy publish
* bug-fix: reset_internal_state was not called by the trainer
* bug-fixes in the LSTM flow + some hyper-param adjustments for CartPole_ClippedPPO_LSTM -> now trains and sometimes reaches 200
* adding back the horizon hyper-param - a messy commit
* another bug-fix missing from prev commit
* set control_freq=2 to match action_scale 0.125
* ClippedPPO with MAST cleanups and some preps for TD3 with MAST
* TD3 presets. RoboSuite_Lift_TD3 seems to work well with multi-process runs (-n 8)
* setting termination on collision to be on by default
* bug-fix following prev-prev commit
* Initial commit of the cube exploration environment with TD3
* bug fix + minor refactoring
* several parameter changes and RND debugging
* Robosuite Gym wrapper + Rename TD3_Random* -> Random*
* algorithm update
* Add RoboSuite v1 env + presets (to eventually replace non-v1 ones)
* Remove grasping presets, keep only V1 exp. presets (w/o V1 tag)
* Keep just robosuite V1 env as the 'robosuite_environment' module
* Exclude Robosuite and MAST presets from integration tests
* Exclude LSTM and MAST presets from golden tests
* Fix mistakenly removed import
* Revert debug changes in ReaderWriterLock
* Try another way to exclude LSTM/MAST golden tests
* Remove debug prints
* Remove PreDense heads, unused in the end
* Missed removing an instance of PreDense head
* Remove MAST, not required for this PR
* Undo unused concat option in ObservationStackingFilter
* Remove LSTM updates, not required in this PR
* Update README.md
* Code changes for the exploration flow to work with the Robosuite master branch
* code cleanup + documentation
* jupyter tutorial for the goal-based exploration + scatter plot
* typo fix
* Update README.md
* Separate parameter for the obs-goal observation + small fixes
* code clarity fixes
* adjustment in tutorial 5
* Update tutorial
* Update tutorial
Co-authored-by: Guy Jacob <guy.jacob@intel.com>
Co-authored-by: Gal Leibovich <gal.leibovich@intel.com>
Co-authored-by: shadi.endrawis <sendrawi@aipg-ra-skx-03.ra.intel.com>
* updating the documentation website
* adding the built docs
* Update of API docstrings across Coach and tutorials 0-2
* Added some missing API documentation
* New Sphinx based documentation
NOTE: the TensorFlow framework works fine if MXNet is not installed in the env, but MXNet will not work if TensorFlow is not installed, because of the code in network_wrapper.
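A typical way to avoid such a hard cross-dependency (sketch only; the actual fix would live in network_wrapper) is to guard the framework imports:

```python
# Hypothetical guarded imports, so a missing framework doesn't break the other.
try:
    import tensorflow as tf
except ImportError:
    tf = None

try:
    import mxnet as mx
except ImportError:
    mx = None

def get_framework(name):
    frameworks = {"tensorflow": tf, "mxnet": mx}
    if frameworks.get(name) is None:
        raise ImportError("Framework '{}' is not installed".format(name))
    return frameworks[name]
```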