* Add Robosuite parameters for all env types + initialize env flow
* Init flow done
* Rest of Environment API complete for RobosuiteEnvironment
* RobosuiteEnvironment changes
* Observation stacking filter
* Add proper frame_skip in addition to control_freq
* Hardcode Coach rendering to 'frontview' camera
* Robosuite_Lift_DDPG preset + Robosuite env updates
* Move observation stacking filter from env to preset
* Pre-process observation - concatenate the depth map (if present) to the image,
and the object state (if present) to the robot state
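A minimal sketch of this pre-processing step, assuming hypothetical observation keys ('image', 'depth', 'robot-state', 'object-state'); not the exact Coach code:

```python
import numpy as np

def preprocess_observation(obs: dict) -> dict:
    """Concatenate optional modalities onto the main observations (illustrative sketch)."""
    processed = dict(obs)
    # Stack the depth map as an extra channel of the camera image, if present
    if 'depth' in obs:
        depth = obs['depth']
        if depth.ndim == 2:
            depth = np.expand_dims(depth, axis=-1)
        processed['image'] = np.concatenate([obs['image'], depth], axis=-1)
    # Append the object state vector to the robot proprioceptive state, if present
    if 'object-state' in obs:
        processed['robot-state'] = np.concatenate([obs['robot-state'], obs['object-state']])
    return processed
```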
* Preset parameters based on Surreal DDPG parameters, taken from:
https://github.com/SurrealAI/surreal/blob/master/surreal/main/ddpg_configs.py
* RobosuiteEnvironment fixes - working now with PyGame rendering
* Preset minor modifications
* ObservationStackingFilter - option to concat non-vector observations
* Consider frame skip when setting horizon in robosuite env
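An illustrative sketch of one plausible convention (an assumption, not verified against the Coach code): with frame skip, each agent step consumes several low-level control steps, so the agent-facing horizon shrinks accordingly:

```python
def effective_horizon(robosuite_horizon: int, frame_skip: int) -> int:
    """Agent-facing episode length when each agent action is repeated `frame_skip` times."""
    return max(1, robosuite_horizon // frame_skip)

assert effective_horizon(1000, 4) == 250
```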
* Robosuite lift preset - update heatup length and training interval
* Robosuite env - change control_freq to 10 to match Surreal usage
* Robosuite clipped PPO preset
* Distribute multiple workers (-n #) over multiple GPUs
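A hedged sketch of round-robin GPU assignment by worker index (one common approach; not necessarily how Coach implements it):

```python
import os

def assign_gpu(worker_index: int, num_gpus: int) -> None:
    """Pin a worker process to a single GPU, cycling over the available devices."""
    if num_gpus > 0:
        os.environ['CUDA_VISIBLE_DEVICES'] = str(worker_index % num_gpus)
    else:
        os.environ['CUDA_VISIBLE_DEVICES'] = ''  # fall back to CPU
```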
* Clipped PPO memory optimization from @shadiendrawis
* Fixes to evaluation only workers
* RoboSuite_ClippedPPO: Update training interval
* Undo last commit (update training interval)
* Fix "doube-negative" if conditions
* Multi-agent single-trainer (MAST) clipped PPO training with CartPole
* Cleanups (not yet complete) + roughly tuned hyper-params for MAST
* Switch to Robosuite v1 APIs
* Change presets to IK controller
* more cleanups + enabling evaluation worker + better logging
* RoboSuite_Lift_ClippedPPO updates
* Fix major bug in obs normalization filter setup
* Reduce coupling between Robosuite API and Coach environment
* Now only non-task-specific parameters are explicitly defined
in Coach
* Removed a bunch of enums of Robosuite elements, using simple
strings instead
* With this change new environments/robots/controllers in Robosuite
can be used immediately in Coach
* MAST: better logging of actor-trainer interaction + bug fixes + performance improvements.
Still missing: fixed pubsub for obs normalization running stats + logging for trainer signals
* LSTM support for PPO
* setting JOINT VELOCITY action space by default + fix for EveryNEpisodes video dump filter + new TaskIDDumpFilter + allowing OR between video dump filters
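The OR combination of dump filters amounts to something like the following sketch (class and method names are illustrative, not the actual Coach API):

```python
class OrDumpFilter:
    """Dump a video if ANY of the wrapped filters says to."""
    def __init__(self, *filters):
        self.filters = filters

    def should_dump(self, episode_idx: int, **kwargs) -> bool:
        return any(f.should_dump(episode_idx, **kwargs) for f in self.filters)
```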
* Separate Robosuite clipped PPO preset for the non-MAST case
* Add flatten layer to architectures and use it in Robosuite presets
This is required for embedders that mix conv and dense
TODO: Add MXNet implementation
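Why the flatten layer is needed, as a TF 1.x-style sketch (illustrative, not the Coach embedder code): conv layers emit a 4-D tensor that must be flattened before dense layers can follow:

```python
import tensorflow as tf

def mixed_conv_dense_embedder(image_input):
    """Embedder mixing conv and dense layers; flatten bridges the two."""
    x = tf.layers.conv2d(image_input, filters=32, kernel_size=8, strides=4, activation=tf.nn.relu)
    x = tf.layers.conv2d(x, filters=64, kernel_size=4, strides=2, activation=tf.nn.relu)
    x = tf.layers.flatten(x)  # [batch, H, W, C] -> [batch, H*W*C]
    return tf.layers.dense(x, 256, activation=tf.nn.relu)
```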
* publishing running_stats together with the published policy + hyper-param for when to publish a policy + cleanups
* bug-fix for memory leak in MAST
* Bugfix: Return value in TF BatchnormActivationDropout.to_tf_instance
* Explicit activations in embedder scheme so there's no ReLU after flatten
* Add clipped PPO heads with configurable dense layers at the beginning
* This is a workaround needed to mimic Surreal-PPO, where the CNN and
LSTM are shared between actor and critic but the FC layers are not
shared
* Added a "SchemeBuilder" class, currently only used for the new heads
but we can change Middleware and Embedder implementations to use it
as well
* Video dump setting fix in basic preset
* logging screen output to file
* Coach now starts the redis-server for a MAST run
* trainer drops off-policy data + old policy in ClippedPPO updates only after policy was published + logging free memory stats + actors check for a new policy only at the beginning of a new episode + fixed a bug where the trainer was logging "Training Reward = 0", causing dashboard to incorrectly display the signal
* Add missing set_internal_state function in TFSharedRunningStats
* Robosuite preset - use SingleLevelSelect instead of hard-coded level
* policy ID published directly on Redis
* Small fix when writing to log file
* Major bugfix in Robosuite presets - pass dense sizes to heads
* RoboSuite_Lift_ClippedPPO hyper-params update
* add horizon and value bootstrap to GAE calculation, fix A3C with LSTM
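For reference, GAE with a bootstrapped value for horizon-truncated episodes, as a self-contained sketch (the Coach implementation differs in details, and terminal-state masking is omitted for brevity):

```python
import numpy as np

def gae(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation; `bootstrap_value` is V(s_T) for the state
    after the last step, used when the episode was cut off by the horizon."""
    values = np.append(values, bootstrap_value)
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages, advantages + values[:-1]  # advantages, value targets
```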
* Adam hyper-params from MuJoCo
* updated MAST preset with IK_POSE_POS controller
* configurable initialization for policy stdev + custom extra noise per actor + logging of policy stdev to dashboard
* value loss weighting of 0.5
* minor fixes + presets
* bug-fix for MAST where the old policy in the trainer kept updating every training iteration, while it should only update after each policy publish
* bug-fix: reset_internal_state was not called by the trainer
* bug-fixes in the LSTM flow + some hyper-param adjustments for CartPole_ClippedPPO_LSTM -> now trains and sometimes reaches 200
* adding back the horizon hyper-param - a messy commit
* another bug-fix missing from prev commit
* set control_freq=2 to match action_scale 0.125
* ClippedPPO with MAST cleanups and some preps for TD3 with MAST
* TD3 presets. RoboSuite_Lift_TD3 seems to work well with multi-process runs (-n 8)
* setting termination on collision to be on by default
* bug-fix following prev-prev commit
* initial commit of the cube exploration environment with TD3
* bug fix + minor refactoring
* several parameter changes and RND debugging
* Robosuite Gym wrapper + Rename TD3_Random* -> Random*
* algorithm update
* Add RoboSuite v1 env + presets (to eventually replace non-v1 ones)
* Remove grasping presets, keep only V1 exp. presets (w/o V1 tag)
* Keep just robosuite V1 env as the 'robosuite_environment' module
* Exclude Robosuite and MAST presets from integration tests
* Exclude LSTM and MAST presets from golden tests
* Fix mistakenly removed import
* Revert debug changes in ReaderWriterLock
* Try another way to exclude LSTM/MAST golden tests
* Remove debug prints
* Remove PreDense heads, unused in the end
* Missed removing an instance of PreDense head
* Remove MAST, not required for this PR
* Undo unused concat option in ObservationStackingFilter
* Remove LSTM updates, not required in this PR
* Update README.md
* code changes for the exploration flow to work with robosuite master branch
* code cleanup + documentation
* jupyter tutorial for the goal-based exploration + scatter plot
* typo fix
* Update README.md
* separate parameter for the obs-goal observation + small fixes
* code clarity fixes
* adjustment in tutorial 5
* Update tutorial
* Update tutorial
Co-authored-by: Guy Jacob <guy.jacob@intel.com>
Co-authored-by: Gal Leibovich <gal.leibovich@intel.com>
Co-authored-by: shadi.endrawis <sendrawi@aipg-ra-skx-03.ra.intel.com>
* Change build_*_env jobs to pull base image of current "tag"
instead of "master" image
* Change nightly flow so build_*_env jobs now gated by build_base (so
change in previous bullet works in nightly)
* Bugfix in CheckpointDataStore: Call to object.__init__ with
parameters
* Disabling unstable Doom A3C and ACER golden tests
* Currently this is specific to the case of discretizing a continuous action space. It can easily be adapted to other cases by feeding the kNN differently and removing the discretizing output action filter
* GraphManager.set_session also sets self.sess
* make sure that GraphManager.fetch_from_worker uses training phase
* remove unnecessary phase setting in training worker
* reorganize rollout worker
* provide default name to GlobalVariableSaver.__init__ since it isn't really used anyway
* allow dividing TrainingSteps and EnvironmentSteps
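A minimal sketch of what divisible step counters might look like (the real Coach classes carry more state; names here are just for illustration):

```python
class EnvironmentSteps:
    def __init__(self, num_steps: int):
        self.num_steps = num_steps

    def __truediv__(self, divisor: int) -> 'EnvironmentSteps':
        # e.g. EnvironmentSteps(100_000) / num_workers when splitting a schedule
        return EnvironmentSteps(self.num_steps // divisor)
```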
* add timestamps to the log
* added redis data store
* conflict merge fix
* SAC algorithm
* SAC - updates to agent (learn_from_batch), sac_head and sac_q_head to fix a problem in the gradient calculation. Now the SAC agent is able to train.
gym_environment - fixing an error in access to gym.spaces
* Soft Actor Critic - code cleanup
* code cleanup
* V-head initialization fix
* SAC benchmarks
* SAC Documentation
* typo fix
* documentation fixes
* documentation and version update
* README typo
* introduce dockerfiles.
* ensure golden tests are run, not just collected.
* Skip CI download of dockerfiles.
* add StarCraft environment and tests.
* add minimaps starcraft validation parameters.
* Add functional test running (from Ayoob)
* pin mujoco_py version to a 1.5 compatible release.
* fix config syntax issue.
* pin remaining mujoco_py install calls.
* Relax pin of gym version in gym Dockerfile.
* update makefile based on functional test filtering.
* integration test changes to override heatup to 1000 steps + run each preset for 30 sec (to make sure we reach the train part)
* fixes to failing presets uncovered with this change + changes in the golden testing to properly test BatchRL
* fix for rainbow dqn
* fix to gym_environment (due to a change in Gym 0.12.1) + fix for rainbow DQN + some bug-fix in utils.squeeze_list
* fix for NEC agent
* allowing for the last training batch drawn to be smaller than batch_size + adding support for more agents in BatchRL by adding softmax with temperature to the corresponding heads + adding a CartPole_QR_DQN preset with a golden test + cleanups
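Softmax with temperature, as used to turn head outputs into a sampling distribution, is a small function; a NumPy sketch:

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Higher temperature -> closer to uniform; lower -> closer to argmax."""
    scaled = logits / temperature
    scaled -= scaled.max()  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()
```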
* initial ACER commit
* Code cleanup + several fixes
* Q-retrace bug fix + small clean-ups
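For context, the Retrace-style target that Q-retrace computes, following the ACER paper (a sketch with truncated importance weights, not Coach's exact code):

```python
import numpy as np

def q_retrace(rewards, q_values, values, rhos, bootstrap_value, gamma=0.99):
    """Q_ret(t) = r_t + gamma * [ c_{t+1} * (Q_ret(t+1) - Q(s_{t+1}, a_{t+1})) + V(s_{t+1}) ],
    with truncated importance weights c_t = min(1, rho_t)."""
    q_ret = bootstrap_value  # V of the state after the last step
    targets = np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        q_ret = rewards[t] + gamma * q_ret
        targets[t] = q_ret
        q_ret = min(1.0, rhos[t]) * (q_ret - q_values[t]) + values[t]
    return targets
```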
* added documentation for acer
* ACER benchmarks
* update benchmarks table
* Add nightly running of golden and trace tests. (#202)
Resolves #200
* comment out nightly trace tests until values reset.
* remove redundant observe ignore (#168)
* ensure nightly test env containers exist. (#205)
Also bump integration test timeout
* wxPython removal (#207)
Replacing wxPython with Python's Tkinter.
Also removing the option to choose multiple files as it is unused and causes errors, and fixing the load file/directory spinner.
* Create CONTRIBUTING.md (#210)
* Create CONTRIBUTING.md. Resolves #188
* run nightly golden tests sequentially. (#217)
Should reduce resource requirements and potential CPU contention but increases
overall execution time.
* tests: added new setup configuration + test args (#211)
- added utils for future tests and conftest
- added test args
* new docs build
* golden test update
* Adding MXNet components to rl_coach architectures.
- Supports PPO and DQN
- Tested with CartPole_PPO and CartPole_DQN
- Normalizing filters don't work right now (see #49) and are disabled in CartPole_PPO preset
- Checkpointing is disabled for MXNet
* Integrate coach.py params with distributed Coach.
* Minor improvements
- Use enums instead of constants.
- Reduce code duplication.
- Ask experiment name with timeout.
* reordering of the episode reset operation and allowing episodes to be stored only once they are terminated
* revert tensorflow-gpu to 1.9.0 + bug fix in should_train()
* tests readme file and refactoring of policy optimization agent train function
* Update README.md
* Update README.md
* additional policy optimization train function simplifications
* Updated the traces after the reordering of the environment reset
* docker and jenkins files
* updated the traces to the ones from within the docker container
* updated traces and added control suite to the docker
* updated jenkins file with the intel proxy + updated doom basic a3c test params
* updated line breaks in jenkins file
* added a missing line break in jenkins file
* refining the presets ignored by trace tests + adding a configurable beta entropy value
* switch the order of trace and golden tests in jenkins + fix golden tests processes not killed issue
* updated benchmarks for dueling ddqn breakout and pong
* allowing dynamic updates to the loss weights + bug fix in episode.update_returns
* remove docker and jenkins file