coach/rl_coach/environments/robosuite/cube_exp.py
shadiendrawis 0896f43097 Robosuite exploration (#478)
* Add Robosuite parameters for all env types + initialize env flow

* Init flow done

* Rest of Environment API complete for RobosuiteEnvironment

* RobosuiteEnvironment changes

* Observation stacking filter
* Add proper frame_skip in addition to control_freq
* Hardcode Coach rendering to 'frontview' camera

* Robosuite_Lift_DDPG preset + Robosuite env updates

* Move observation stacking filter from env to preset
* Pre-process observation - concatenate depth map (if it exists)
  to image and object state (if it exists) to robot state (see the
  numpy sketch after this list)
* Preset parameters based on Surreal DDPG parameters, taken from:
  https://github.com/SurrealAI/surreal/blob/master/surreal/main/ddpg_configs.py
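
  As a rough numpy illustration of the concatenation above (shapes are
  hypothetical, not the preset's actual dimensions):

      import numpy as np

      rgb = np.zeros((128, 128, 3)); depth = np.zeros((128, 128, 1))
      robot_state = np.zeros(30); object_state = np.zeros(10)

      image_obs = np.concatenate([rgb, depth], axis=-1)         # (128, 128, 4)
      vector_obs = np.concatenate([robot_state, object_state])  # (40,)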

* RobosuiteEnvironment fixes - working now with PyGame rendering

* Preset minor modifications

* ObservationStackingFilter - option to concat non-vector observations

* Consider frame skip when setting horizon in robosuite env

* Robosuite lift preset - update heatup length and training interval

* Robosuite env - change control_freq to 10 to match Surreal usage

* Robosuite clipped PPO preset

* Distribute multiple workers (-n #) over multiple GPUs

* Clipped PPO memory optimization from @shadiendrawis

* Fixes to evaluation only workers

* RoboSuite_ClippedPPO: Update training interval

* Undo last commit (update training interval)

* Fix "doube-negative" if conditions

* multi-agent single-trainer clipped ppo training with cartpole

* cleanups (not done yet) + roughly tuned hyper-params for MAST

* Switch to Robosuite v1 APIs

* Change presets to IK controller

* more cleanups + enabling evaluation worker + better logging

* RoboSuite_Lift_ClippedPPO updates

* Fix major bug in obs normalization filter setup

* Reduce coupling between Robosuite API and Coach environment

* Now only non task-specific parameters are explicitly defined
  in Coach
* Removed a bunch of enums of Robosuite elements, using simple
  strings instead
* With this change new environments/robots/controllers in Robosuite
  can be used immediately in Coach

* MAST: better logging of actor-trainer interaction + bug fixes + performance improvements.

Still missing: fixed pubsub for obs normalization running stats + logging for trainer signals

* LSTM support for PPO

* setting JOINT VELOCITY action space by default + fix for EveryNEpisodes video dump filter + new TaskIDDumpFilter + allowing OR between video dump filters

* Separate Robosuite clipped PPO preset for the non-MAST case

* Add flatten layer to architectures and use it in Robosuite presets

This is required for embedders that mix conv and dense layers (see the sketch below)

TODO: Add MXNet implementation
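
  A tiny numpy sketch of why the flatten is needed (hypothetical shapes):

      import numpy as np

      conv_out = np.zeros((1, 7, 7, 64))        # NHWC feature map from a conv stack
      flat = conv_out.reshape(conv_out.shape[0], -1)
      assert flat.shape == (1, 7 * 7 * 64)      # now consumable by a dense layer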

* publishing running_stats together with the published policy + hyper-param for when to publish a policy + cleanups

* bug-fix for memory leak in MAST

* Bugfix: Return value in TF BatchnormActivationDropout.to_tf_instance

* Explicit activations in embedder scheme so there's no ReLU after flatten

* Add clipped PPO heads with configurable dense layers at the beginning

* This is a workaround needed to mimic Surreal-PPO, where the CNN and
  LSTM are shared between actor and critic but the FC layers are not
  shared
* Added a "SchemeBuilder" class, currently only used for the new heads
  but we can change Middleware and Embedder implementations to use it
  as well

* Video dump setting fix in basic preset

* logging screen output to file

* Coach now starts the redis-server for a MAST run

* MAST fixes and improvements:

  * Trainer drops off-policy data
  * Old policy in ClippedPPO updates only after a policy was published
  * Logging of free memory stats
  * Actors check for a new policy only at the beginning of a new episode
  * Fixed a bug where the trainer logged "Training Reward = 0", causing
    dashboard to incorrectly display the signal

* Add missing set_internal_state function in TFSharedRunningStats

* Robosuite preset - use SingleLevelSelect instead of hard-coded level

* policy ID published directly on Redis

* Small fix when writing to log file

* Major bugfix in Robosuite presets - pass dense sizes to heads

* RoboSuite_Lift_ClippedPPO hyper-params update

* add horizon and value bootstrap to GAE calculation, fix A3C with LSTM
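
  For reference, a minimal sketch of GAE with a bootstrapped value at the
  horizon (illustrative only - variable names are not Coach's):

      import numpy as np

      def gae(rewards, values, bootstrap_value, discount=0.99, lam=0.95):
          # bootstrap_value is V(s_T) when the episode is cut at the horizon,
          # and 0 when it actually terminated
          values = np.append(values, bootstrap_value)
          deltas = rewards + discount * values[1:] - values[:-1]
          advantages = np.zeros(len(rewards))
          acc = 0.0
          for t in reversed(range(len(rewards))):
              acc = deltas[t] + discount * lam * acc
              advantages[t] = acc
          returns = advantages + values[:-1]
          return advantages, returns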

* Adam hyper-params from MuJoCo

* updated MAST preset with IK_POSE_POS controller

* configurable initialization for policy stdev + custom extra noise per actor + logging of policy stdev to dashboard

* values loss weighting of 0.5

* minor fixes + presets

* bug-fix for MAST where the old policy in the trainer kept updating every training iteration, while it should only update after each policy publish

* bug-fix: reset_internal_state was not called by the trainer

* bug-fixes in the LSTM flow + some hyper-param adjustments for CartPole_ClippedPPO_LSTM -> now trains and sometimes reaches 200

* adding back the horizon hyper-param - a messy commit

* another bug-fix missing from prev commit

* set control_freq=2 to match action_scale 0.125

* ClippedPPO with MAST cleanups and some preps for TD3 with MAST

* TD3 presets. RoboSuite_Lift_TD3 seems to work well with multi-process runs (-n 8)

* setting termination on collision to be on by default

* bug-fix following prev-prev commit

* initial commit of the cube exploration environment with TD3

* bug fix + minor refactoring

* several parameter changes and RND debugging

* Robosuite Gym wrapper + Rename TD3_Random* -> Random*

* algorithm update

* Add RoboSuite v1 env + presets (to eventually replace non-v1 ones)

* Remove grasping presets, keep only V1 exp. presets (w/o V1 tag)

* Keep just robosuite V1 env as the 'robosuite_environment' module

* Exclude Robosuite and MAST presets from integration tests

* Exclude LSTM and MAST presets from golden tests

* Fix mistakenly removed import

* Revert debug changes in ReaderWriterLock

* Try another way to exclude LSTM/MAST golden tests

* Remove debug prints

* Remove PreDense heads, unused in the end

* Missed removing an instance of PreDense head

* Remove MAST, not required for this PR

* Undo unused concat option in ObservationStackingFilter

* Remove LSTM updates, not required in this PR

* Update README.md

* code changes for the exploration flow to work with robosuite master branch

* code cleanup + documentation

* Jupyter tutorial for the goal-based exploration + scatter plot

* typo fix

* Update README.md

* separate parameter for the obs-goal observation + small fixes

* code clarity fixes

* adjustment in tutorial 5

* Update tutorial

* Update tutorial

Co-authored-by: Guy Jacob <guy.jacob@intel.com>
Co-authored-by: Gal Leibovich <gal.leibovich@intel.com>
Co-authored-by: shadi.endrawis <sendrawi@aipg-ra-skx-03.ra.intel.com>
2021-06-01 00:34:19 +03:00


#
# Copyright (c) 2021 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import numpy as np

from robosuite.utils.mjcf_utils import CustomMaterial
from robosuite.environments.manipulation.single_arm_env import SingleArmEnv
from robosuite.environments.manipulation.lift import Lift
from robosuite.models.arenas import TableArena
from robosuite.models.objects import BoxObject
from robosuite.models.tasks import ManipulationTask
from robosuite.utils.placement_samplers import UniformRandomSampler

TABLE_TOP_SIZE = (0.84, 1.25, 0.05)
TABLE_OFFSET = (0, 0, 0.82)


class CubeExp(Lift):
    """
    This class corresponds to multi-colored cube exploration for a single robot arm.
    """
    def __init__(
        self,
        robots,
        table_full_size=TABLE_TOP_SIZE,
        table_offset=TABLE_OFFSET,
        placement_initializer=None,
        penalize_reward_on_collision=False,
        end_episode_on_collision=False,
        **kwargs
    ):
        """
        Args:
            robots (str or list of str): Specification for specific robot arm(s) to be instantiated within this env
                (e.g: "Sawyer" would generate one arm; ["Panda", "Panda", "Sawyer"] would generate three robot arms)
                Note: Must be a single single-arm robot!

            table_full_size (3-tuple): x, y, and z dimensions of the table.

            placement_initializer (ObjectPositionSampler instance): if provided, will
                be used to place objects on every reset; else a UniformRandomSampler
                is used by default.

            Rest of kwargs follow the Lift class arguments.
        """
        if placement_initializer is None:
            placement_initializer = UniformRandomSampler(
                name="ObjectSampler",
                x_range=[0.0, 0.0],
                y_range=[0.0, 0.0],
                rotation=(0.0, 0.0),
                ensure_object_boundary_in_range=False,
                ensure_valid_placement=True,
                reference_pos=table_offset,
                z_offset=0.9,
            )

        super().__init__(
            robots=robots,
            table_full_size=table_full_size,
            placement_initializer=placement_initializer,
            initialization_noise=None,
            **kwargs
        )
        self._max_episode_steps = self.horizon

    def _load_model(self):
        """
        Loads an xml model, puts it in self.model
        """
        SingleArmEnv._load_model(self)

        # Adjust base pose accordingly
        xpos = self.robots[0].robot_model.base_xpos_offset["table"](self.table_full_size[0])
        self.robots[0].robot_model.set_base_xpos(xpos)

        # Load model for table top workspace
        mujoco_arena = TableArena(
            table_full_size=self.table_full_size,
            table_friction=self.table_friction,
            table_offset=self.table_offset,
        )

        # Arena always gets set to zero origin
        mujoco_arena.set_origin([0, 0, 0])

        cube_material = self._get_cube_material()
        self.cube = BoxObject(
            name="cube",
            size_min=(0.025, 0.025, 0.025),
            size_max=(0.025, 0.025, 0.025),
            rgba=[1, 0, 0, 1],
            material=cube_material,
        )
        self.placement_initializer.reset()
        self.placement_initializer.add_objects(self.cube)

        # Task includes arena, robot, and objects of interest
        self.model = ManipulationTask(
            mujoco_arena=mujoco_arena,
            mujoco_robots=[robot.robot_model for robot in self.robots],
            mujoco_objects=self.cube,
        )

    @property
    def action_spec(self):
        """
        Action space (low, high) for this environment
        """
        # Expose only the first 3 action dimensions (end-effector position);
        # the remaining rotation + gripper dimensions are filled in by _pre_action
        low, high = super().action_spec
        return low[:3], high[:3]

    def _get_cube_material(self):
        from robosuite.utils.mjcf_utils import array_to_string

        # Start from a default material, then swap the file-based texture for a
        # flat, procedurally-generated cube texture that can be painted at reset time
        rgba = (1, 0, 0, 1)
        cube_material = CustomMaterial(
            texture=rgba,
            tex_name="solid",
            mat_name="solid_mat",
        )
        cube_material.tex_attrib.pop('file')
        cube_material.tex_attrib["type"] = "cube"
        cube_material.tex_attrib["builtin"] = "flat"
        cube_material.tex_attrib["rgb1"] = array_to_string(rgba[:3])
        cube_material.tex_attrib["rgb2"] = array_to_string(rgba[:3])
        cube_material.tex_attrib["width"] = "100"
        cube_material.tex_attrib["height"] = "100"
        return cube_material

    def _reset_internal(self):
        """
        Resets simulation internal configurations.
        """
        from robosuite.utils.mjmod import Texture

        super()._reset_internal()
        self._action_dim = 3

        # Paint each face of the cube a different color. The 100x100 cube
        # texture is stored as six faces stacked vertically in one bitmap, so
        # each 100-row band below colors one face:
        # white, red, green, blue, yellow, cyan
        geom_id = self.sim.model.geom_name2id('cube_g0_vis')
        mat_id = self.sim.model.geom_matid[geom_id]
        tex_id = self.sim.model.mat_texid[mat_id]
        texture = Texture(self.sim.model, tex_id)
        bitmap_to_set = texture.bitmap
        bitmap = np.zeros_like(bitmap_to_set)
        bitmap[:100, :, :] = 255
        bitmap[100:200, :, 0] = 255
        bitmap[200:300, :, 1] = 255
        bitmap[300:400, :, 2] = 255
        bitmap[400:500, :, :2] = 255
        bitmap[500:, :, 1:] = 255
        bitmap_to_set[:] = bitmap
        for render_context in self.sim.render_contexts:
            render_context.upload_texture(texture.id)

    def _pre_action(self, action, policy_step=False):
        """ Explicitly shut the gripper """
        # Pad the 3-D position action with zero rotation and a closed gripper
        joined_action = np.append(action, [0., 0., 0., 1.])
        self._action_dim = 7
        super()._pre_action(joined_action, policy_step)

    def _post_action(self, action):
        ret = super()._post_action(action)
        self._action_dim = 3
        return ret

    def reward(self, action=None):
        # Pure exploration task - there is no extrinsic task reward
        return 0

    def _check_success(self):
        # Exploration has no success criterion
        return False
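
A minimal direct-usage sketch (illustrative only; assumes a working robosuite v1 installation, and the extra kwargs here are simply forwarded to Lift, so they may differ across robosuite versions):

    import numpy as np
    from rl_coach.environments.robosuite.cube_exp import CubeExp

    env = CubeExp(robots="Panda", has_renderer=False, horizon=200)
    obs = env.reset()
    low, high = env.action_spec                 # 3-D end-effector position action
    obs, reward, done, info = env.step(np.random.uniform(low, high))
    assert reward == 0                          # pure exploration, no extrinsic reward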