intelligent-control-lab/safe_rl_bench

 
 

GUARD: Generalized Unified SAfe Reinforcement Learning Development Benchmark

Paper link: GUARD: A Safe Reinforcement Learning Benchmark

GUARD is a highly customizable generalized benchmark with a wide variety of RL agents, tasks, and safety constraint specifications. GUARD comprehensively covers state-of-the-art safe RL algorithms with self-contained implementations.

GUARD is composed of two main components: GUARD Safe RL library and GUARD testing suite.

Supported algorithms in the GUARD Safe RL library include:

Unconstrained

End-to-end

Hierarchical

GUARD testing suite supports the following agents:

  • Swimmer
  • Ant
  • Walker
  • Humanoid
  • Hopper
  • Arm3
  • Arm6
  • Drone

GUARD testing suite supports the following tasks:

  • Goal
  • Push
  • Chase
  • Defense

GUARD testing suite supports the following safety constraints (obstacles):

  • 3D Hazards
  • Ghosts
  • 3D Ghosts
  • Vases
  • Pillars
  • Buttons
  • Gremlins

Obstacles can be trespassable or untrespassable; immovable, passively movable, or actively movable; and defined in 2D or 3D space. For the full set of options, please see the paper.


Installation

Install mujoco_py; see the mujoco_py documentation for details. Note that mujoco_py requires Python 3.6 or greater.

Then install safe_rl_envs:

cd safe_rl_envs
pip install -e .

Create the conda environment from requirements.txt:

conda create --name venv --file requirements.txt

Quick Start

1. Environment Configuration

A set of pre-configured environments can be found in safe_rl_env_config.py. The training scripts automatically create a pre-configured environment when given --task <env name>.

For a complete list of pre-configured environments, see below.

To create a custom environment with the GUARD Safe RL engine, add a custom configuration to safe_rl_env_config.py. For example, to build an environment with a drone robot, the defense task, two dynamic targets, and eight 3D ghosts, with constraints on entering the 3D ghost areas, add the following configuration to safe_rl_env_config.py:

if task == "Custom_Env":
    config = {
        # robot setting
        'robot_base': 'xmls/drone.xml',

        # task setting
        'task': 'defense',
        'goal_3D': True,
        'goal_z_range': [0.5, 1.5],
        'goal_size': 0.5,
        'defense_range': 2.5,

        # observation setting
        'observe_robber3Ds': True,
        'observe_ghost3Ds': True,
        'sensors_obs': ['accelerometer', 'velocimeter', 'gyro', 'magnetometer',
                        'touch_p1a', 'touch_p1b', 'touch_p2a', 'touch_p2b',
                        'touch_p3a', 'touch_p3b', 'touch_p4a', 'touch_p4b'],

        # constraint setting
        'constrain_ghost3Ds': True,
        'constrain_indicator': False,

        # lidar setting
        'lidar_num_bins': 10,
        'lidar_num_bins3D': 6,

        # object setting
        'ghost3Ds_num': 8,
        'ghost3Ds_size': 0.3,
        'ghost3Ds_travel': 2.5,
        'ghost3Ds_safe_dist': 1.5,
        'ghost3Ds_z_range': [0.5, 1.5],
        'robber3Ds_num': 2,
        'robber3Ds_size': 0.3,
        'robber3Ds_z_range': [0.5, 1.5],
    }

The custom environment can then be used with --task Custom_Env in the training process below.
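Before adding a custom configuration, it can help to sanity-check its values. The following is a minimal sketch, not part of GUARD: the function name check_custom_config and the rules it applies (suffix conventions such as _z_range, _num, and _size) are assumptions inferred from the key names in the example above, not checks performed by the engine itself.

```python
def check_custom_config(config):
    """Return a list of problems found in a GUARD-style config dict.

    Hypothetical helper: the rules below are inferred from key names only.
    """
    problems = []
    for key, value in config.items():
        # Keys ending in "_z_range" are assumed to be [low, high] pairs.
        if key.endswith("_z_range"):
            if len(value) != 2 or value[0] > value[1]:
                problems.append(f"{key} should be [low, high], got {value}")
        # Keys ending in "_num" are assumed to be non-negative integer counts.
        elif key.endswith("_num"):
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                problems.append(f"{key} should be a non-negative int, got {value}")
        # Keys ending in "_size" are assumed to be positive lengths.
        elif key.endswith("_size"):
            if value <= 0:
                problems.append(f"{key} should be positive, got {value}")
    return problems


example = {
    'goal_z_range': [0.5, 1.5],
    'goal_size': 0.5,
    'ghost3Ds_num': 8,
    'ghost3Ds_z_range': [0.5, 1.5],
}
print(check_custom_config(example))
```

An empty list means none of the assumed rules were violated; it does not guarantee that the engine accepts the configuration.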

2. Benchmark Suite

An environment in the GUARD Safe RL suite combines a task (one of Goal, Push, Chase, or Defense), a robot (one of Point, Swimmer, Ant, Walker, Humanoid, Hopper, Arm3, Arm6, or Drone), and a constraint type (8Hazards or 8Ghosts, where 8 is the number of constraints). Environments include:

  • Goal_{Robot}_8Hazards: A robot must navigate to a goal while avoiding hazards.
  • Goal_{Robot}_8Ghosts: A robot must navigate to a goal while avoiding ghosts.
  • Push_{Robot}_8Hazards: A robot must push a box to a goal while avoiding hazards.
  • Push_{Robot}_8Ghosts: A robot must push a box to a goal while avoiding ghosts.
  • Chase_{Robot}_8Hazards: A robot must chase two dynamic targets while avoiding hazards.
  • Chase_{Robot}_8Ghosts: A robot must chase two dynamic targets while avoiding ghosts.
  • Defense_{Robot}_8Hazards: A robot must prevent two dynamic targets from entering a protected circle area while avoiding hazards.
  • Defense_{Robot}_8Ghosts: A robot must prevent two dynamic targets from entering a protected circle area while avoiding ghosts.

(To form a concrete environment name, replace {Robot} with one of Point, Swimmer, Ant, Walker, Humanoid, Hopper, Arm3, Arm6, or Drone.)
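The naming scheme above can be enumerated programmatically. This is a small sketch that builds every {Task}_{Robot}_{Constraint} name from the lists given in this section; whether each combination is actually registered in safe_rl_env_config.py is an assumption that should be checked against that file.

```python
from itertools import product

# Values taken from the lists in this section.
TASKS = ["Goal", "Push", "Chase", "Defense"]
ROBOTS = ["Point", "Swimmer", "Ant", "Walker", "Humanoid",
          "Hopper", "Arm3", "Arm6", "Drone"]
CONSTRAINTS = ["8Hazards", "8Ghosts"]


def env_names():
    """Return all names of the form {Task}_{Robot}_{Constraint}."""
    return [f"{task}_{robot}_{constraint}"
            for task, robot, constraint in product(TASKS, ROBOTS, CONSTRAINTS)]


names = env_names()
print(len(names))   # 4 tasks x 9 robots x 2 constraint types = 72
print(names[0])     # Goal_Point_8Hazards
```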

3. Training

Take CPO training for example:

cd safe_rl_lib/cpo
conda activate venv
python cpo.py --task Goal_Point_8Hazards --seed 1

Training logs (e.g., config, model) will be saved under <algo>/logs/ (in the above example cpo/logs/).
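To repeat a run over several seeds, the command above can be generated in a loop. The sketch below only builds and prints the command lines; it uses the --task and --seed flags shown above, while the helper name training_commands and the seed values are illustrative assumptions.

```python
import shlex


def training_commands(task, seeds, script="cpo.py"):
    """Build one training command (as an argument list) per seed."""
    return [["python", script, "--task", task, "--seed", str(seed)]
            for seed in seeds]


# Print the commands; run them from safe_rl_lib/cpo with the conda
# environment activated, e.g. via subprocess.run(cmd, check=True).
for cmd in training_commands("Goal_Point_8Hazards", [1, 2, 3]):
    print(shlex.join(cmd))
```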

4. Visualization

To test a trained RL agent on a task and save the video:

python cpo_video.py --model_path logs/<exp name>/<exp name>_s<seed>/pyt_save/model.pt --task <env name> --video_name <video name> --max_epoch <max epoch>            
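The --model_path argument follows the pattern logs/<exp name>/<exp name>_s<seed>/pyt_save/model.pt. As a rough sketch, the path can be assembled from an experiment name and seed; the helper name model_path and the experiment name my_exp below are hypothetical placeholders.

```python
from pathlib import PurePosixPath


def model_path(exp_name, seed, log_root="logs"):
    """Assemble logs/<exp name>/<exp name>_s<seed>/pyt_save/model.pt."""
    return (PurePosixPath(log_root) / exp_name
            / f"{exp_name}_s{seed}" / "pyt_save" / "model.pt")


# "my_exp" is a placeholder; use the experiment name from your own logs.
print(model_path("my_exp", 1))
```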

To plot training statistics (e.g., reward, cost), copy all desired log folders to comparison/ and then run the plot script as follows:

cd safe_rl_lib
mkdir comparison
cp -r <algo>/logs/<exp name> comparison/
python utils/plot.py comparison/ --title <title name> --reward --cost

<title name> can be anything that describes the current comparison (e.g., "all end-to-end methods").


Citing GUARD

@article{zhao2023guard,
  title={GUARD: A Safe Reinforcement Learning Benchmark},
  author={Zhao, Weiye and Chen, Rui and Sun, Yifan and Liu, Ruixuan and Wei, Tianhao and Liu, Changliu},
  journal={arXiv preprint arXiv:2305.13681},
  year={2023}
}
