
Constrained Meta-Reinforcement Learning for Adaptable Safety Guarantee with Differentiable Convex Programming


Meta-Constrained Policy Optimization (Meta-CPO) for Safe and Fast Adaptation in Nonstationary Domains.

This repository adapts the CPO algorithm into a meta-learning framework. The modification leverages Differentiable Convex Programming to relax the gradient computation between the meta- and local parameters, and CPO is integrated into the meta-learning setting through the model-free framework introduced by MAML. The algorithm is evaluated in Safety Gymnasium, which provides an intuitive experimental platform for demonstrating its effectiveness in the context of autonomous-driving tasks. For the theory behind the adaptable safety guarantee, please refer to our paper, Constrained Meta-RL with DCO.
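To give a sense of what "differentiable convex programming" means in this setting, below is a minimal sketch using cvxpylayers: a simplified CPO-style subproblem is expressed as a convex program whose solution is differentiable with respect to its inputs, so meta-gradients can flow through the inner adaptation step. The dimensions, variable names, and the Euclidean trust region are illustrative assumptions; the repository's actual layer (and the KL/Fisher metric used in the paper) may differ.

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n = 8                       # toy dimension of the policy-parameter update
step = cp.Variable(n)       # candidate update direction
g = cp.Parameter(n)         # reward-objective gradient
b = cp.Parameter(n)         # cost-constraint gradient
c = cp.Parameter()          # current constraint violation (J_C - d)
delta = 0.01                # trust-region radius (Euclidean here for simplicity)

objective = cp.Maximize(g @ step)
constraints = [c + b @ step <= 0,              # linearized safety constraint
               cp.sum_squares(step) <= delta]  # trust region
layer = CvxpyLayer(cp.Problem(objective, constraints),
                   parameters=[g, b, c], variables=[step])

# Torch inputs with requires_grad=True: gradients flow *through* the solver,
# which is what lets an outer (meta) objective differentiate the inner update.
g_t = torch.randn(n, requires_grad=True)
b_t = torch.randn(n, requires_grad=True)
c_t = torch.tensor(0.05, requires_grad=True)
step_t, = layer(g_t, b_t, c_t)
step_t.sum().backward()     # d(step)/d(g, b, c) is now available to the outer loop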


Pre-requisites

Usage

To create a conda environment, run the following commands:

conda create --name myenv python==3.10.*
conda activate myenv

Then, install the required packages using pip:

pip install -r requirements.txt

The code already includes testing domains for the Button and Circle tasks (under safety_gymnasium/tasks/safe_navigation/) to evaluate adaptive performance. To evaluate or replicate results in different environmental settings, implement your own custom environments; see the Custom Environment section below for details. To run an experiment, choose an agent and set the environment to either Safety[Agent]Stcircle or Safety[Agent]Stbutton by specifying env_name in utils/apr_parse.py, then execute the following command with appropriate hyperparameter settings:

python3 main.py
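Before launching a full run, it can be useful to sanity-check that the chosen environment name resolves. The snippet below is a rough sketch: the registered ID (agent name, task name, and any level/version suffix) is an assumption and should be replaced with whatever the bundled Stcircle/Stbutton tasks actually register.

import safety_gymnasium

# "SafetyPointStcircle0-v0" is a hypothetical ID; use the one matching env_name.
env = safety_gymnasium.make("SafetyPointStcircle0-v0")
obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()
    # Safety-Gymnasium returns a separate cost signal alongside the reward.
    obs, reward, cost, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()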

Note

The following error may arise:

Please consider re-formulating your problem so that it is always solvable or increasing the number of solver iterations.

This is caused by an infeasible CPO subproblem, which occurs naturally: CPO sometimes has no solution within the trust region, in which case a plain TRPO step on the cost is used instead.
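This is the standard recovery rule from the CPO paper: when the linearized constraint cannot be satisfied inside the trust region, the update falls back to a pure cost-reduction (TRPO-style) step. Below is a schematic of that branching logic with illustrative names; it is not the exact code in this repository.

import numpy as np

def recovery_step_if_infeasible(b, c, H_inv_dot, delta):
    """Return a TRPO-style cost-reduction step when the CPO subproblem is
    infeasible, else None (in which case the full CPO step would be computed).

    b: cost-constraint gradient, c: constraint violation (J_C - d),
    H_inv_dot: function returning H^{-1} v for the Fisher/KL matrix H,
    delta: trust-region size. Names are illustrative, not the repo's API.
    """
    Hinv_b = H_inv_dot(b)
    q = float(b @ Hinv_b)
    # The linearized constraint c + b^T x <= 0 is satisfiable inside the
    # trust region 0.5 x^T H x <= delta iff c <= sqrt(2 * delta * q).
    if c > 0 and c ** 2 > 2 * delta * q:
        # Infeasible subproblem: take the step that purely decreases the cost.
        return -np.sqrt(2 * delta / q) * Hinv_b
    return None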

Custom Environment

To create custom environments, refer to the Safety Gymnasium documentation, our implementation under safety_gymnasium/tasks/safe_navigation/, and Table 1 of our paper. In our implementation, task_level_0 is a fixed environment used to evaluate training performance, task_level_1 generates environments with stochastic environmental parameters, and task_level_2 generates the meta-testing environment. Minor additional changes may be required to adapt a custom task to the safety_gymnasium package.
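Schematically, the three levels play the following roles in a run; the ID pattern and helper below are assumptions made for illustration, not the repository's actual API.

import safety_gymnasium

def make_task(level: int):
    # Hypothetical ID pattern; check how your custom task is registered.
    return safety_gymnasium.make(f"SafetyPointStcircle{level}-v0")

train_eval_env = make_task(0)                       # fixed env: track training performance
meta_train_envs = [make_task(1) for _ in range(4)]  # stochastic parameters: sampled tasks
meta_test_env = make_task(2)                        # held-out env: meta-testing / adaptation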

Citing Meta-CPO

If you find Meta-CPO useful and informative, please cite it in your publications.

@inproceedings{cho2024constrained,
  title={Constrained Meta-Reinforcement Learning for Adaptable Safety Guarantee with Differentiable Convex Programming},
  author={Cho, Minjae and Sun, Chuangchuang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={19},
  pages={20975--20983},
  year={2024}
}

Simulation

PB_CPOMeta.mp4
PB_TRPOMeta.mp4
PB_CPO.mp4

Code Reference
