This repository is the official implementation of Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety. It is built on the Parallel Asynchronous Buffer-Actor-Learner (PABAL) architecture, which implements most common RL algorithms with state-of-the-art training efficiency. If you are interested in PABAL or want to contribute to it, please contact me or the original creator.
Important information for installing the requirements:
- We have tested the code successfully only on Python 3.6. Higher Python versions cause errors with Safety Gym and TensorFlow 2.x.
- Make sure you have installed MuJoCo and mujoco-py properly.
- Safety Gym and TensorFlow 2.x require conflicting numpy versions. We tested with numpy 1.17.5. If you run into errors, please check your numpy version first.
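The version notes above can be checked before starting a training run. The following is a minimal sketch and not part of this repository: the function name `check_environment` and the constants `TESTED_PYTHON` and `TESTED_NUMPY` are illustrative assumptions. It only compares the running versions against the tested ones (Python 3.6, numpy 1.17.5) and does not verify the MuJoCo or mujoco-py installation.

```python
import sys

# Versions this README reports as tested (names are hypothetical).
TESTED_PYTHON = (3, 6)
TESTED_NUMPY = "1.17.5"


def check_environment(python_version, numpy_version):
    """Return a list of warnings for versions that differ from the tested setup.

    python_version: a tuple such as sys.version_info or (3, 6, 9).
    numpy_version: a version string such as "1.17.5", or None if numpy is missing.
    """
    problems = []
    if tuple(python_version[:2]) != TESTED_PYTHON:
        problems.append(
            "Python %d.%d detected; only Python 3.6 has been tested."
            % (python_version[0], python_version[1])
        )
    if numpy_version is None:
        problems.append("numpy is not installed; numpy 1.17.5 has been tested.")
    elif numpy_version != TESTED_NUMPY:
        problems.append(
            "numpy %s detected; only numpy 1.17.5 has been tested." % numpy_version
        )
    return problems


if __name__ == "__main__":
    try:
        import numpy
        found_numpy = numpy.__version__
    except ImportError:
        found_numpy = None
    for message in check_environment(sys.version_info, found_numpy):
        print("WARNING:", message)
```

An empty result means the interpreter and numpy match the tested versions. A warning does not necessarily mean the code will fail, only that the combination is untested.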
To install requirements:
pip install -r requirements.txt
To train the model(s) in the paper, run this command:
python --env_id Safexp-PointButton1-v0 --seed 0
To test and evaluate trained policies, run:
python --mode testing --test_dir <your_log_dir> --test_iter_list [3000000]
When contributing to this repository, please first discuss the change you wish to make with me via an issue, email, or any other method.