EPyMARL is an extension of PyMARL, and includes
- New! Support for training in environments with individual rewards for all agents (for all algorithms that support such settings)
- New! Updated EPyMARL to use maintained Gymnasium library instead of deprecated OpenAI Gym version 0.21.
- New! Support for new environments: native integration of PettingZoo, VMAS, matrix games, SMACv2, and SMAClite
- New! Support for logging to weights and biases (W&B)
- New! We added a simple plotting script to visualise run data
- Additional algorithms (IA2C, IPPO, MADDPG, MAA2C and MAPPO)
- Option for no-parameter sharing between agents (original PyMARL only allowed for parameter sharing)
- Flexibility with extra implementation details (e.g. hard/soft updates, reward standardisation, and more)
- Consistency of implementations between different algorithms (fair comparisons)
See our blog post here: https://agents.inf.ed.ac.uk/blog/epymarl/
It became increasingly difficult to install and rely on the deprecated OpenAI Gym version 0.21 that EPyMARL previously depended on, so we moved EPyMARL to the maintained Gymnasium library and API. This move required updating several environments that were built to work with EPyMARL's gymma wrapper, including level-based foraging and multi-robot warehouse. Alongside this update to EPyMARL, we therefore also updated these environments as well as SMAClite and matrix games, wrote wrappers to maintain compatibility with SMAC, and added integration for SMACv2. We hope these changes will simplify the integration of new environments and ensure that EPyMARL remains usable for a longer time.
To use the legacy version of EPyMARL with OpenAI Gym version 0.21, please use the previous version v1.0.0 of EPyMARL.
For more information on how to install and run experiments in these environments, see the documentation here.
Previously EPyMARL only supported training of MARL algorithms in common-reward environments. To support environments which naturally provide individual rewards for agents (e.g. LBF and RWARE), we previously scalarised the rewards of all agents using a sum operation to obtain a single common reward that was then given to all agents. We are glad to announce that EPyMARL now supports training in general-sum reward environments (for all algorithms that are sound to train in general-sum reward settings)!
- Algorithms that support general-sum reward envs: IA2C, IPPO, MAA2C, MAPPO, IQL, PAC
- Algorithms that only support common-reward envs: COMA, VDN, QMIX, QTRAN
By default, EPyMARL runs experiments with common rewards (as done previously). To run an experiment with individual rewards for all agents, set common_reward=False. For example, to run MAPPO in an LBF task with individual rewards:
python src/main.py --config=mappo --env-config=gymma with env_args.time_limit=50 env_args.key="lbforaging:Foraging-8x8-2p-3f-v3" common_reward=False
When using the common_reward=True setup in environments which naturally provide individual rewards, by default we scalarise the rewards into a common reward by summing up all rewards. This is now configurable and we support the mean operation as an alternative scalarisation. To use the mean scalarisation, set reward_scalarisation="mean".
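The two scalarisations described above can be sketched in a few lines of Python. This is only an illustration of the sum and mean operations, not the code EPyMARL uses internally; the function name scalarise_rewards is hypothetical.

```python
from typing import List, Sequence


def scalarise_rewards(rewards: Sequence[float], scalarisation: str = "sum") -> List[float]:
    """Collapse per-agent rewards into one common reward given to every agent.

    Hypothetical helper for illustration; it mirrors the "sum" and "mean"
    options described in the text, not EPyMARL's internal implementation.
    """
    if scalarisation == "sum":
        common = sum(rewards)
    elif scalarisation == "mean":
        common = sum(rewards) / len(rewards)
    else:
        raise ValueError(f"Unknown scalarisation: {scalarisation!r}")
    # Every agent receives the same scalar reward.
    return [common] * len(rewards)


individual = [1.0, 0.0, 2.0]  # one reward per agent
print(scalarise_rewards(individual, "sum"))
print(scalarise_rewards(individual, "mean"))
```

With the sum, the common reward grows with the number of agents that are rewarded in a step; the mean keeps it on the scale of a single agent's reward.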
We now support logging to W&B! To log data to W&B, you need to install the library with pip install wandb and set up W&B (see their documentation). Afterwards, follow our instructions.
We have added a simple plotting script under plot_results.py to load data from sacred logs and visualise them for executed experiments. For more details, see the documentation here.
We have released our Pareto Actor-Critic algorithm, accepted in TMLR, as part of the EPyMARL source code.
Find the paper here: https://arxiv.org/abs/2209.14344
Pareto Actor-Critic (Pareto-AC) is an actor-critic algorithm that utilises a simple principle of no-conflict games (and, in turn, cooperative games with identical rewards): each agent can assume the others will choose actions that will lead to a Pareto-optimal equilibrium. Pareto-AC works especially well in environments with multiple suboptimal equilibria (a problem also known as relative over-generalisation). We have seen impressive results in a diverse set of multi-agent games with suboptimal equilibria, including the matrix games of the MARL benchmark, but also LBF variations with high penalties.
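To make the relative over-generalisation problem concrete, the sketch below compares two ways an agent could score its own actions in a two-agent matrix game with identical rewards. It is not the Pareto-AC implementation, only the assumption stated above applied to a fixed payoff table. The payoff matrix is an illustrative one in the style of a penalty game with penalty -100, and all function names are hypothetical.

```python
# Illustrative payoff table (assumption): PAYOFF[a1][a2] is the shared reward
# when agent 1 plays a1 and agent 2 plays a2.
PAYOFF = [
    [-100, 0, 10],
    [0, 2, 0],
    [10, 0, -100],
]


def average_values(payoff):
    """Score each own action assuming the other agent acts uniformly at random."""
    return [sum(row) / len(row) for row in payoff]


def optimistic_values(payoff):
    """Score each own action assuming the other agent picks the best joint outcome."""
    return [max(row) for row in payoff]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


# Averaging over the partner's actions favours the safe but suboptimal action 1.
print(argmax(average_values(PAYOFF)))
# Assuming a cooperative partner favours an action that can reach the payoff of 10.
print(argmax(optimistic_values(PAYOFF)))
```

Under the averaging view, the high penalties make the optimal actions look bad, which is exactly how agents end up in a suboptimal equilibrium. Two actions tie under the optimistic view here, so agents would still need to coordinate on which of the two optimal joint actions to play.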
PAC introduces additional dependencies specified in pac_requirements.txt. To install its dependencies, run:
pip install -r pac_requirements.txt
To run Pareto-AC in an environment, for example the Penalty game, you can run:
python src/main.py --config=pac_ns --env-config=gymma with env_args.time_limit=1 env_args.key=matrixgames:penalty-100-nostate-v0
- Extended Python MARL framework - EPyMARL
- Table of Contents
- Installation & Run instructions
- Experiment Configurations
- Run a hyperparameter search
- Logging
- Saving and loading learnt models
- Plotting
- Citing PyMARL and EPyMARL
- License
To install the dependencies for the codebase, clone this repo and run:
pip install -r requirements.txt
To install a set of supported environments, you can use the provided env_requirements.txt:
pip install -r env_requirements.txt
which will install the following environments:
- Level Based Foraging
- Multi-Robot Warehouse
- PettingZoo (used for the multi-agent particle environment)
- VMAS
- Matrix games
- SMAC
- SMACv2
- SMAClite
To install these environments individually, please see instructions in the respective repositories. We note that in particular SMAC and SMACv2 require a StarCraft II installation with specific map files. See their documentation for more details.
Note that the PAC algorithm introduces separate dependencies. To install these dependencies, use the provided requirements file:
pip install -r pac_requirements.txt
In "Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks" we introduce the Level-Based Foraging (LBF) and Multi-Robot Warehouse (RWARE) environments, and additionally evaluate in SMAC, Multi-agent Particle environments, and a set of matrix games. After installing these environments (see instructions above), we can run experiments in these environments as follows:
Matrix games:
python src/main.py --config=qmix --env-config=gymma with env_args.time_limit=25 env_args.key="matrixgames:penalty-100-nostate-v0"
LBF:
python src/main.py --config=qmix --env-config=gymma with env_args.time_limit=50 env_args.key="lbforaging:Foraging-8x8-2p-3f-v3"
RWARE:
python src/main.py --config=qmix --env-config=gymma with env_args.time_limit=500 env_args.key="rware:rware-tiny-2ag-v2"
MPE:
python src/main.py --config=qmix --env-config=gymma with env_args.time_limit=25 env_args.key="pz-mpe-simple-spread-v3"
Note that for the MPE environments tag (predator-prey) and adversary, we provide pre-trained prey and adversary policies. These can be used to control the respective agents to make these tasks fully cooperative (as used in the paper) by setting env_args.pretrained_wrapper="PretrainedTag" or env_args.pretrained_wrapper="PretrainedAdversary".
SMAC:
python src/main.py --config=qmix --env-config=sc2 with env_args.map_name="3s5z"
Below, we provide the base environment and key / map name for all the environments evaluated in "Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks":
- Matrix games: all with --env-config=gymma with env_args.time_limit=25 env_args.key="..."
  - Climbing: matrixgames:climbing-nostate-v0
  - Penalty $k=0$: matrixgames:penalty-0-nostate-v0
  - Penalty $k=-25$: matrixgames:penalty-25-nostate-v0
  - Penalty $k=-50$: matrixgames:penalty-50-nostate-v0
  - Penalty $k=-75$: matrixgames:penalty-75-nostate-v0
  - Penalty $k=-100$: matrixgames:penalty-100-nostate-v0
- LBF: all with --env-config=gymma with env_args.time_limit=50 env_args.key="..."
  - 8x8-2p-2f-coop: lbforaging:Foraging-8x8-2p-2f-coop-v3
  - 8x8-2p-2f-2s-coop: lbforaging:Foraging-2s-8x8-2p-2f-coop-v3
  - 10x10-3p-3f: lbforaging:Foraging-10x10-3p-3f-v3
  - 10x10-3p-3f-2s: lbforaging:Foraging-2s-10x10-3p-3f-v3
  - 15x15-3p-5f: lbforaging:Foraging-15x15-3p-5f-v3
  - 15x15-4p-3f: lbforaging:Foraging-15x15-4p-3f-v3
  - 15x15-4p-5f: lbforaging:Foraging-15x15-4p-5f-v3
- RWARE: all with --env-config=gymma with env_args.time_limit=500 env_args.key="..."
  - tiny 2p: rware:rware-tiny-2ag-v2
  - tiny 4p: rware:rware-tiny-4ag-v2
  - small 4p: rware:rware-small-4ag-v2
- MPE: all with --env-config=gymma with env_args.time_limit=25 env_args.key="..."
  - simple speaker listener: pz-mpe-simple-speaker-listener-v4
  - simple spread: pz-mpe-simple-spread-v3
  - simple adversary: pz-mpe-simple-adversary-v3 with additional env_args.pretrained_wrapper="PretrainedAdversary"
  - simple tag: pz-mpe-simple-tag-v3 with additional env_args.pretrained_wrapper="PretrainedTag"
- SMAC: all with --env-config=sc2 with env_args.map_name="..."
  - 2s_vs_1sc: 2s_vs_1sc
  - 3s5z: 3s5z
  - corridor: corridor
  - MMM2: MMM2
  - 3s_vs_5z: 3s_vs_5z
EPyMARL now supports the new SMACv2 and SMAClite environments. We provide wrappers to integrate these environments into the Gymnasium interface of EPyMARL. To run experiments in these environments, you can use the following exemplary commands:
SMACv2:
python src/main.py --config=qmix --env-config=sc2v2 with env_args.map_name="protoss_5_vs_5"
We provide prepared configs for a range of SMACv2 scenarios, as described in the SMACv2 repository, under src/config/envs/smacv2_configs. These can be run by providing the name of the config file as the env_args.map_name argument. To define a new scenario, you can create a new config file in the same format as the provided ones and provide its name as the env_args.map_name argument.
SMAClite:
python src/main.py --config=qmix --env-config=smaclite with env_args.time_limit=150 env_args.map_name="MMM"
By default, SMAClite uses a numpy implementation of the RVO2 library for collision avoidance. To instead use a faster optimised C++ RVO2 library, follow the instructions of this repo and provide the additional argument env_args.use_cpp_rvo2=True.
EPyMARL supports the PettingZoo and VMAS libraries for multi-agent environments using wrappers. To run experiments in these environments, you can use the following exemplary commands:
PettingZoo:
python src/main.py --config=qmix --env-config=gymma with env_args.time_limit=25 env_args.key="pz-mpe-simple-spread-v3"
VMAS:
python src/main.py --config=qmix --env-config=gymma with env_args.time_limit=150 env_args.key="vmas-balance"
EPyMARL supports environments that have been registered with Gymnasium. To use any other Gymnasium environment, use the gymma environment config and set the env_args.key argument to the registration ID of the environment. Environments can either provide a single scalar reward to run common-reward experiments (common_reward=True), or should provide one reward per agent to run experiments with individual rewards (common_reward=False) or with common rewards using some reward scalarisation (see documentation for more details).
To register a custom environment with Gymnasium, use the template below:
from gymnasium import register

register(
    id="my-environment-v1",  # Environment ID.
    entry_point="myenv.environment:MyEnvironment",  # The entry point for the environment class
    kwargs={
        ...  # Arguments that go to MyEnvironment's __init__ function.
    },
)
Afterwards, you can run an experiment in this environment using the following command:
python src/main.py --config=qmix --env-config=gymma with env_args.time_limit=50 env_args.key="myenv:my-environment-v1"
assuming that the environment is registered with the ID my-environment-v1 in the installed library myenv.
EPyMARL defines yaml configuration files for algorithms and environments under src/config. src/config/default.yaml defines default values for a range of configuration options, including experiment information (t_max for number of timesteps of training etc.) and algorithm hyperparameters.
Further environment configs (provided to the main script via --env-config=...) can be found in src/config/envs. Algorithm configs specifying algorithms and their hyperparameters (provided to the main script via --config=...) can be found in src/config/algs. To change hyperparameters or define a new algorithm, you can modify these yaml config files or create new ones.
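One way to picture how these config files combine is as layered dictionaries, where later layers override earlier ones. The sketch below assumes the order defaults, environment config, algorithm config, then command-line overrides; the helper name recursive_update and all example values are hypothetical and only illustrate the idea, so check src/main.py for the actual behaviour.

```python
import copy
from typing import Any, Dict


def recursive_update(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `override` merged in, descending into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections such as env_args key by key.
            merged[key] = recursive_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical contents standing in for the yaml files and command-line arguments.
default_cfg = {"t_max": 1000, "common_reward": True, "env_args": {"time_limit": 25}}
env_cfg = {"env": "gymma", "env_args": {"key": None}}
alg_cfg = {"name": "qmix", "lr": 0.0005}
cli_cfg = {
    "common_reward": False,
    "env_args": {"time_limit": 50, "key": "lbforaging:Foraging-8x8-2p-3f-v3"},
}

config = default_cfg
for layer in (env_cfg, alg_cfg, cli_cfg):
    config = recursive_update(config, layer)

print(config)
```

Merging nested sections key by key is what lets a command such as env_args.time_limit=50 change one entry of env_args without discarding the others.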
We include a script named search.py which reads a search configuration file (e.g. the included search.config.example.yaml) and runs a hyperparameter search in one or more tasks. The script can be run using
python search.py run --config=search.config.example.yaml --seeds 5 locally
In a cluster environment where one run should go to a single process, it can also be called in a batch script like:
python search.py run --config=search.config.example.yaml --seeds 5 single 1
where the 1 is an index to the particular hyperparameter configuration and can take values from 1 to the number of different combinations.
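The relation between such an index and a hyperparameter combination can be sketched as follows. This is an assumption-laden illustration rather than the logic of search.py: the grid contents, the function names, and the ordering of combinations are all hypothetical, and the real script may enumerate configurations in a different order.

```python
import itertools
from typing import Any, Dict, List


def expand_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Enumerate every combination of the listed hyperparameter values."""
    keys = sorted(grid)  # fixed key order so that indices are reproducible
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


def config_for_index(grid: Dict[str, List[Any]], index: int) -> Dict[str, Any]:
    """Return the combination for a 1-based index, as used on the command line."""
    combinations = expand_grid(grid)
    if not 1 <= index <= len(combinations):
        raise IndexError(f"index must be between 1 and {len(combinations)}")
    return combinations[index - 1]


# Hypothetical search space with 2 * 2 * 2 = 8 combinations.
GRID = {
    "lr": [0.0001, 0.0005],
    "hidden_dim": [64, 128],
    "standardise_rewards": [True, False],
}

print(len(expand_grid(GRID)))
print(config_for_index(GRID, 1))
```

In a batch script, each job would then receive a different index so that every combination is run by exactly one process.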
By default, EPyMARL will use sacred to log results and models to the results directory. These logs include configuration files, a json of all metrics, a txt file of all outputs and more. Additionally, EPyMARL can log data to tensorboard files by setting use_tensorboard: True in the yaml config. We also added support to log data to weights and biases (W&B) with instructions below.
First, make sure to install W&B and follow their instructions to authenticate and setup your W&B library (see the quickstart guide for more details).
To tell EPyMARL to log data to W&B, you then need to specify the following parameters in your configuration:
use_wandb: True # Log results to W&B
wandb_team: null # W&B team name
wandb_project: null # W&B project name
Set wandb_team and wandb_project to the team and project you wish to log to within your account, and set use_wandb=True. By default, we log all W&B runs in "offline" mode: the data is only stored locally and can later be uploaded to your W&B account (via wandb sync ...). To log runs online directly, specify wandb_mode="online" within the config.
We also support logging all stored models directly to W&B so you can download and inspect these from the W&B online dashboard. To do so, use the following config parameters:
wandb_save_model: True # Save models to W&B (only done if use_wandb is True and save_model is True)
save_model: True # Save the models to disk
save_model_interval: 50000
Note that models are only saved in general if save_model=True, and to further log them to W&B you need to specify use_wandb, wandb_team, wandb_project, and wandb_save_model=True.
You can save the learnt models to disk by setting save_model=True, which is set to False by default. The frequency of saving models can be adjusted using the save_model_interval configuration option. Models will be saved in the results directory, under the folder called models. The directory corresponding to each run will contain models saved throughout the experiment, each within a folder corresponding to the number of timesteps passed since starting the learning process.
Learnt models can be loaded using the checkpoint_path and load_step parameters. checkpoint_path should point to a directory stored for a run by EPyMARL as stated above. The pointed-to directory should contain sub-directories for various timesteps at which checkpoints were stored. If load_step is not provided (by default load_step=0), then the last checkpoint of the pointed-to run is loaded. Otherwise, the checkpoint of the closest timestep to load_step will be loaded. After loading, the learning will proceed from the corresponding timestep.
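The selection rule described above can be sketched as below. It assumes that checkpoint sub-directories are named after their timestep; the function names are hypothetical and the code is an illustration of the rule rather than EPyMARL's loading code.

```python
import os
from typing import Iterable, List


def closest_timestep(timesteps: Iterable[int], load_step: int = 0) -> int:
    """Pick the checkpoint timestep to load.

    load_step == 0 selects the latest checkpoint; any other value selects
    the stored timestep closest to load_step.
    """
    available = sorted(timesteps)
    if not available:
        raise FileNotFoundError("No checkpoints found")
    if load_step == 0:
        return available[-1]
    return min(available, key=lambda step: abs(step - load_step))


def list_checkpoint_timesteps(checkpoint_path: str) -> List[int]:
    """Collect timesteps from sub-directories whose names are integers (assumed layout)."""
    return [
        int(name)
        for name in os.listdir(checkpoint_path)
        if name.isdigit() and os.path.isdir(os.path.join(checkpoint_path, name))
    ]


stored = [50000, 100000, 150000]  # hypothetical checkpoint timesteps
print(closest_timestep(stored))         # latest checkpoint
print(closest_timestep(stored, 90000))  # closest to the requested step
```

Because the closest timestep is used, load_step does not need to match a saved checkpoint exactly.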
To only evaluate loaded models without any training, set the checkpoint_path and load_step parameters accordingly for the loading, and additionally set evaluate=True. Then, the loaded checkpoint will be evaluated for test_nepisode episodes before terminating the run.
The plotting script provided as plot_results.py supports plotting of any logged metric, can apply simple window-smoothing, aggregates results across multiple runs of the same algorithm, and can filter which results to plot based on algorithm and environment names.
If multiple configs of the same algorithm exist within the loaded data and you only want to plot the best config per algorithm, add the --best_per_alg argument. If this argument is not set, the script will visualise all configs of each (filtered) algorithm and show in the legend the hyperparameter values that differ across the present configs.
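For readers unfamiliar with window-smoothing and aggregation across runs, the following sketch shows one simple form of each. It is not taken from plot_results.py; the trailing-window average, the per-step mean, and the function names are assumptions chosen for illustration, and the script may smooth and aggregate differently.

```python
from statistics import mean
from typing import List, Sequence


def window_smooth(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; early entries average over fewer points."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(mean(values[start:i + 1]))
    return smoothed


def aggregate_runs(runs: Sequence[Sequence[float]]) -> List[float]:
    """Mean of a metric across runs at each logged step (runs of equal length)."""
    return [mean(step_values) for step_values in zip(*runs)]


# Hypothetical returns logged by two runs (seeds) of the same algorithm.
run_a = [0.0, 2.0, 4.0, 6.0]
run_b = [2.0, 2.0, 4.0, 8.0]

mean_curve = aggregate_runs([run_a, run_b])
print(mean_curve)
print(window_smooth(mean_curve, window=2))
```

Smoothing after aggregation, as done here, keeps the curve aligned with the logged timesteps while reducing step-to-step noise.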
The Extended PyMARL (EPyMARL) codebase was used in Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks.
Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, & Stefano V. Albrecht. Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks, Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS), 2021
In BibTeX format:
@inproceedings{papoudakis2021benchmarking,
title={Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks},
author={Georgios Papoudakis and Filippos Christianos and Lukas Schäfer and Stefano V. Albrecht},
booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS)},
year={2021},
url = {http://arxiv.org/abs/2006.07869},
openreview = {https://openreview.net/forum?id=cIrPX-Sn5n},
code = {https://github.com/uoe-agents/epymarl},
}
If you use the original PyMARL in your research, please cite the SMAC paper.
M. Samvelyan, T. Rashid, C. Schroeder de Witt, G. Farquhar, N. Nardelli, T.G.J. Rudner, C.-M. Hung, P.H.S. Torr, J. Foerster, S. Whiteson. The StarCraft Multi-Agent Challenge, CoRR abs/1902.04043, 2019.
In BibTeX format:
@article{samvelyan19smac,
title = {{The} {StarCraft} {Multi}-{Agent} {Challenge}},
author = {Mikayel Samvelyan and Tabish Rashid and Christian Schroeder de Witt and Gregory Farquhar and Nantas Nardelli and Tim G. J. Rudner and Chia-Man Hung and Philip H. S. Torr and Jakob Foerster and Shimon Whiteson},
journal = {CoRR},
volume = {abs/1902.04043},
year = {2019},
}
All the source code that has been taken from the PyMARL repository was licensed (and remains so) under the Apache License v2.0 (included in LICENSE file).
Any new code is also licensed under the Apache License v2.0.