This repository is the reference implementation of the paper "Multi-agent active perception with prediction rewards".
Some further information is available in this blog post.
If you find the work useful, please cite it as: Mikko Lauri and Frans A. Oliehoek. "Multi-agent active perception with prediction rewards", in Advances in Neural Information Processing Systems 33, 2020.
BibTeX entry:

```bibtex
@inproceedings{lauri2020multiagent,
  author    = {Mikko Lauri and Frans A. Oliehoek},
  title     = {Multi-agent active perception with prediction rewards},
  booktitle = {Advances in Neural Information Processing Systems 33},
  year      = {2020}
}
```
The code consists of a C++ backend for solving Dec-POMDPs and a Python frontend that implements the APAS algorithm presented in the paper. Follow the steps below to install necessary requirements and compile the planner.
Install the required system libraries on an Ubuntu system by running:

```bash
sudo apt-get install libboost-all-dev libeigen3-dev
```
Additionally, you need a C++ compiler that supports C++17, and CMake version 3.0 or later.
You can compile the C++ backend by executing:

```bash
cd solver && mkdir build && cd build
cmake ..
make
```
Note: this will download and compile the MADP toolbox version 0.4.1, which usually takes quite a long time.
If you already have MADP installed, you can save a lot of time by specifying where to find it:

```bash
cmake .. -DMADPPATH=/path/to/your/madp/installation
```
Use Python 3. Only `numpy` is required. You probably already have it; otherwise run:

```bash
pip install -r requirements.txt
```
You can solve the MAV domain with horizon 5 using the experimental settings from the paper by running:

```bash
python apas.py --horizon 5 `pwd`/problems/mav.dpomdp --verbose
```

The `--verbose` flag enables progress printouts in the terminal.
Results will be stored in the subfolder `results`. There you will find the following contents:

- `apas_policy.out` indicates where to find the best individual policies found by APAS for each agent
- `apas_value.npy` is a file that can be loaded using `np.load`, containing the value of the best policy found by APAS
- `beliefs_XYZ.txt` are text files containing on each row a belief state used as a linearization point for the final reward at iteration `XYZ` of APAS
- `policy_values.npy` is a loadable numpy file with the value of the policy found at each iteration of APAS
- Subfolders `pgi_XY` contain the best individual policies and all individual policies considered by policy graph improvement for each agent at iteration `XY` of APAS. The files are in `.dot` format and can be visualized using `xdot`.
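For convenience, the two `.npy` files can be inspected directly with numpy. A minimal sketch (the paths assume you ran the command above; the exact array shapes may differ):

```python
import numpy as np

# Value of the best policy found by APAS.
apas_value = np.load("results/apas_value.npy")
print("best policy value:", apas_value)

# Value of the policy found at each iteration of APAS.
policy_values = np.load("results/policy_values.npy")
for i, v in enumerate(policy_values):
    print(f"iteration {i}: {v}")
```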
All values stored are exact. The piecewise linear approximation of the final reward is only used when planning, not when evaluating the policies.
The archives linked below contain the raw data corresponding to the results presented in the paper and supplementary material. The format is similar to that described above.
The software uses the parser from the MADP toolbox to read problems formatted as `.dpomdp` files. You can specify your own problems in this format. See this example problem for a description of the format. However, note that the `.dpomdp` format only allows rewards that are linear in the belief state (i.e., functions of the hidden state and actions).
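For illustration, a reward entry in the `.dpomdp` format attaches a scalar to a (joint action, state) combination, roughly as below (hypothetical state name; the `*` wildcard matches anything; consult the example problem for the authoritative syntax):

```
R: * : state0 : * : * : -1.0
```

Because each entry depends only on the hidden state and the actions, the resulting reward is always linear in the belief state.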
The planner software implicitly assumes you wish to solve a Dec-rhoPOMDP with negative entropy as the final reward. If you want to use a different final reward, modify `DecPOMDPConversions.hpp`. You will need to add functionality for getting the linearizing hyperplanes of your (convex and bounded) final reward function; see `LinearizedNegEntropy.hpp` for an example.
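For intuition about the linearizing hyperplanes, here is a standalone numerical sketch of the underlying math (not the solver's C++ interface): the tangent hyperplane of the negative entropy R(b) = sum_s b(s) log b(s) at a linearization point b0 has coefficients alpha(s) = log b0(s), and by convexity it lower-bounds R everywhere while being tight at b0.

```python
import numpy as np

def neg_entropy(b):
    # Negative entropy: sum_s b(s) * log b(s); convex in the belief b.
    return float(np.sum(b * np.log(b)))

def tangent_hyperplane(b0):
    # Coefficients alpha(s) = log b0(s) of the tangent at b0.
    # Convexity gives alpha . b <= neg_entropy(b), with equality at b = b0.
    return np.log(b0)

b0 = np.array([0.5, 0.3, 0.2])  # linearization point
alpha = tangent_hyperplane(b0)

b = np.array([0.6, 0.2, 0.2])   # some other belief
print(alpha @ b, "<=", neg_entropy(b))    # lower bound holds
print(alpha @ b0, "==", neg_entropy(b0))  # tight at the linearization point
```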
The conversion from Definition 3 in the paper is implemented in `DecPOMDPConversions.hpp`. The main part of the Dec-POMDP solver is implemented in `BackwardPass.hpp`.
We use particle-based PGI, modified to apply UCB1 for optimizing node configurations.
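As background on the bandit rule (a generic UCB1 sketch, not the solver's implementation; the statistics below are made up): UCB1 selects the option maximizing the empirical mean plus an exploration bonus that shrinks as an option is tried more often.

```python
import math

def ucb1_pick(means, counts, total, c=math.sqrt(2)):
    # Pick the index maximizing mean + c * sqrt(log(total) / count).
    # Untried options get an infinite score, so they are explored first.
    best, best_score = None, -math.inf
    for i, (mean, n) in enumerate(zip(means, counts)):
        score = math.inf if n == 0 else mean + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best

# Hypothetical running statistics for three candidate node configurations.
means = [0.40, 0.55, 0.10]
counts = [10, 5, 2]
print(ucb1_pick(means, counts, total=sum(counts)))
```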
Pull requests are welcome, although there are no plans for further active development at this time.
Licensed under the MIT license - see LICENSE for details.