This is the official repository for Confidential-PROFITT: Confidential PROof of FaIr Training of Trees, a work published at the International Conference on Learning Representations (ICLR), 2023, in Kigali, Rwanda.
The results for this project are split into two subdirectories. The effectiveness directory contains code for testing the fairness and accuracy of fairness-aware tree training. The efficiency directory contains code for evaluating the speed and cost of Confidential-PROFITT.
Zero-knowledge proof of fair training for decision trees.
Benchmarks in the paper were run using separate Amazon EC2 instances for the prover and verifier. They can also be run locally, but two instances give better performance, since the computational cost of initialising the ZK proof framework is split between the two machines.
- Install emp-tool with the following command:
wget https://raw.githubusercontent.com/emp-toolkit/emp-readme/master/scripts/install.py && python3 install.py --install --tool --ot --zk --deps
- Inside the working directory for this code (hereafter referred to as cpof-main), make auxiliary directories and compile:
cd cpof-main; mkdir data && mkdir experiments && mkdir temp && cmake . && make
- Configure test/experiment_config.py by replacing /PATH/TO/THISDIR/... with the path to this directory on your machine.
- If desired, use this script to throttle the connection to LAN parameters:
wget https://raw.githubusercontent.com/emp-toolkit/emp-readme/master/scripts/throttle.py
python3 throttle.py -i ens5 -b 1000 -l 1
- Set parameters in cpof/constant.py and compile using cmake . && make. For the Demographic Parity version of the algorithm, set WHICH_ALG=1, and for Equalized Odds set WHICH_ALG=3.
- Run the benchmark for zero-knowledge proof of fair decision tree training. To run it locally, use the following command:
./run bin/test_benchmark
To run it on two separate machines, use the corresponding command on each machine:
./bin/test_benchmark_ONLINE 1 12345 <IP of machine 2>
./bin/test_benchmark_ONLINE 2 12345 <IP of machine 1>
To obtain the figures in the paper, we perform 5 runs for each parameter setting and report the median.
- Run benchmark for zero-knowledge proof of fair random forest training. On two separate machines, use the following commands:
./bin/test_benchmark_RF_ONLINE 1 12345 <IP of machine 2>
./bin/test_benchmark_RF_ONLINE 2 12345 <IP of machine 1>
To obtain the figures in the paper, we perform 5 runs for each parameter setting and report the median.
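The "5 runs, report the median" procedure described above can be sketched in a few lines of Python. This is only an illustration: the setting names and timing values below are made up, and the repository's own scripts may aggregate results differently.

```python
from statistics import median

# Hypothetical wall-clock timings in seconds: 5 runs per parameter setting.
# The setting names and the numbers are invented for illustration only.
runs = {
    "depth=4": [12.1, 11.8, 12.5, 30.2, 12.0],
    "depth=8": [25.4, 26.0, 24.9, 25.1, 25.7],
}


def median_per_setting(runs_by_setting):
    """Return the median timing for each parameter setting."""
    return {setting: median(times) for setting, times in runs_by_setting.items()}


summary = median_per_setting(runs)
for setting, value in summary.items():
    print(f"{setting}: median = {value:.1f} s")
```

Reporting the median rather than the mean keeps a single slow run (such as the 30.2 s outlier in the toy data) from distorting the reported figure.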
Fair tree training using Gini impurity metrics.
Download and place datasets into src/main/resources/<dataset>_<seed number>.csv for tree training.
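For readers unfamiliar with the Gini impurity mentioned above, the following Python sketch computes it on toy data, along with the impurity reduction (gain) of a candidate split. The last line shows a fairness-aware score in the spirit of Fair Forests, where the gain on a sensitive attribute is subtracted from the gain on the label. This is an assumption made for illustration; the Java code in this repository may use a different formulation, and all data and function names here are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(values, left_idx, right_idx):
    """Impurity reduction obtained by splitting `values` into two children."""
    left = [values[i] for i in left_idx]
    right = [values[i] for i in right_idx]
    n = len(values)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(values) - weighted


# Toy data (made up): class labels y and a binary sensitive attribute s.
y = [1, 1, 1, 0, 0, 0]
s = [0, 1, 0, 1, 0, 1]
left_idx, right_idx = [0, 1, 2], [3, 4, 5]

gain_label = gini_gain(y, left_idx, right_idx)      # how well the split separates y
gain_sensitive = gini_gain(s, left_idx, right_idx)  # how much the split reveals about s

# Assumed fairness-aware score: reward label gain, penalise sensitive-attribute gain.
fair_score = gain_label - gain_sensitive
print(gain_label, gain_sensitive, fair_score)
```

A split that separates the labels well but also separates the sensitive groups receives a lower score than one that separates the labels alone.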
The code is Java (tested on Java 14) and requires Maven (tested on 3.8; see the Maven installation instructions). The scripts in sbatch_scripts/ compile to a targets/ directory.
The plotting scripts and notebooks require Python with the libraries listed in requirements.txt. Install them with:
pip install -r requirements.txt
Alter the categories in sbatch_scripts/run.sh to select datasets, train/test results, Gini fairness formulations, etc. The script sbatch_scripts/run_baselines.sh trains decision trees or random forests without fairness criteria.
- These categories, along with the date, will create raw Java output files in raw_txts. This step occurs for both run.sh and baseline_run.sh (the latter of which trains trees without fairness criteria).
- The script will read these into pandas.DataFrames stored in the results directory under the appropriate dataset, fairness condition, date, number of seeds, and Gini formulation. These are saved as .csv files. This step is included in both of the aforementioned scripts.
- These csvs can then be plotted. This step is included in the run.sh script and is also available in a separate plotting.sh script if you'd only like to plot. Plots are generated in the same directory as their csvs, in a <csv dir here>/plots/ subdirectory.
- Other plots (the ones in the paper) are computed in Jupyter notebooks (in notebooks/). Accuracy vs unfairness plots are in acc_v_fair.ipynb and IGS (unfairness info gain) plotting is in IGS_plotting.ipynb.
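As a rough illustration of what an accuracy-versus-unfairness comparison involves, the sketch below computes accuracy together with one common way of measuring Demographic Parity and Equalized Odds gaps for a binary sensitive attribute. The function names and toy data are hypothetical, and the notebooks in this repository may define unfairness differently.

```python
def rate(preds):
    """Fraction of positive (1) predictions; 0.0 for an empty group."""
    return sum(preds) / len(preds) if preds else 0.0


def demographic_parity_gap(y_pred, s):
    """|P(yhat=1 | s=0) - P(yhat=1 | s=1)| for a binary sensitive attribute s."""
    g0 = [p for p, a in zip(y_pred, s) if a == 0]
    g1 = [p for p, a in zip(y_pred, s) if a == 1]
    return abs(rate(g0) - rate(g1))


def equalized_odds_gap(y_true, y_pred, s):
    """Largest between-group gap in positive prediction rate, over true labels 0 and 1."""
    gaps = []
    for label in (0, 1):
        g0 = [p for p, t, a in zip(y_pred, y_true, s) if t == label and a == 0]
        g1 = [p for p, t, a in zip(y_pred, y_true, s) if t == label and a == 1]
        gaps.append(abs(rate(g0) - rate(g1)))
    return max(gaps)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy predictions (made up) for eight individuals in two groups.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]

dp_gap = demographic_parity_gap(y_pred, s)
eo_gap = equalized_odds_gap(y_true, y_pred, s)
acc = accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}, DP gap={dp_gap:.2f}, EO gap={eo_gap:.2f}")
```

A gap of 0 means both groups are treated identically under that notion of fairness; larger values indicate more unfairness.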
This portion of the code builds on Fair-Forest by James Fantin, which is an extension of the paper "Fair forests: Regularized tree induction to minimize model bias," Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 2018. This code also extends JSAT (Java Statistical Analysis Tool) by Edward Raff, under the GPL-3.0 License.
- Ali Shahin Shamsabadi
- Sierra Wyllie
- Nicholas Franzese
- Natalie Dullerud
- Sébastien Gambs
- Nicolas Papernot
- Xiao Wang
- Adrian Weller
@inproceedings{
shamsabadi2023confidentialprofitt,
title={Confidential-{PROFITT}: Confidential {PRO}of of FaIr Training of Trees},
author={Ali Shahin Shamsabadi and Sierra Calanda Wyllie and Nicholas Franzese and Natalie Dullerud and S{\'e}bastien Gambs and Nicolas Papernot and Xiao Wang and Adrian Weller},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=iIfDQVyuFD}
}
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.