This repository contains the source code and related information for the binarized neural network (BNN) verification datasets submitted to the Max-SAT Evaluation 2020 and MIPLIB 2024.
$ pip3 install -r requirements.txt
Trained model weights are included in the models/ directory. Use these models to reproduce the same problem instances that we submitted to the Max-SAT Evaluation 2020.
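If you only want to inspect the stored weights, the .npz files can be opened with NumPy. The array names and shapes are not documented here, so the following sketch simply enumerates whatever each archive contains:

```python
# Sketch: list the arrays stored in one of the model files.
# The exact array names and shapes depend on how the models were saved,
# so we only enumerate what is present instead of assuming specific names.
import numpy as np

with np.load("models/mnist.npz") as weights:
    for name in weights.files:
        print(name, weights[name].shape, weights[name].dtype)
```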
models/Verifying_Properties_of_Binarized_Deep_Neural_Networks.ipynb is the notebook we used to train those models on Google Colaboratory. Note, however, that this code is older than the rest of the code in this repository and should be used with caution.
Run the following to generate the same data set submitted to Max-SAT Evaluation 2020:
$ python3 generate_maxsat_instances.py --dataset mnist --model models/mnist.npz -o outdir \
--format wcnf --card totalizer --norm inf --target adversarial --instances-per-class 2
$ python3 generate_maxsat_instances.py --dataset mnist_rot --model models/mnist_rot.npz -o outdir \
--format wcnf --card totalizer --norm inf --target adversarial --instances-per-class 2
$ python3 generate_maxsat_instances.py --dataset mnist_back_image --model models/mnist_back_image.npz -o outdir \
--format wcnf --card totalizer --norm inf --target adversarial --instances-per-class 2
You can also specify an individual sample by using --instance-no instead of --instances-per-class.
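The --card option selects how the cardinality constraints in the encoding are translated into CNF. As a rough illustration of what the totalizer option refers to, here is a generic sketch of a totalizer-style at-most-k encoding; it is not necessarily the exact construction used by generate_maxsat_instances.py:

```python
# Generic sketch of a totalizer-style "at most k" encoding, shown only to
# illustrate the idea behind the --card totalizer option.

class CNF:
    def __init__(self):
        self.clauses = []   # each clause is a list of non-zero integer literals
        self.num_vars = 0

    def new_var(self):
        self.num_vars += 1
        return self.num_vars

    def add(self, clause):
        self.clauses.append(list(clause))


def totalizer(cnf, lits):
    """Return output literals out[0..n-1]; out[i] means 'at least i+1 inputs are true'."""
    if len(lits) == 1:
        return list(lits)
    mid = len(lits) // 2
    left = totalizer(cnf, lits[:mid])
    right = totalizer(cnf, lits[mid:])
    out = [cnf.new_var() for _ in range(len(lits))]
    for i in range(len(left) + 1):
        for j in range(len(right) + 1):
            if i + j == 0:
                continue
            # If at least i inputs are true on the left and at least j on the
            # right, then at least i + j inputs are true overall.
            clause = [out[i + j - 1]]
            if i > 0:
                clause.append(-left[i - 1])
            if j > 0:
                clause.append(-right[j - 1])
            cnf.add(clause)
    return out


def at_most_k(cnf, lits, k):
    out = totalizer(cnf, lits)
    for i in range(k, len(out)):
        cnf.add([-out[i]])  # forbid "at least i+1 inputs are true" for i+1 > k


if __name__ == "__main__":
    cnf = CNF()
    xs = [cnf.new_var() for _ in range(5)]
    at_most_k(cnf, xs, 2)   # at most 2 of x1..x5 may be true
    print(cnf.num_vars, "variables,", len(cnf.clauses), "clauses")
```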
To generate MIP instances instead, you can use generate_mip_instances.py:
$ python3 generate_mip_instances.py --dataset mnist --model models/mnist.npz -o outdir \
--norm inf --target adversarial --instances-per-class 2
$ python3 generate_mip_instances.py --dataset mnist_rot --model models/mnist_rot.npz -o outdir \
--norm inf --target adversarial --instances-per-class 2
$ python3 generate_mip_instances.py --dataset mnist_back_image --model models/mnist_back_image.npz -o outdir \
--norm inf --target adversarial --instances-per-class 2
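If you have Gurobi and gurobipy installed, a generated instance can then be solved roughly as follows. The instance file name and extension below are placeholders; check outdir for the files actually produced by the script:

```python
# Sketch: solve one generated MIP instance with Gurobi and keep the solution
# file for verify_solution.py. The instance path is a placeholder; use the
# actual file written to outdir by generate_mip_instances.py.
import gurobipy as gp

model = gp.read("outdir/INSTANCE_FILE.lp")   # Gurobi also accepts .mps and compressed variants
model.optimize()
if model.SolCount > 0:
    model.write("solution.sol")              # this .sol file can be passed as SOLUTION_FILE
```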
Once the solver successfully solves a problem instance, you can check the solution as follows:
$ python3 verify_solution.py --dataset mnist --instance 7 \
--output-image perturbated.png \
--output-orig-image orig.png \
--format maxsat \
SOLUTION_FILE
This converts the solution in SOLUTION_FILE to an image file named perturbated.png, and also provides some information:
- the model's predictions (probability distribution over the digit classes and the predicted class) on the original image and the perturbated image, and
- the norms of the perturbation (see the sketch below).
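The norms are simple functions of the pixel-wise difference between the two images. Here is a minimal sketch using NumPy and Pillow; it is not the script's actual code, and depending on how the PNGs are scaled relative to the model's input range, the values may differ from the script's output by a constant factor:

```python
# Sketch: recompute the perturbation norms from the two output images.
# The file names match the verify_solution.py command shown above.
import numpy as np
from PIL import Image

orig = np.asarray(Image.open("orig.png"), dtype=np.float64)
perturbated = np.asarray(Image.open("perturbated.png"), dtype=np.float64)
diff = (perturbated - orig).ravel()

print("L0-norm: ", np.count_nonzero(diff))   # number of changed pixels
print("L1-norm: ", np.sum(np.abs(diff)))
print("L2-norm: ", np.linalg.norm(diff))
print("L∞-norm:", np.max(np.abs(diff)))
```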
You need to provide the same dataset and instance number as the ones used to generate the problem. If you generated a problem using --instances-per-class, you can find the instance number in the filename.
--format specifies the format of SOLUTION_FILE:
- --format maxsat: SOLUTION_FILE is a file containing the output of a Max-SAT solver (only the v lines are used). The logs/maxino-pref-fixed/ directory contains log files that you can try.
- --format gurobi: SOLUTION_FILE is a solution file produced by Gurobi. The miplib2024_submission/additional_files/solutions/ directory contains some examples.
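For reference, both kinds of SOLUTION_FILE are plain text. The following sketch shows one way to read them; it assumes old-style v lines with space-separated literals (newer solvers may instead print a compact 0/1 string) and the usual Gurobi .sol layout of one name/value pair per line, with comment lines starting with #:

```python
# Sketch: extract a variable assignment from either kind of SOLUTION_FILE.
# verify_solution.py does this for you; this only shows what the files contain.

def read_maxsat_v_lines(path):
    """Return a dict {var: bool} from the solver's 'v' lines (literal-list style)."""
    assignment = {}
    with open(path) as f:
        for line in f:
            if not line.startswith("v"):
                continue
            for tok in line[1:].split():
                lit = int(tok)
                if lit != 0:
                    assignment[abs(lit)] = lit > 0
    return assignment

def read_gurobi_sol(path):
    """Return a dict {variable name: value} from a Gurobi .sol file."""
    values = {}
    with open(path) as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue
            name, value = line.split()
            values[name] = float(value)
    return values
```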
For example, for bnn_mnist_rot_10_label4_adversarial_norm_inf_totalizer.wcnf:
Added perturbation:
- L0-norm: 18.0
- L1-norm: 18.0
- L2-norm: 4.242640687119285
- L∞-norm: 1.0
- Description
- Submitted instances (29.62 GB)
The competition results and organizer's slides are available on the competition website.
Among the 60 submitted instances, 5 (maxsat2020_bnn_verification_used.tar.gz, 2.5 GB) were used in the competition:
Solving time (in seconds; 3600.0 means timeout):
Instance | maxino-pref | maxino | Pacose | UWrMaxSat | MaxHS | QMaxSAT | RC2-B / RC2-A / smax-minisat / smax-mergesat |
---|---|---|---|---|---|---|---|
… | 270.62 | 269.06 | 402.17 | 648.45 | 991.52 | 141.42 | 3600.0 |
… | 279.84 | 277.76 | 1101.24 | 795.81 | 1733.77 | 1729.06 | 3600.0 |
… | 367.28 | 367.06 | 221.87 | 657.69 | 1006.6 | 704.83 | 3600.0 |
… | 84.87 | 84.06 | 347.71 | 588.25 | 1083.57 | 3600.0 | 3600.0 |
… | 2215.51 | 2232.61 | 3600.0 | 3600.0 | 3600.0 | 3600.0 | 3600.0 |
Optimum values and solution examples generated by maxino-pref-fixed:
Instance | Minimum ‖τ‖∞ | Original Image | Predicted Label | Perturbated Image† | Predicted Label |
---|---|---|---|---|---|
… | 1 | (image) | 9 | (image) | 5 |
… | 2 | (image) | 3 | (image) | 8 |
… | 1 | (image) | 5 | (image) | 7 |
… | 1 | (image) | 1 | (image) | 3 |
… | 4 | (image) | 5 | (image) | 3 |
†: These were obtained by running maxino-pref-fixed locally, so they may differ from those obtained during the contest.
- Q: In several samples used in the contest, the images do not look like the numbers shown as their labels.
- A: This problem was caused by my misunderstanding of the order of the features in the MNIST-rot and MNIST-back-image datasets (MNIST does not have this problem). As a result, the images were rotated and flipped relative to their original form. The features should have been reordered during preprocessing when the datasets were created. However, this is a visualization-only issue, since training and inference treat the data in a consistent manner.
- Q: What happens if two classes have the same maximum logit value?
- A: It is common for actual implementations to return the class with the smallest index (e.g. numpy.argmax and torch.argmax). However, a safety property that relies on such an assumption is not robust (in particular in the case of floating-point numbers). Therefore, we did not specify which of the maximum-logit classes is chosen and allowed one of them to be chosen non-deterministically, similar to how unspecified behavior is modeled with non-determinism when model checking safety properties. In other words, we are checking whether it is possible for an incorrect class to have a logit value at least as large as the one for the correct class.
- Q: Are there images of successfully perturbated cases?
- A: I added several examples above.
- Q: You said that using sequential counters produced a much larger file than using the totalizer. However, their sizes should be close both theoretically and empirically.
- A: Since I didn't keep the results of the experiments with sequential counters, I re-ran the experiment using the sequential counter encoding. The result for the 7th sample of the MNIST test dataset (label = 9) in the case of the L∞ norm is as follows:

  encoding | file size (uncompressed) | #variables | #constraints | Download |
  ---|---|---|---|---|
  sequential counters | 5,076,826,688 B | 64,168,245 | 192,424,087 | bnn_mnist_7_label9_adversarial_norm_inf_sequential.wcnf.zst |
  totalizer | 3,871,189,890 B | 1,824,676 | 132,670,200 | bnn_mnist_7_label9_adversarial_norm_inf_totalizer.wcnf.zst |
  parallel counters | 94,898,008 B | 516,006 | 3,227,479 | bnn_mnist_7_label9_adversarial_norm_inf_parallel.wcnf.zst |
  (PB constraints in WBO file) | 3,215,049 B | 1,452 | 2,191 | bnn_mnist_7_label9_adversarial_norm_inf.wbo.zst |

  The file size and the number of constraints are indeed larger with sequential counters, but the difference is not that large (although there is a huge difference in the number of variables). Therefore, it appears that I misremembered the results.
- See the miplib2024_submission/ directory.