msakai/bnn-verification
BNN verification dataset

This repository contains the source code and related information for the binarized neural networks (BNNs) verification datasets submitted to the Max-SAT Evaluation 2020 and MIPLIB 2024.

Usage

Preparation

$ pip3 install -r requirements.txt

(Optional) Training model

Trained model weights are included in the models/ directory. Use these models to reproduce the same problem instances we submitted to the Max-SAT Evaluation 2020.

models/Verifying_Properties_of_Binarized_Deep_Neural_Networks.ipynb is the notebook we used for training those models on Google Colaboratory. Note, however, that this code is older than the rest of the code in this repository and should be used with caution.

Max-SAT instance generation

Run the following to generate the same data set submitted to Max-SAT Evaluation 2020:

$ python3 generate_maxsat_instances.py --dataset mnist --model models/mnist.npz -o outdir \
--format wcnf --card totalizer --norm inf --target adversarial --instances-per-class 2
$ python3 generate_maxsat_instances.py --dataset mnist_rot --model models/mnist_rot.npz -o outdir \
--format wcnf --card totalizer --norm inf --target adversarial --instances-per-class 2
$ python3 generate_maxsat_instances.py --dataset mnist_back_image --model models/mnist_back_image.npz -o outdir \
--format wcnf --card totalizer --norm inf --target adversarial --instances-per-class 2

You can also specify an individual sample by using --instance-no instead of --instances-per-class.
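For background, each instance asks for a minimal input perturbation that makes a binarized network misclassify. The forward pass being encoded can be sketched as follows; the shapes, layer count, and random parameters here are illustrative only, not the repository's actual MNIST architecture:

```python
import numpy as np

def sign(x):
    # Binarize to {-1, +1}; mapping 0 to +1 is a modeling choice.
    return np.where(x >= 0, 1, -1)

def bnn_forward(x, weights, biases):
    """Toy binarized forward pass: hidden layers apply sign activations;
    the final layer outputs real-valued logits."""
    h = sign(x)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sign(W @ h + b)
    return weights[-1] @ h + biases[-1]  # logits

rng = np.random.default_rng(0)
# Hypothetical tiny network: 8 inputs -> 16 hidden units -> 10 classes.
weights = [sign(rng.standard_normal((16, 8))), sign(rng.standard_normal((10, 16)))]
biases = [rng.integers(-3, 4, 16), rng.integers(-3, 4, 10)]
logits = bnn_forward(rng.standard_normal(8), weights, biases)
print(logits.shape)  # (10,)
```

Because every hidden value is in {-1, +1}, each layer reduces to counting agreements between weights and activations, which is what the cardinality (totalizer) encoding in the generated WCNF captures.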

MIP instance generation

You can use generate_mip_instances.py instead:

$ python3 generate_mip_instances.py --dataset mnist --model models/mnist.npz -o outdir \
--norm inf --target adversarial --instances-per-class 2
$ python3 generate_mip_instances.py --dataset mnist_rot --model models/mnist_rot.npz -o outdir \
--norm inf --target adversarial --instances-per-class 2
$ python3 generate_mip_instances.py --dataset mnist_back_image --model models/mnist_back_image.npz -o outdir \
--norm inf --target adversarial --instances-per-class 2

Validating solutions

Once a solver has successfully solved a problem instance, you can check the solution as follows:

$ python3 verify_solution.py --dataset mnist --instance 7 \
  --output-image perturbated.png \
  --output-orig-image orig.png \
  --format maxsat \
  SOLUTION_FILE

This converts the solution in SOLUTION_FILE into an image file named perturbated.png and also reports:

  • the model's prediction (probability distribution over the digit classes and the predicted class) for both the original image and the perturbed image, and
  • the norms of the perturbation.

You need to provide the same dataset and instance number that were used to generate the problem. If you generated the problem using --instances-per-class, the instance number can be read from the filename.

--format specifies the format of SOLUTION_FILE.
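The per-class probabilities reported (as in the example result below) are consistent with a softmax over the logits. A minimal, numerically stable sketch, using the logits from the original image in the example:

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

# Logits from the "original image" row of the example result below:
logits = [8.883254, -8.975005, 28.656395, -4.049718, 39.974392,
          12.88087, 18.302235, 31.816982, 14.072353, 16.055202]
p = softmax(logits)
print(int(np.argmax(p)))  # 4, matching the predicted class
```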

Example result

This is the result for bnn_mnist_rot_10_label4_adversarial_norm_inf_totalizer.wcnf.

The model's prediction is class 4 for the original image and class 6 for the perturbed image. Per-class probabilities and logits:

| Class y | Original image: P(y) (logit) | Perturbed image: P(y) (logit) |
|---------|------------------------------|-------------------------------|
| 0 | 3.1416737e-14 (8.883254) | 4.5545687e-10 (12.883254) |
| 1 | 5.5133663e-22 (-8.975005) | 2.6813108e-21 (-12.975005) |
| 2 | 1.2148612e-05 (28.656395) | 0.0032257813 (28.656395) |
| 3 | 7.593513e-20 (-4.049718) | 1.7916893e-10 (11.950282) |
| 4 | 0.9997013 (39.974392) | 0.0016309624 (27.97439) |
| 5 | 1.711211e-12 (12.88087) | 0.004037595 (28.880869) |
| 6 | 3.8705436e-10 (18.302235) | 0.91325474 (34.302235) |
| 7 | 0.00028651825 (31.816982) | 0.07607825 (31.816982) |
| 8 | 5.633235e-12 (14.072353) | 4.4588405e-06 (22.072353) |
| 9 | 4.0916482e-11 (16.055202) | 0.0017682364 (28.055202) |

Added perturbation:

  • L0-norm: 18.0
  • L1-norm: 18.0
  • L2-norm: 4.242640687119285
  • L∞-norm: 1.0
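These norms can be reproduced directly: a perturbation touching 18 pixels, each changed by magnitude 1.0, yields exactly the values above. A sketch with a hypothetical flattened 28×28 perturbation vector:

```python
import numpy as np

# Hypothetical perturbation consistent with the numbers above:
# 18 pixels changed, each by magnitude 1.0.
tau = np.zeros(28 * 28)
tau[:18] = 1.0

print(np.count_nonzero(tau))  # L0 norm: 18
print(np.abs(tau).sum())      # L1 norm: 18.0
print(np.linalg.norm(tau))    # L2 norm: 4.242640687119285 (sqrt(18))
print(np.abs(tau).max())      # L-infinity norm: 1.0
```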

Max-SAT Evaluation 2020

Submission to the Max-SAT Evaluation 2020

Results from the Max-SAT Evaluation 2020

The competition results and organizer's slides are available on the competition website.

Of the 60 submitted instances, the following 5 (maxsat2020_bnn_verification_used.tar.gz, 2.5 GB) were used in the competition:

| Instance | Label |
|----------|-------|
| bnn_mnist_7_label9_adversarial_norm_inf_totalizer.wcnf.gz | 9 |
| bnn_mnist_back_image_32_label3_adversarial_norm_inf_totalizer.wcnf.gz | 3 |
| bnn_mnist_rot_16_label5_adversarial_norm_inf_totalizer.wcnf.gz | 5 |
| bnn_mnist_rot_8_label1_adversarial_norm_inf_totalizer.wcnf.gz | 1 |
| bnn_mnist_back_image_73_label5_adversarial_norm_inf_totalizer.wcnf.gz | 5 |

Solving time (in seconds; 3600.0 means timeout; one row per instance, in the order listed above):

| maxino-pref | maxino | Pacose | UWrMaxSat | MaxHS | QMaxSAT | RC2-B / RC2-A / smax-minisat / smax-mergesat |
|-------------|--------|--------|-----------|-------|---------|----------------------------------------------|
| 270.62 | 269.06 | 402.17 | 648.45 | 991.52 | 141.42 | 3600.0 |
| 279.84 | 277.76 | 1101.24 | 795.81 | 1733.77 | 1729.06 | 3600.0 |
| 367.28 | 367.06 | 221.87 | 657.69 | 1006.6 | 704.83 | 3600.0 |
| 84.87 | 84.06 | 347.71 | 588.25 | 1083.57 | 3600.0 | 3600.0 |
| 2215.51 | 2232.61 | 3600.0 | 3600.0 | 3600.0 | 3600.0 | 3600.0 |

Optimum values and solution examples generated by maxino-pref-fixed† (one row per instance, in the order listed above):

| Minimum ‖τ‖∞ | Predicted label (original image) | Predicted label (perturbed image) |
|--------------|----------------------------------|-----------------------------------|
| 1 | 9 | 5 |
| 2 | 3 | 8 |
| 1 | 5 | 7 |
| 1 | 1 | 3 |
| 4 | 5 | 3 |

†: These are obtained by executing maxino-pref-fixed locally, so they may differ from those obtained during the contest.

Talk at NII Shonan Meeting No. 180 “The Art of SAT”

Some follow-ups

  • Q: In several samples used in the contest, the images do not look like the digits given as their labels.
    • A: This was caused by my misunderstanding of the feature order in the MNIST-rot and MNIST-back-image datasets (plain MNIST does not have this problem). As a result, the images were rotated and flipped relative to their original form. The features should have been reordered during preprocessing when the datasets were created. However, this is a visualization-only issue, since training and inference treat the data in a consistent manner.
  • Q: What happens if two classes have the same maximum logit value?
    • A: Actual implementations commonly return the class with the smallest index (e.g. numpy.argmax and torch.argmax). However, a safety property that relies on such an assumption is not robust (in particular in the presence of floating-point numbers). Therefore, we did not specify which of the maximum-logit classes is chosen and allowed any of them to be chosen non-deterministically, similar to how unspecified behavior is modeled with non-determinism when model checking safety properties. In other words, we check whether it is possible for an incorrect class to have a logit value at least as large as that of the correct class.
  • Q: Are there images of successfully perturbed cases?
    • A: I added several examples above.
  • Q: You said that using sequential counters produced a much larger file than using the totalizer encoding. However, their sizes should be close both theoretically and empirically.
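The tie-breaking answer above can be illustrated concretely: numpy.argmax breaks ties toward the smallest index, while the property verified here only asks whether some incorrect class can reach the correct class's logit. The helper function below is illustrative, not code from this repository:

```python
import numpy as np

logits = np.array([3.0, 7.0, 7.0, 1.0])  # classes 1 and 2 tie for the maximum
print(int(np.argmax(logits)))  # 1: numpy breaks ties toward the smallest index

def misclassification_possible(logits, correct):
    # The checked property: can an incorrect class attain a logit at
    # least as large as the correct class's? Ties count as possible
    # misclassification, modeling non-deterministic tie-breaking.
    others = np.delete(logits, correct)
    return bool(others.max() >= logits[correct])

print(misclassification_possible(logits, correct=1))             # True: class 2 ties
print(misclassification_possible(np.array([9.0, 1.0]), 0))       # False
```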

MIPLIB 2024

See the miplib2024_submission/ directory.

References