
Add support for RUNS metric #220

Open
jim-smith opened this issue Aug 9, 2023 · 1 comment
Labels: enhancement (New feature or request), sprint3(14/9/23)

Comments

@jim-smith
Contributor

Basic idea:
Dataset D is split into a training set Tr and a test set Te.

  1. Concatenate the train and test sets (so the top |Tr| records come from the training set and the bottom |Te| from the test set).
  2. Pass that dataset into an attack model and get an array C (confidences) of size |D| holding the attacker's confidence that each record is in the training set.
  3. Add a second column of labels to C, with C[i][1] = 1 if i <= |Tr| else 0.
  4. Sort C by the first column (confidence).
  5. Use the runs test to see whether the number of runs (sequences of the same value) within the labels column of C is different from what you would expect if they were randomly distributed.
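The steps above could be sketched as follows. This is a minimal illustration only: `membership_runs` and `count_runs` are hypothetical names, and the confidence array stands in for whatever the attack model actually returns.

```python
import numpy as np

def count_runs(labels):
    """Count runs (maximal blocks of identical values) in a 1-D array."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0
    # A new run starts wherever the value changes; add one for the first block.
    return int(np.sum(labels[1:] != labels[:-1]) + 1)

def membership_runs(confidences, n_train):
    """Steps 2-5: label the first n_train records as training members (1),
    sort by attack confidence, and count runs in the sorted label column."""
    confidences = np.asarray(confidences)
    labels = np.zeros(confidences.size, dtype=int)
    labels[:n_train] = 1                 # step 3: top |Tr| rows are members
    order = np.argsort(confidences)      # step 4: sort by confidence
    return count_runs(labels[order])     # step 5: runs in the sorted labels
```

For a perfectly successful attack the members and non-members separate completely, giving 2 runs; for a useless attack the labels interleave and the run count is close to its expectation under randomness.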

statsmodels has an implementation, but we might want to adapt it to the two-tailed version: Wikipedia
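If we end up rolling our own rather than adapting statsmodels, the two-tailed Wald-Wolfowitz test under the usual normal approximation is small enough to write directly. A sketch (assumed function name; uses the standard mean/variance formulas for the number of runs):

```python
import math

def runs_test_two_tailed(labels):
    """Two-tailed Wald-Wolfowitz runs test on a binary sequence.
    Returns (z, p): too few runs means clustering of members,
    too many means alternation; both count as evidence here."""
    n1 = sum(1 for x in labels if x == 1)
    n2 = len(labels) - n1
    runs = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    n = n1 + n2
    mu = 2.0 * n1 * n2 / n + 1.0                                  # E[runs]
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    z = (runs - mu) / math.sqrt(var)
    # Two-tailed p-value from the normal approximation, via erf.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p
```

The normal approximation is only reasonable when both groups are reasonably large; for small |Tr| or |Te| an exact version would be needed.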

There's lots of good advice in here, and Simon has pointed out that we need to decide what to do in the case of ties when we sort by probability.
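One possible answer to the tie question (a suggestion only, not a settled design): break ties at random and average the run count over several shuffles, so the statistic doesn't depend on an arbitrary stable sort order. The function name is hypothetical.

```python
import numpy as np

def runs_with_random_ties(confidences, labels, n_repeats=100, seed=0):
    """Average run count over random tie-breaking orders.
    Records with equal confidence are shuffled among themselves each repeat."""
    rng = np.random.default_rng(seed)
    confidences = np.asarray(confidences)
    labels = np.asarray(labels)
    total = 0
    for _ in range(n_repeats):
        # lexsort uses the LAST key as primary: sort by confidence,
        # breaking ties with a fresh random secondary key.
        order = np.lexsort((rng.random(confidences.size), confidences))
        s = labels[order]
        total += int(np.sum(s[1:] != s[:-1]) + 1)
    return total / n_repeats
```

With no ties this reduces to the deterministic run count; with ties it averages over the tie-breaking orders, which could then feed into an averaged or worst-case p-value.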

@albacrespi
Contributor

For steps 1 to 4 there is code in GRAIMatter that we can re-use (the function `create_mia_data` in https://github.com/jim-smith/GRAIMatter/blob/main/attacks/scenarios.py), perhaps with some tuning.
