
Add support for RUNS metric #220

Open
jim-smith opened this issue Aug 9, 2023 · 1 comment
Labels: enhancement (New feature or request), sprint3(14/9/23)

Comments

@jim-smith
Contributor

Basic idea:
Dataset D is split into a training set Tr and a test set Te.

  1. Concatenate the train and test sets (so the top |Tr| records come from the training set and the bottom |Te| from the test set).
  2. Pass that dataset into an attack model and get an array C (confidences) of size |D| holding the attacker's confidence that each record is in the training set.
  3. Add a second column of labels to C, with C[i][1] = 1 if i <= |Tr| else 0.
  4. Sort C by the first column (confidence).
  5. Use the runs test to see whether the number of runs (sequences of the same value) within the labels column of C is different from what you would expect if they were randomly distributed.
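The steps above could be sketched as follows. This is a minimal illustration only: `membership_runs` and `count_runs` are hypothetical names, and the confidence array stands in for whatever the attack model actually returns.

```python
import numpy as np

def count_runs(labels):
    """Count runs (maximal blocks of identical values) in a 1-D array."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0
    # A new run starts wherever the value changes; add one for the first block.
    return int(np.sum(labels[1:] != labels[:-1]) + 1)

def membership_runs(confidences, n_train):
    """Steps 2-5: label the first n_train records as training members (1),
    sort by attack confidence, and count runs in the sorted label column."""
    confidences = np.asarray(confidences)
    labels = np.zeros(confidences.size, dtype=int)
    labels[:n_train] = 1                 # step 3: top |Tr| rows are members
    order = np.argsort(confidences)      # step 4: sort by confidence
    return count_runs(labels[order])     # step 5: runs in the sorted labels
```

For a perfectly successful attack the members and non-members separate completely, giving 2 runs; for a useless attack the labels interleave and the run count is close to its expectation under randomness.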

statsmodels has an implementation, but we might want to adapt it to the two-tailed version: Wikipedia
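If we end up rolling our own rather than adapting statsmodels, the two-tailed Wald-Wolfowitz test under the usual normal approximation is small enough to write directly. A sketch (assumed function name; uses the standard mean/variance formulas for the number of runs):

```python
import math

def runs_test_two_tailed(labels):
    """Two-tailed Wald-Wolfowitz runs test on a binary sequence.
    Returns (z, p): too few runs means clustering of members,
    too many means alternation; both count as evidence here."""
    n1 = sum(1 for x in labels if x == 1)
    n2 = len(labels) - n1
    runs = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    n = n1 + n2
    mu = 2.0 * n1 * n2 / n + 1.0                                  # E[runs]
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    z = (runs - mu) / math.sqrt(var)
    # Two-tailed p-value from the normal approximation, via erf.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p
```

The normal approximation is only reasonable when both groups are reasonably large; for small |Tr| or |Te| an exact version would be needed.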

There's lots of good advice in here, and Simon has pointed out that we need to decide what to do in the case of ties when we sort by probability.
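One possible answer to the tie question (a suggestion only, not a settled design): break ties at random and average the run count over several shuffles, so the statistic doesn't depend on an arbitrary stable sort order. The function name is hypothetical.

```python
import numpy as np

def runs_with_random_ties(confidences, labels, n_repeats=100, seed=0):
    """Average run count over random tie-breaking orders.
    Records with equal confidence are shuffled among themselves each repeat."""
    rng = np.random.default_rng(seed)
    confidences = np.asarray(confidences)
    labels = np.asarray(labels)
    total = 0
    for _ in range(n_repeats):
        # lexsort uses the LAST key as primary: sort by confidence,
        # breaking ties with a fresh random secondary key.
        order = np.lexsort((rng.random(confidences.size), confidences))
        s = labels[order]
        total += int(np.sum(s[1:] != s[:-1]) + 1)
    return total / n_repeats
```

With no ties this reduces to the deterministic run count; with ties it averages over the tie-breaking orders, which could then feed into an averaged or worst-case p-value.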

@albacrespi
Contributor

For steps 1 to 4 there is code in GRAIMatter that we can re-use (the function `create_mia_data` in https://github.com/jim-smith/GRAIMatter/blob/main/attacks/scenarios.py), perhaps with some tuning.
