
Batched marginalisation mask #318

Merged
merged 9 commits into april-tools:main on Nov 25, 2024

Conversation

andreasgrv
Collaborator

@andreasgrv andreasgrv commented Nov 11, 2024

This PR closes #292. We now allow for different marginalisation masks to be applied for each entry in a batch.
At the same time, we also allow a single scope to be broadcasted across all entries in a batch, so this change should be backward compatible.

More concretely, the input to IntegrateQuery can now be:

  1. a torch tensor of shape (B, D), where B is the batch size and D is the number of variables in the scope of the circuit. The tensor's dtype should be torch.bool; it should have True in the positions of random variables that should be marginalised out and False elsewhere:
    inputs = ...  # tensor with batch_size=2
    mar_query = IntegrateQuery(circuit)
    # Integrate out variables 1 and 3 from the first example and 0 from the second
    mask = torch.tensor([[False, True, False, True], [True, False, False, False]], dtype=torch.bool)
    mar_scores = mar_query(inputs, integrate_vars=mask)
  2. a list of scopes:
    inputs = ...  # tensor with batch_size=2
    mar_query = IntegrateQuery(circuit)
    mar_scores = mar_query(inputs, integrate_vars=[Scope(1,3), Scope(0)])
  3. a single scope, in which case the integration mask is broadcast across the batch:
    inputs = ...  # tensor with batch_size=some integer
    mar_query = IntegrateQuery(circuit)
    mar_scores = mar_query(inputs, integrate_vars=Scope(1,3))

Due to 2., each entry in the batch can have a scope over a different number of variables, which is an issue when using PyTorch, since tensors have a fixed size along each dimension. The current solution is to expand everything into a boolean mask of size (batch_size, num_variables), where num_variables is an upper bound on the number of variables in the scope of the circuit (see the assumptions below).
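The expansion described above can be sketched as follows. This is an illustrative sketch, not the PR's actual implementation; the helper name `scopes_to_mask` and the use of plain Python sets in place of `Scope` objects are assumptions for this example:

```python
import torch

def scopes_to_mask(scopes, num_variables):
    """Expand a list of per-example scopes (sets of variable ids)
    into a dense (batch_size, num_variables) boolean mask."""
    mask = torch.zeros(len(scopes), num_variables, dtype=torch.bool)
    for row, scope in enumerate(scopes):
        for var in scope:
            mask[row, var] = True
    return mask

# Integrate out variables 1 and 3 in the first example, and 0 in the second
mask = scopes_to_mask([{1, 3}, {0}], num_variables=4)
# mask.tolist() == [[False, True, False, True], [True, False, False, False]]
```

Note that every row has the same width `num_variables`, regardless of how many variables each scope actually contains, which is what makes the batched mask a single fixed-size tensor.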

Assumptions

We assume the size of the scope is <= max(scope), i.e. the maximum variable id in the scope. We need this because the actual number of variables may change: some ids may be dropped, so len(scope) may be invalid, as highlighted by @loreloc.
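To see why len(scope) cannot be used to size the mask, consider a hypothetical circuit whose scope retains variable ids {0, 5} after other variables were dropped (plain Python sets stand in for `Scope` here):

```python
# After dropping some variables, the surviving ids can be non-contiguous.
scope = {0, 5}

# len(scope) == 2, but a boolean mask over this scope must still be
# indexable at position 5, so its width must be max(scope) + 1 == 6.
width = max(scope) + 1
```

Sizing the mask by len(scope) would raise an index error for id 5, which is why the maximum id is used as the bound.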

Future work

Deal with sparsity

We currently expand the list of scopes into a dense boolean tensor mask. If the number of variables is very large and the integration mask is sparse, it would make sense to replace the dense implementation with a sparse one, e.g. PyTorch's sparse COO tensors.
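A sparse representation of the mask from the first example above could look like the following sketch, storing only the (row, variable) coordinates that are True rather than the full dense grid:

```python
import torch

# Coordinates of the True entries: vars 1 and 3 in example 0, var 0 in example 1.
indices = torch.tensor([[0, 0, 1],   # batch rows
                        [1, 3, 0]])  # variable ids
values = torch.ones(3, dtype=torch.bool)

# Sparse COO mask of logical shape (batch_size=2, num_variables=4)
sparse_mask = torch.sparse_coo_tensor(indices, values, size=(2, 4))

# Materialising it recovers the dense mask used in the PR description.
dense_mask = sparse_mask.to_dense()
```

Memory then scales with the number of marginalised variables rather than with batch_size × num_variables, at the cost of converting (or keeping the downstream computation sparse) wherever the mask is consumed.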

@loreloc loreloc self-requested a review November 11, 2024 14:33
Member

@loreloc loreloc left a comment


This is great, thanks a lot!
cc @mlnpapez

@loreloc loreloc merged commit 9efd59a into april-tools:main Nov 25, 2024
1 of 2 checks passed