Upgraded folding and CP layers #129

Merged 12 commits into main on Oct 29, 2023
Conversation

@loreloc (Member) commented on Aug 11, 2023

New features

  1. Tensorized circuits constructed from region graphs whose partition nodes have heterogeneous arities can now be folded. For example, we can now fold CP layers whose product units have different numbers of inputs (see the sketch after this list).
  2. Masking the gradients of parameters is covered by a dedicated unit test.
  3. All CP-derived layers have been generalized to support any arity.
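As a concrete illustration of item 1, here is a minimal sketch (not the actual cirkit code; shapes and names are made up) of folding two product layers with different arities by padding the smaller one:

```python
import torch
import torch.nn.functional as F

# Minimal sketch (not the actual cirkit code): fold two log-space product
# layers with arities 2 and 3 into one batched tensor. In log-space a product
# is a sum over the arity dimension, and log(1) = 0 is the neutral element,
# so zero-padded slots leave the result unchanged.
batch, units = 5, 4
log_in_a = torch.randn(2, batch, units)  # product units with arity 2
log_in_b = torch.randn(3, batch, units)  # product units with arity 3

max_arity = max(log_in_a.shape[0], log_in_b.shape[0])
log_in_a = F.pad(log_in_a, [0, 0, 0, 0, 0, max_arity - log_in_a.shape[0]], value=0.0)

# Folded tensor of shape (fold=2, arity=3, batch, units): a single batched
# sum over dim=1 now evaluates both layers at once.
folded = torch.stack([log_in_a, log_in_b], dim=0)
log_products = folded.sum(dim=1)
```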

Notes

  1. The re-parametrization must be aware of whether a layer is folded and whether padding was used to do so. To see this, imagine re-parametrizing a mixing layer with a softmax: if padding was used to fold it, then only a subset of the parameters must sum to one. To implement this, I assumed the re-parametrization function takes both a tensor and an optional fold mask (a boolean tensor marking the actual parameters); see the sketch after this list.
  2. In order to implement the backtracking algorithm next (e.g., for sampling or MPE), I made the fold mask above an optional attribute of the layers. Indeed, we need it to mask the actual parameters of the sum units.
  3. I have removed the prod_exp implementation to simplify the current code for now. Before merging I will run some benchmarks to better understand how much faster the prod_exp implementation is. Do we already have some benchmarks on this? It seems to me we did not find a case where that tiny portion of the parameter space is useful, especially if we allow for unnormalized parameters. Also, if we have CP layers whose products have more than two inputs, it is always better to compute them in log-space for numerical stability.
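A minimal sketch of the fold-mask-aware re-parametrization described in item 1 (names and shapes are illustrative, not the actual cirkit API):

```python
from typing import Optional

import torch

def masked_softmax(param: torch.Tensor, fold_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Softmax over the actual parameters only.

    param:     raw parameters, e.g. of shape (folds, arity, units).
    fold_mask: optional boolean tensor, True where a parameter is real and
               False where it is padding introduced by folding.
    """
    if fold_mask is not None:
        # Padded entries get -inf so they receive zero probability mass and
        # only the real parameters sum to one.
        param = param.masked_fill(~fold_mask, -float("inf"))
    return torch.softmax(param, dim=1)

# Example: the second fold has only 2 real inputs out of 3.
fold_mask = torch.tensor([[True, True, True], [True, True, False]]).unsqueeze(-1)
weights = masked_softmax(torch.randn(2, 3, 4), fold_mask.expand(2, 3, 4))
assert torch.allclose(weights.sum(dim=1), torch.ones(2, 4))
```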

Additional features we are now able to implement easily

  1. Region graphs where partition nodes are not necessarily binary. Implementing this requires reviewing the file format, as it assumed only "left" and "right" regions.
  2. Methods fold and unfold in cirkit.models.functional, thus making folding optional. The unfold method would let us cache intermediate computations, e.g., for compression or for fast inverse transform sampling. A purely illustrative sketch of such an API follows this list.
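Purely illustrative signatures for item 2; neither function exists in this PR, and TensorizedCircuit is a placeholder name:

```python
def fold(circuit: "TensorizedCircuit") -> "TensorizedCircuit":
    """Group layers with compatible shapes into batched (folded) layers."""
    raise NotImplementedError

def unfold(circuit: "TensorizedCircuit") -> "TensorizedCircuit":
    """Expand folded layers back to one layer per region-graph node, exposing
    intermediate outputs for caching (e.g., for inverse transform sampling)."""
    raise NotImplementedError
```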

Closes #113.
Closes #116.

@loreloc (Member, Author) commented on Aug 25, 2023

Benchmark results for this PR.
It seems we are now slightly faster than July's version (#110), e.g., about 6% faster on the 28x28 PD architecture with Delta=5.

| Architecture | Parameters | Time (ms)       | Memory (MiB)      |
|--------------|------------|-----------------|-------------------|
| PD D7        | 103151628  | 69.114 ± 0.589  | 4711.899 ± 0.000  |
| PD D5        | 105730068  | 115.475 ± 1.899 | 5051.841 ± 0.000  |
| QT SD        | 103962625  | 97.953 ± 2.027  | 4726.975 ± 0.000  |
| QT           | 105419780  | 134.501 ± 1.737 | 4768.519 ± 0.000  |
| BT D4 R16    | 103120928  | 65.111 ± 1.497  | 4712.148 ± 0.000  |

@loreloc (Member, Author) commented on Sep 6, 2023

This is not merged yet because I plan to implement more fine-grained tests for each layer, rather than just testing a PC.

@loreloc (Member, Author) commented on Oct 29, 2023

  • Added a bunch of tests for layers.
  • Added back-propagation tests for both densely folded and sparsely folded (i.e., with heterogeneous layer arities) tensorized circuits.

@loreloc merged commit 61afab9 into main on Oct 29, 2023 (2 checks passed) and deleted the refactor-cp branch on Oct 29, 2023 at 15:33.
@lkct (Member) left a review: some notes for myself, some for discussion.

Review threads (resolved) were opened on:
  • cirkit/layers/input/exp_family/categorical.py
  • cirkit/layers/layer.py (3 threads)
  • cirkit/layers/mixing.py (2 threads)
  • cirkit/layers/sum_product/cp.py (2 threads)
  • cirkit/layers/sum_product/tucker.py
  • cirkit/models/tensorized_circuit.py
One thread concerns the pad value used when folding:

```diff
- inputs = F.pad(inputs, [0, 0, 0, 0, 0, 1], value=-float("inf"))
+ # It should be the neutral element of a group.
+ # For now computations are in log-space, thus 0 is our pad value.
+ inputs = F.pad(inputs, [0, 0, 0, 0, 0, 1], value=0)  # pylint: disable=not-callable
```

The reviewer asked: pad can be fused into cat?
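For reference, a minimal sketch (illustrative shapes, not the actual cirkit code) of how this pad could be expressed as, and hence fused into, a cat:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes, not the actual cirkit code. The pad spec
# [0, 0, 0, 0, 0, 1] appends one zero slice at the end of dim -3
# (dim 1 for this 4-D tensor), which is exactly a concatenation
# with a zero tensor; that zero slice could then be merged into
# an existing torch.cat in the surrounding code.
inputs = torch.randn(4, 3, 5, 2)

padded = F.pad(inputs, [0, 0, 0, 0, 0, 1], value=0)
fused = torch.cat([inputs, torch.zeros_like(inputs[:, :1])], dim=1)

assert torch.equal(padded, fused)
```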

Labels: enhancement (New feature or request)

Successfully merging this pull request may close these issues:
  • Merge the two einsums in CP (similar for variants)
  • Stopping gradients with fold masking