Upgraded folding and CP layers #129

Merged 12 commits into main on Oct 29, 2023
Conversation

@loreloc (Member) commented on Aug 11, 2023

New features

  1. Tensorized circuits constructed from region graphs whose partition nodes have heterogeneous arities can now be folded. For example, we can now fold CP layers whose product units have different numbers of inputs (see the sketch after this list).
  2. Masking the gradients of parameters is covered by a dedicated unit test.
  3. All CP-derived layers have been generalized to support any arity.
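As a concrete illustration of item 1, here is a minimal sketch (not the actual cirkit code; shapes and names are made up) of folding two product layers with different arities by padding the smaller one:

```python
import torch
import torch.nn.functional as F

# Minimal sketch (not the actual cirkit code): fold two log-space product
# layers with arities 2 and 3 into one batched tensor. In log-space a product
# is a sum over the arity dimension, and log(1) = 0 is the neutral element,
# so zero-padded slots leave the result unchanged.
batch, units = 5, 4
log_in_a = torch.randn(2, batch, units)  # product units with arity 2
log_in_b = torch.randn(3, batch, units)  # product units with arity 3

max_arity = max(log_in_a.shape[0], log_in_b.shape[0])
log_in_a = F.pad(log_in_a, [0, 0, 0, 0, 0, max_arity - log_in_a.shape[0]], value=0.0)

# Folded tensor of shape (fold=2, arity=3, batch, units): a single batched
# sum over dim=1 now evaluates both layers at once.
folded = torch.stack([log_in_a, log_in_b], dim=0)
log_products = folded.sum(dim=1)
```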

Notes

  1. The re-parametrization must be aware of whether a layer is folded and whether padding was used to do so. To see this, imagine re-parametrizing a mixing layer with a softmax: if padding was used to fold it, then only a subset of the parameters must sum to one. To implement this, I assumed the re-parametrization function takes both a tensor and an optional fold mask (a boolean tensor marking the actual parameters); see the sketch after this list.
  2. In order to implement the backtracking algorithm next (e.g., for sampling or MPE), I made the fold mask above an optional attribute of the layers. Indeed, we need it to mask the actual parameters of the sum units.
  3. I have removed the prod_exp implementation to simplify the current code for now. Before merging I will run some benchmarks to better understand how much faster the prod_exp implementation is. Do we already have some benchmarks on this? It seems to me we did not find a case where that tiny portion of the parameter space is useful, especially if we allow for unnormalized parameters. Also, if we have CP layers whose products have more than two inputs, it is always better to compute them in log-space for numerical stability.
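A minimal sketch of the fold-mask-aware re-parametrization described in item 1 (names and shapes are illustrative, not the actual cirkit API):

```python
from typing import Optional

import torch

def masked_softmax(param: torch.Tensor, fold_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Softmax over the actual parameters only.

    param:     raw parameters, e.g. of shape (folds, arity, units).
    fold_mask: optional boolean tensor, True where a parameter is real and
               False where it is padding introduced by folding.
    """
    if fold_mask is not None:
        # Padded entries get -inf so they receive zero probability mass and
        # only the real parameters sum to one.
        param = param.masked_fill(~fold_mask, -float("inf"))
    return torch.softmax(param, dim=1)

# Example: the second fold has only 2 real inputs out of 3.
fold_mask = torch.tensor([[True, True, True], [True, True, False]]).unsqueeze(-1)
weights = masked_softmax(torch.randn(2, 3, 4), fold_mask.expand(2, 3, 4))
assert torch.allclose(weights.sum(dim=1), torch.ones(2, 4))
```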

Additional features we are now able to implement easily

  1. Region graphs where partition nodes are not necessarily binary. Implementing this requires reviewing the file format, as it assumed only "left" and "right" regions.
  2. Methods fold and unfold in cirkit.models.functional, thus making folding optional. The unfold method would let us cache intermediate computations, e.g., for compression or for fast inverse transform sampling. A purely illustrative sketch of such an API follows this list.
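Purely illustrative signatures for item 2; neither function exists in this PR, and TensorizedCircuit is a placeholder name:

```python
def fold(circuit: "TensorizedCircuit") -> "TensorizedCircuit":
    """Group layers with compatible shapes into batched (folded) layers."""
    raise NotImplementedError

def unfold(circuit: "TensorizedCircuit") -> "TensorizedCircuit":
    """Expand folded layers back to one layer per region-graph node, exposing
    intermediate outputs for caching (e.g., for inverse transform sampling)."""
    raise NotImplementedError
```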

Closes #113.
Closes #116.

@loreloc (Member, Author) commented on Aug 25, 2023

Benchmark results for this PR.
It seems we are now slightly faster than July's version (#110), e.g., about 6% faster on the 28x28 PD architecture with Delta=5.

| Architecture | Parameters | Time (ms)       | Memory (MiB)      |
|--------------|------------|-----------------|-------------------|
| PD D7        | 103151628  | 69.114 ± 0.589  | 4711.899 ± 0.000  |
| PD D5        | 105730068  | 115.475 ± 1.899 | 5051.841 ± 0.000  |
| QT SD        | 103962625  | 97.953 ± 2.027  | 4726.975 ± 0.000  |
| QT           | 105419780  | 134.501 ± 1.737 | 4768.519 ± 0.000  |
| BT D4 R16    | 103120928  | 65.111 ± 1.497  | 4712.148 ± 0.000  |

@loreloc (Member, Author) commented on Sep 6, 2023

This is not merged yet because I plan to implement more fine-grained tests for each layer, rather than just testing a PC.

@loreloc (Member, Author) commented on Oct 29, 2023

  • Added a bunch of tests for layers.
  • Added back-propagation tests for both densely folded and sparsely folded (i.e., with heterogeneous layer arities) tensorized circuits.

@loreloc merged commit 61afab9 into main on Oct 29, 2023 (2 checks passed) and deleted the refactor-cp branch on Oct 29, 2023 at 15:33.
@lkct (Member) left a review: some notes for myself, some for discussion.

Review threads (resolved) were opened on:
  • cirkit/layers/input/exp_family/categorical.py
  • cirkit/layers/layer.py (3 threads)
  • cirkit/layers/mixing.py (2 threads)
  • cirkit/layers/sum_product/cp.py (2 threads)
  • cirkit/layers/sum_product/tucker.py
  • cirkit/models/tensorized_circuit.py
One thread concerns the pad value used when folding:

```diff
- inputs = F.pad(inputs, [0, 0, 0, 0, 0, 1], value=-float("inf"))
+ # It should be the neutral element of a group.
+ # For now computations are in log-space, thus 0 is our pad value.
+ inputs = F.pad(inputs, [0, 0, 0, 0, 0, 1], value=0)  # pylint: disable=not-callable
```

The reviewer asked: pad can be fused into cat?
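For reference, a minimal sketch (illustrative shapes, not the actual cirkit code) of how this pad could be expressed as, and hence fused into, a cat:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes, not the actual cirkit code. The pad spec
# [0, 0, 0, 0, 0, 1] appends one zero slice at the end of dim -3
# (dim 1 for this 4-D tensor), which is exactly a concatenation
# with a zero tensor; that zero slice could then be merged into
# an existing torch.cat in the surrounding code.
inputs = torch.randn(4, 3, 5, 2)

padded = F.pad(inputs, [0, 0, 0, 0, 0, 1], value=0)
fused = torch.cat([inputs, torch.zeros_like(inputs[:, :1])], dim=1)

assert torch.equal(padded, fused)
```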

Labels: enhancement (New feature or request)

Successfully merging this pull request may close these issues:
  • Merge the two einsums in CP (similar for variants)
  • Stopping gradients with fold masking