Question: Regarding GPU vs CPU usage and noise constraints #1131

Open
lvlanson opened this issue Nov 6, 2024 · 4 comments


lvlanson commented Nov 6, 2024

Summary

I optimized an inputset for a circuit such that the values inside the inputset are maximized with respect to the circuit. Compiling the circuit for the CPU works fine, but when compiling for the GPU (Nvidia A100 80 GB) the execution breaks with
RuntimeError: Unfeasible noise constraint encountered

I invoke the compilation with

circuit = training_step.compile(inputset, composable=True, show_statistics=False, use_gpu=True)

Is there a reason why compilation on the GPU decreases the maximum bounds of integers?

Description

  • versions affected: 2.8.1
  • python version: 3.9
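
A minimal reproduction sketch, assuming concrete-python with GPU support; the real `training_step` and inputset are not shown in the report, so the circuit below is a hypothetical stand-in with a fairly wide table lookup:

```python
import numpy as np
from concrete import fhe

@fhe.compiler({"x": "encrypted"})
def training_step(x):
    # Non-linear op on a 14-bit value, compiled into a table lookup (TLU)
    return (3 * x + 1) % 2**14

inputset = [np.random.randint(0, 2**14) for _ in range(100)]

# CPU compilation: the optimizer can choose polynomial sizes up to 2^18
cpu_circuit = training_step.compile(inputset, composable=True)

# GPU compilation: polynomial sizes are capped at 2^14, so the same circuit
# may fail with "RuntimeError: Unfeasible noise constraint encountered"
gpu_circuit = training_step.compile(inputset, composable=True, use_gpu=True)
```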
lvlanson added the bug label Nov 6, 2024

BourgerieQuentin commented Nov 6, 2024

Hello @lvlanson,

The optimizer's parameter search space is different between CPU and GPU: since the implementations differ, the supported polynomial sizes differ too.

On GPU the polynomial size is limited to 2^14, while it is 2^18 on CPU:
=> optimizer search spaces
=> GPU poly size limitation

We could support higher polynomial sizes in the GPU backend, but we are not sure it would lead to good performance, as the bootstrapping key size would also increase and fill up too much of the GPU memory.

I don't know if it is possible in your use case, but you could try to reduce the bit width of the TLUs; there are a bunch of techniques you can take a look at in the Optimize table lookups guide.
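
For instance, a minimal sketch of one technique from that guide: rounding away low bits before the lookup with fhe.round_bit_pattern. The function and bit widths below are hypothetical, not taken from the reported circuit:

```python
import numpy as np
from concrete import fhe

@fhe.compiler({"x": "encrypted"})
def f(x):
    # Drop the 4 least significant bits before the lookup, so the TLU only
    # has to cover a 10-bit input instead of a 14-bit one (losing precision)
    x = fhe.round_bit_pattern(x, lsbs_to_remove=4)
    return x // 3

inputset = [np.random.randint(0, 2**14) for _ in range(100)]
circuit = f.compile(inputset, use_gpu=True)
```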

BourgerieQuentin added the question label and removed the bug label Nov 6, 2024

lvlanson commented Nov 6, 2024

Is the memory hit on the GPU the only concern for limiting the polynomial size on the GPU? And assuming the polynomial size were equal, can the memory consumption be expected to be the same?

Concerning TLU optimization, would this also affect the noise growth?

BourgerieQuentin commented:

I don't think there are strong restrictions on the polynomial size on the GPU; we just haven't had the need to ask for support of higher polynomial sizes yet. I am only saying that a higher polynomial size leads to a larger bootstrapping key, which may cause trouble on the GPU. I'll let @agnesLeroy weigh in on this.

For the second question, I'm not sure exactly what you need, but I think it is rather the reverse: the noise growth and noise budget affect TLU optimization. That is, the larger the message bit width, the less noise budget we have, and the more the noise grows, the larger the parameters need to be. But you can more or less assume (from what we see experimentally) that each additional bit requires a higher polynomial size (at least for bit widths > 5).
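
A small sketch to observe this experimentally: recompile the same (hypothetical) lookup at increasing bit widths with show_statistics=True and watch the complexity and key sizes grow as the optimizer picks larger parameters:

```python
import numpy as np
from concrete import fhe

for bits in (5, 6, 7, 8):
    @fhe.compiler({"x": "encrypted"})
    def f(x):
        return x // 3  # a table lookup over a `bits`-wide message

    inputset = [np.random.randint(0, 2**bits) for _ in range(100)]
    print(f"--- {bits}-bit TLU ---")
    f.compile(inputset, show_statistics=True)
```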

agnesLeroy commented:

Hey @lvlanson!

The maximum polynomial size supported on GPU is not as high as the one supported on CPU; that's probably why you get this error. It is related to our CUDA implementation of the FFT, which only supports polynomial sizes up to 16,384.

When the polynomial size increases, the size of the bootstrapping key increases as well, which can become an issue if you try to execute on a small GPU; in that case you would get a CUDA out-of-memory error when you hit the memory limit of your GPU.
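
To make the memory point concrete, a rough back-of-the-envelope sketch: a classical TFHE bootstrapping key holds about lwe_dimension * level_count * (glwe_dimension + 1)^2 * polynomial_size 64-bit words, so its size grows linearly with the polynomial size. The parameter values below are illustrative only, not ones the optimizer would actually pick:

```python
def bsk_size_gib(lwe_dimension, glwe_dimension, polynomial_size, level_count):
    # lwe_dimension GGSW ciphertexts, each made of
    # level_count * (glwe_dimension + 1) GLWE ciphertexts of
    # (glwe_dimension + 1) polynomials with polynomial_size 64-bit coefficients
    words = lwe_dimension * level_count * (glwe_dimension + 1) ** 2 * polynomial_size
    return words * 8 / 2**30  # 8 bytes per 64-bit word

# Illustrative parameters: only the polynomial size varies
for poly_size in (2**14, 2**16, 2**18):
    print(f"N = 2^{poly_size.bit_length() - 1}: "
          f"~{bsk_size_gib(900, 1, poly_size, 2):.1f} GiB")
```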

I hope this helps!
