Question: Regarding GPU vs CPU usage and noise constraints #1131

Open
lvlanson opened this issue Nov 6, 2024 · 4 comments


lvlanson commented Nov 6, 2024

Summary

I optimized an inputset for a circuit such that the values inside the inputset are maximized with respect to the circuit. Compiling the circuit for the CPU works fine, but when compiling for the GPU (Nvidia A100 80 GB) the execution breaks with
RuntimeError: Unfeasible noise constraint encountered

I invoke the compilation with

circuit = training_step.compile(inputset, composable=True, show_statistics=False, use_gpu=True)

Is there a reason why compilation on the GPU decreases the maximum bounds of integers?

Description

  • versions affected: 2.8.1
  • python version: 3.9
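
A minimal reproduction sketch, assuming concrete-python with GPU support; the real `training_step` and inputset are not shown in the report, so the circuit below is a hypothetical stand-in with a fairly wide table lookup:

```python
import numpy as np
from concrete import fhe

@fhe.compiler({"x": "encrypted"})
def training_step(x):
    # Non-linear op on a 14-bit value, compiled into a table lookup (TLU)
    return (3 * x + 1) % 2**14

inputset = [np.random.randint(0, 2**14) for _ in range(100)]

# CPU compilation: the optimizer can choose polynomial sizes up to 2^18
cpu_circuit = training_step.compile(inputset, composable=True)

# GPU compilation: polynomial sizes are capped at 2^14, so the same circuit
# may fail with "RuntimeError: Unfeasible noise constraint encountered"
gpu_circuit = training_step.compile(inputset, composable=True, use_gpu=True)
```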
lvlanson added the bug label Nov 6, 2024

BourgerieQuentin commented Nov 6, 2024

Hello @lvlanson,

The optimizer's parameter search space is different between CPU and GPU: since the implementations differ, the supported polynomial sizes differ too.

On GPU the polynomial size is limited to 2^14, while it is 2^18 on CPU:
=> optimizer search spaces
=> GPU poly size limitation

We could support higher polynomial sizes in the GPU backend, but we are not sure it would lead to good performance, as the bootstrapping key size would also increase and fill up too much of the GPU memory.

I don't know if it is possible in your use case, but you could try to reduce the bit width of the TLUs; there are a bunch of techniques you can take a look at in the Optimize table lookups guide.
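
For instance, a minimal sketch of one technique from that guide: rounding away low bits before the lookup with fhe.round_bit_pattern. The function and bit widths below are hypothetical, not taken from the reported circuit:

```python
import numpy as np
from concrete import fhe

@fhe.compiler({"x": "encrypted"})
def f(x):
    # Drop the 4 least significant bits before the lookup, so the TLU only
    # has to cover a 10-bit input instead of a 14-bit one (losing precision)
    x = fhe.round_bit_pattern(x, lsbs_to_remove=4)
    return x // 3

inputset = [np.random.randint(0, 2**14) for _ in range(100)]
circuit = f.compile(inputset, use_gpu=True)
```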

BourgerieQuentin added the question label and removed the bug label Nov 6, 2024

lvlanson commented Nov 6, 2024

Is the memory hit on the GPU the only concern for limiting the polynomial size on the GPU? And assuming the polynomial size were equal, can the memory consumption be expected to be the same?

Concerning TLU optimization, would this also affect the noise growth?

BourgerieQuentin commented:

I don't think there are strong restrictions on the polynomial size on the GPU; we just haven't had the need to ask for support of higher polynomial sizes yet. I am only saying that a higher polynomial size leads to a larger bootstrapping key, which may cause trouble on the GPU. I'll let @agnesLeroy weigh in on this.

For the second question, I'm not sure exactly what you need, but I think it is rather the reverse: the noise growth and noise budget affect TLU optimization. That is, the larger the message bit width, the less noise budget we have, and the more the noise grows, the larger the parameters need to be. But you can more or less assume (from what we see experimentally) that each additional bit requires a higher polynomial size (at least for bit widths > 5).
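
A small sketch to observe this experimentally: recompile the same (hypothetical) lookup at increasing bit widths with show_statistics=True and watch the complexity and key sizes grow as the optimizer picks larger parameters:

```python
import numpy as np
from concrete import fhe

for bits in (5, 6, 7, 8):
    @fhe.compiler({"x": "encrypted"})
    def f(x):
        return x // 3  # a table lookup over a `bits`-wide message

    inputset = [np.random.randint(0, 2**bits) for _ in range(100)]
    print(f"--- {bits}-bit TLU ---")
    f.compile(inputset, show_statistics=True)
```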

agnesLeroy commented:

Hey @lvlanson!

The maximum polynomial size supported on GPU is not as high as the one supported on CPU; that's probably why you get this error. It is related to our CUDA implementation of the FFT, which only supports polynomial sizes up to 16,384.

When the polynomial size increases, the size of the bootstrapping key increases as well, which can become an issue if you try to execute on a small GPU; in that case you would get a CUDA out-of-memory error when you hit the memory limit of your GPU.
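
To make the memory point concrete, a rough back-of-the-envelope sketch: a classical TFHE bootstrapping key holds about lwe_dimension * level_count * (glwe_dimension + 1)^2 * polynomial_size 64-bit words, so its size grows linearly with the polynomial size. The parameter values below are illustrative only, not ones the optimizer would actually pick:

```python
def bsk_size_gib(lwe_dimension, glwe_dimension, polynomial_size, level_count):
    # lwe_dimension GGSW ciphertexts, each made of
    # level_count * (glwe_dimension + 1) GLWE ciphertexts of
    # (glwe_dimension + 1) polynomials with polynomial_size 64-bit coefficients
    words = lwe_dimension * level_count * (glwe_dimension + 1) ** 2 * polynomial_size
    return words * 8 / 2**30  # 8 bytes per 64-bit word

# Illustrative parameters: only the polynomial size varies
for poly_size in (2**14, 2**16, 2**18):
    print(f"N = 2^{poly_size.bit_length() - 1}: "
          f"~{bsk_size_gib(900, 1, poly_size, 2):.1f} GiB")
```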

I hope this helps!
