Optimize signature aggregation #1753

jagerman · 2024-11-07T03:12:38Z

Currently when an RPC node does a network fanout to get a reward or exit
signature, it verifies each signature as it returns before adding it to
the aggregate.

This ends up being quite slow: on my desktop machine on current
stagenet, in a release build, with 240 stagenet nodes, just these
verification calls take a bit more than a full second in total (while
the aggregation itself takes only a tenth of a second). While still
reasonable on stagenet, on mainnet it's going to be 10s of CPU time per
signature request, which is too much.

This commit rewrites it to speed it up considerably in the normal (i.e.
no bad signature) case:

- the aggregator aggregates returning pubkeys and signatures without
  individual verification.
- once we have all results, we then perform a single verification of the
  aggregated signature against the aggregated pubkey. If it succeeds, no
  other checks are needed.
- if that check *fails* then we fall back to doing a one-by-one
  verification on all the individual signatures, removing them from the
  aggregates if we find any failures.

In the normal case (where we don't get any failing signatures) this
speeds up aggregate processing by more than 10x by only needing one
signature verification.
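The verify-once-then-fall-back flow described above can be sketched as follows. This is a minimal illustration and not the code from this PR: it swaps real BLS for a toy, insecure scheme in which keys and signatures are integers mod `P` and aggregation is addition mod `P`, which is just enough to show the linearity the optimization relies on. The names `Response`, `Aggregate`, `verify`, `aggregate`, `P` and `MSG_HASH` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for BLS: NOT cryptography. Keys and signatures are integers
// mod P and aggregation is addition mod P (an assumption for illustration).
constexpr std::uint64_t P = 1000000007ULL;
constexpr std::uint64_t MSG_HASH = 123456789ULL;  // hypothetical message hash

struct Response {
    std::uint64_t pubkey;
    std::uint64_t signature;
};

// Hypothetical verifier: valid when signature == pubkey * MSG_HASH (mod P).
bool verify(std::uint64_t pubkey, std::uint64_t signature) {
    return (pubkey % P) * MSG_HASH % P == signature % P;
}

struct Aggregate {
    std::uint64_t pubkey = 0;
    std::uint64_t signature = 0;
    std::size_t signers = 0;
    std::size_t verifications = 0;  // how many verify() calls were needed
};

Aggregate aggregate(const std::vector<Response>& responses) {
    Aggregate agg;
    // Step 1: aggregate everything without individual verification.
    for (const auto& r : responses) {
        agg.pubkey = (agg.pubkey + r.pubkey % P) % P;
        agg.signature = (agg.signature + r.signature % P) % P;
        ++agg.signers;
    }
    // Step 2: a single check of aggregate signature against aggregate pubkey.
    ++agg.verifications;
    if (verify(agg.pubkey, agg.signature))
        return agg;
    // Step 3 (slow path): check one by one and subtract each failing signer.
    for (const auto& r : responses) {
        ++agg.verifications;
        if (!verify(r.pubkey, r.signature)) {
            agg.pubkey = (agg.pubkey + P - r.pubkey % P) % P;
            agg.signature = (agg.signature + P - r.signature % P) % P;
            --agg.signers;
        }
    }
    // Every remaining signer verified individually, so the sum verifies too.
    assert(verify(agg.pubkey, agg.signature));
    return agg;
}
```

In this sketch the all-good case costs exactly one `verify` call regardless of the number of signers, while a single bad response costs one failed aggregate check plus one check per response.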

A nice side effect of this is that because we always know the aggregate
pubkey now, we can include that in debug logs (previously it was only
available in debug logs in debug builds), and in the RPC result.

(This builds on top of #1751)

This adds a generic interface for making asynchronous RPC request handlers, and applies these to the reward and exit signature endpoints.

The RPC interface adds an alternative form of the `invoke(...)` method that takes a new, third argument of a `shared_ptr<response>`: when such a version of the invoke method is present, the response is not sent until the destruction of the shared_ptr. An asynchronous request uses this version, keeps the shared_ptr alive until the response is available, then lets it destruct, which triggers the response.

This then pushes the async approach through the reward and exit signature requests, and gets everything working through asynchronous callbacks rather than blocking requests.

This also makes some small changes to signature handling:

- Allow exit/liquidation RPC requests to be made by either SN pubkey (as currently) or BLS pubkey (new).
- Allow liquidation of oxend-non-existent BLS pubkeys (i.e. liquidating a BLS pubkey that doesn't match any SNs oxen knows of); without this it isn't possible to remove a "bad" contract node whose registration oxend didn't accept for whatever reason.
- Add code to remove the extra signatures of unwanted nodes from produced signatures. Previously our concept of "non-signers" only included IDs to pass to the contract, but there is also a reverse failure where we collect a signature from a SN that isn't in the contract anymore (for example: a recently removed node with an incoming, but unconfirmed, exit event). This amends the signing code to detect any such signers and subtract their signatures from the aggregate before returning it.
- Remove the walk-the-snode-linked-list contract handling code as it is not used anymore.
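The "send the response when the shared_ptr is destroyed" idea can be illustrated with a small sketch. None of these names come from the oxen-core RPC code: `DeferredResponse`, `TaskQueue` and `invoke_async` are hypothetical stand-ins, and the queue is a single-threaded substitute for whatever event loop actually delivers the async results.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical response object: the reply goes out when the object dies.
struct DeferredResponse {
    std::string body;
    std::function<void(const std::string&)> send;  // assumed transport hook
    ~DeferredResponse() {
        if (send) send(body);
    }
};

// Hypothetical single-threaded stand-in for an event loop.
struct TaskQueue {
    std::vector<std::function<void()>> tasks;
    void post(std::function<void()> f) { tasks.push_back(std::move(f)); }
    void run() {
        while (!tasks.empty()) {
            // Move the task out so that it (and any shared_ptr it captured)
            // is destroyed at the end of this iteration.
            auto task = std::move(tasks.front());
            tasks.erase(tasks.begin());
            task();
        }
    }
};

// Async-style handler: returns immediately; the captured shared_ptr keeps the
// response alive until the queued work has filled in the body.
void invoke_async(TaskQueue& queue, std::shared_ptr<DeferredResponse> response) {
    queue.post([response] { response->body = "signature ready"; });
}
```

The handler never calls a "send" function explicitly: dropping the last reference is what triggers the reply, so a handler can fan the pointer out to several callbacks and the reply goes out once all of them have finished.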

jagerman added 6 commits November 7, 2024 19:51

Make debug_redo_bls_aggregation_steps_locally asynchronous

c4911aa

Fixes debug build.

Fix nodes_request callback exception handling

e89fc3d

There are (currently) two places invoking the callback, one of which wasn't catching exceptions; this genericizes the callback invocation to fix it, and adds a logging try/catch around the final_callback invocation as well.
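A generic callback wrapper of the kind this commit message describes might look like the sketch below. The helper name `invoke_logged` and the use of `std::cerr` for logging are assumptions for illustration; the actual code presumably uses the project's own logging facilities.

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>
#include <utility>

// Hypothetical helper: every call site invokes callbacks through this one
// function, so none of them can forget the try/catch.
// Returns true if the callback completed without throwing.
template <typename Callback, typename... Args>
bool invoke_logged(const char* what, Callback&& callback, Args&&... args) {
    try {
        std::forward<Callback>(callback)(std::forward<Args>(args)...);
        return true;
    } catch (const std::exception& e) {
        std::cerr << what << " threw: " << e.what() << '\n';
    } catch (...) {
        std::cerr << what << " threw an unknown exception\n";
    }
    return false;
}
```

Routing both the per-node callback and the final callback through one helper keeps a throwing callback from escaping into the networking code that invoked it.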

Fix caching/async interaction bug

b448b3f

Switches the aggregate signature caches to hold their values in a shared_ptr, so that a cached value stays alive and can be safely stored and used across different threads while it is captured by in-flight async requests.
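As a rough sketch of why a shared_ptr helps here (the types and names below, `CachedAggregate` and `AggregateCache`, are hypothetical and not taken from the PR): an async request takes its own reference to the cached value, so a later eviction or overwrite of the cache entry cannot invalidate data the request is still using.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical cached value; the field names are assumptions.
struct CachedAggregate {
    std::string aggregate_pubkey;
    std::string signature;
};

class AggregateCache {
    std::mutex mutex_;
    std::map<std::string, std::shared_ptr<const CachedAggregate>> entries_;

  public:
    void store(const std::string& key, CachedAggregate value) {
        std::shared_ptr<const CachedAggregate> ptr =
                std::make_shared<CachedAggregate>(std::move(value));
        std::lock_guard<std::mutex> lock{mutex_};
        entries_[key] = std::move(ptr);
    }

    // Returns a shared reference (or nullptr); the caller's copy keeps the
    // immutable value alive even if the entry is erased afterwards.
    std::shared_ptr<const CachedAggregate> find(const std::string& key) {
        std::lock_guard<std::mutex> lock{mutex_};
        auto it = entries_.find(key);
        if (it == entries_.end())
            return nullptr;
        return it->second;
    }

    void erase(const std::string& key) {
        std::lock_guard<std::mutex> lock{mutex_};
        entries_.erase(key);
    }
};
```

The mutex only guards the map itself; because the pointed-to value is `const`, any number of threads can read it through their own shared_ptr copies without further locking.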

Fix missing fields on signing request endpoints

e9a10fb

jagerman force-pushed the optimize-signature-aggregation branch from 40004e3 to e9a10fb November 7, 2024 23:52

jagerman mentioned this pull request Nov 15, 2024

Add configurable staking URL to register command #1759

Draft
