Optimize BlockLocators Iterator with Parallel Processing #3430

meetrick · 2024-11-05T22:13:27Z

Motivation

This PR aims to optimize the BlockLocators into_iter method by parallelizing it with rayon. The existing sequential approach can be slow when handling large sets of block data. Merging the checkpoints and recents maps in parallel is intended to speed this up.

Test Plan

Generate a dataset of 1000 items and compare the processing times of the sequential and parallel versions.


This PR enhances the BlockLocators iterator by introducing parallel processing with rayon. The checkpoints and recents maps are combined using into_par_iter, leveraging multi-threading to potentially improve performance.

Signed-off-by: Hwangjae Lee <meetrick@gmail.com>

This patch introduces a performance test for the BlockLocators iterator, outputting the processing times.

meetrick · 2024-11-05T22:37:13Z

hey guys!

I expected that processing speed would increase with parallel processing. However, after testing, I observed that up to a certain number of items, parallel processing introduced overhead, which caused it to take longer. Based on my test code, sequential processing performed better up to approximately 100,000 items. The performance is also affected by the machine’s capabilities.

I would appreciate reviewers’ opinions on this. It seems that additional changes may be needed in the backend to fully accommodate these updates.

ljedrz · 2024-11-07T15:15:48Z

node/sync/locators/src/block_locators.rs

+            self.checkpoints
+            .into_par_iter()
+            .chain(self.recents.into_par_iter())
+            .collect::<Vec<_>>(),


Can't this be collected directly into a BTreeMap via from_par_iter?

ljedrz · 2024-11-07T15:31:44Z

@meetrick in general, I'm not convinced that parallelizing the iteration itself is likely to be a performance improvement at any scale, unless we can utilize rayon to do the sorting that's involved when creating the BTreeMap; in other words, the operation that we should consider is BTreeMap::from_par_iter, as indicated in the removed comment.

meetrick · 2024-11-07T15:37:33Z

@ljedrz It seems I misunderstood the TODO. I’ll work on it again. Thanks for your input.

ljedrz · 2024-11-07T17:14:04Z

In addition, sorting may not always be necessary. I don't recall the related logic very well, but it seems to me that checkpoints and recents might already be in the right order, at least in some scenarios. In such cases, skipping the iteration and the collection into a BTreeMap would have a positive impact on performance.
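To make the ordering idea concrete, here is a rough sketch using only the standard library. It is not taken from the snarkOS codebase: the `Height` and `Hash` aliases and all three function names are hypothetical stand-ins, and whether checkpoints and recents really occupy non-overlapping height ranges in practice is an assumption that would need to be checked against the actual logic.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the real key and value types.
type Height = u32;
type Hash = String;

/// Baseline: chain both maps and rebuild a `BTreeMap` from all entries.
fn merge_by_collect(
    checkpoints: BTreeMap<Height, Hash>,
    recents: BTreeMap<Height, Hash>,
) -> BTreeMap<Height, Hash> {
    checkpoints.into_iter().chain(recents).collect()
}

/// Returns true when every key in `low` is strictly below every key in `high`.
fn strictly_below(low: &BTreeMap<Height, Hash>, high: &BTreeMap<Height, Hash>) -> bool {
    match (low.last_key_value(), high.first_key_value()) {
        (Some((max_low, _)), Some((min_high, _))) => max_low < min_high,
        _ => true, // at least one of the maps is empty
    }
}

/// Yields all entries in ascending key order. When the key ranges do not
/// overlap, the two already-sorted iterators are simply chained and no
/// intermediate `BTreeMap` is built; otherwise it falls back to the baseline.
fn merged_in_order(
    checkpoints: BTreeMap<Height, Hash>,
    recents: BTreeMap<Height, Hash>,
) -> Vec<(Height, Hash)> {
    if strictly_below(&checkpoints, &recents) {
        checkpoints.into_iter().chain(recents).collect()
    } else if strictly_below(&recents, &checkpoints) {
        recents.into_iter().chain(checkpoints).collect()
    } else {
        merge_by_collect(checkpoints, recents).into_iter().collect()
    }
}
```

The range check costs two lookups at the ends of the maps, so it should be cheap compared with rebuilding the map. Whether the fast path is hit often enough to matter depends on how checkpoints and recents are populated, which this sketch does not model.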

meetrick added 3 commits November 5, 2024 13:45

Added Optimize BlockLocators Iterator Performance Test

85b7792

This patch introduces a performance test for the BlockLocators iterator, outputting the processing times.

Merge branch 'staging' into 20241105_node

2ad6650

meetrick marked this pull request as draft November 5, 2024 22:14

ljedrz reviewed Nov 7, 2024
