Update syncing logic to fix duplicate block requests #3410
Motivation
As described in issue #3404, validators send out duplicate block requests when syncing. This PR updates the syncing logic so that a block response is removed only after the block has been added to the ledger or an error has occurred.
For background, syncing currently works as follows:
1. BlockSync requests the blocks and stores them in its internal responses map.
2. Sync calls block_sync.process_next_block(current_height) to get the next block to process. This removes the block from BlockSync's responses and returns it to Sync.
3. Sync performs the checks and, if they succeed, adds the block to the ledger. This also updates BlockSync's canon.
The issue is that BlockSync's view of the canon is only updated after Sync is done processing blocks (and adding them to the ledger). There is a window in which BlockSync is unaware of blocks that are pending validation in Sync. Thus, it re-requests them.
This PR adjusts the block processing interface to the Sync module as follows:
- process_next_block READS a block from a particular height
- remove_block_response REMOVES a block from a particular height; it MUST be called after advancing has completed or failed
Test Plan
Ran it locally and verified it removes the duplicate block request.
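The behavior checked here can also be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code in this PR: process_next_block and remove_block_response are named after the interface described above, but their signatures are guesses, and Block, canon_height, requests_sent, request_if_missing, insert_response, take_next_block, and simulate are hypothetical names introduced for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a block; only the height matters in this model.
#[derive(Clone, Debug)]
struct Block {
    height: u32,
}

/// Toy model of BlockSync (assumed shape, not the real struct).
struct BlockSync {
    /// Highest height BlockSync believes is in the ledger.
    canon_height: u32,
    /// Block responses received but not yet removed.
    responses: BTreeMap<u32, Block>,
    /// Log of every request sent, so duplicates are visible.
    requests_sent: Vec<u32>,
}

impl BlockSync {
    fn new() -> Self {
        Self { canon_height: 0, responses: BTreeMap::new(), requests_sent: Vec::new() }
    }

    /// Requests `height` unless it is already canonical or already buffered.
    fn request_if_missing(&mut self, height: u32) {
        if height > self.canon_height && !self.responses.contains_key(&height) {
            self.requests_sent.push(height);
        }
    }

    /// Stores a received block response.
    fn insert_response(&mut self, block: Block) {
        self.responses.insert(block.height, block);
    }

    /// Old behavior (hypothetical name): returns the block and forgets it immediately.
    fn take_next_block(&mut self, height: u32) -> Option<Block> {
        self.responses.remove(&height)
    }

    /// New behavior: READS the block; the response stays buffered.
    fn process_next_block(&self, height: u32) -> Option<Block> {
        self.responses.get(&height).cloned()
    }

    /// New behavior: REMOVES the response once advancing has completed or failed.
    fn remove_block_response(&mut self, height: u32) {
        self.responses.remove(&height);
    }
}

/// Runs one sync round for height 1 and returns every request that was sent.
fn simulate(read_then_remove: bool) -> Vec<u32> {
    let mut sync = BlockSync::new();
    sync.request_if_missing(1);
    sync.insert_response(Block { height: 1 });

    // Sync fetches the next block to validate.
    let block = if read_then_remove {
        sync.process_next_block(1)
    } else {
        sync.take_next_block(1)
    };
    let block = block.expect("response for height 1 was buffered");

    // While Sync is still validating, BlockSync checks for missing blocks again.
    sync.request_if_missing(1);

    // Validation succeeded: the block joins the ledger and the canon advances.
    sync.canon_height = block.height;
    if read_then_remove {
        sync.remove_block_response(1);
    }
    sync.requests_sent
}
```

In this model, simulate(false) records two requests for height 1 because the response is dropped before the canon advances, while simulate(true) records a single request because the response stays buffered until remove_block_response is called.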
The following figure shows a syncing process before the fix, with a longer pause until a duplicate request is generated:
The following figure shows the syncing process with the fix. Note that there is no duplicate request and no delay, so syncing is much faster:
Closes #3404