Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rollup_relayer): retry CommitFailed batches #911

Closed
wants to merge 3 commits into from

Conversation

0xmountaintop
Copy link
Contributor

@0xmountaintop 0xmountaintop commented Sep 1, 2023

Purpose or design rationale of this PR

In today's incident, caused by #897, our sender keeps sending commitBatch txs and didn't realize they failed (reverted when being executed, e.g., https://sepolia.etherscan.io/tx/0x4846ed1477974716d9b8c57b6aee4e3fc710c8250499fd0631bf360844f0e5ea & https://sepolia.etherscan.io/tx/0xcb75cd2f2b086e40b7aaa03a31a63a239ab7776246ea9b1ae69946bd41c15e2b)

And the rollup_status in DB were updated to RollupCommitFailed. logs:

WARN [09-01|05:06:53.658|scroll-tech/bridge/internal/controller/relayer/l2_relayer.go:606] transaction confirmed but failed in layer1 confirmation="&{ID:0xf3f114d680407076965ef847f60c4489c02e047ff4d942b0cb0dffb9f82af7a9-commit IsSuccessful:false TxHash:0x4846ed1477974716d9b8c57b6aee4e3fc710c8250499fd0631bf360844f0e5ea}"
INFO [09-01|05:06:53.677|scroll-tech/bridge/internal/controller/relayer/l2_relayer.go:640] transaction confirmed in layer1          type=BatchesCommitment confirmation="&{ID:0xf3f114d680407076965ef847f60c4489c02e047ff4d942b0cb0dffb9f82af7a9-commit IsSuccessful:false TxHash:0x4846ed1477974716d9b8c57b6aee4e3fc710c8250499fd0631bf360844f0e5ea}"

However, our sender should not keep sending txs if the previous ones fail.

This PR adds the retry logic, and we will realize the commitment gets stuck earlier.

PR title

Your PR title must follow conventional commits (as we are doing squash merge for each PR), so it must start with one of the following types:

  • build: Changes that affect the build system or external dependencies (example scopes: yarn, eslint, typescript)
  • ci: Changes to our CI configuration files and scripts (example scopes: vercel, github, cypress)
  • docs: Documentation-only changes
  • feat: A new feature
  • fix: A bug fix
  • perf: A code change that improves performance
  • refactor: A code change that doesn't fix a bug, or add a feature, or improves performance
  • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
  • test: Adding missing tests or correcting existing tests

Deployment tag versioning

Has tag in common/version.go been updated or have you added bump-version tag to this PR?

  • No, this PR doesn't involve a new deployment, git tag, docker image tag
  • Yes

Breaking change label

Does this PR have the breaking-change label?

  • No, this PR is not a breaking change
  • Yes

@0xmountaintop 0xmountaintop added the bump-version Bump the version tag for deployment label Sep 1, 2023
@0xmountaintop 0xmountaintop changed the title [WIP] fix(rollup_relayer): retry CommitFailed batches fix(rollup_relayer): retry CommitFailed batches Sep 1, 2023
@0xmountaintop 0xmountaintop marked this pull request as ready for review September 1, 2023 11:13
@codecov
Copy link

codecov bot commented Sep 1, 2023

Codecov Report

Merging #911 (6f4e414) into develop (f9da81d) will not change coverage.
Report is 2 commits behind head on develop.
The diff coverage is 75.00%.

@@           Coverage Diff            @@
##           develop     #911   +/-   ##
========================================
  Coverage    43.74%   43.74%           
========================================
  Files           72       72           
  Lines         6991     6991           
========================================
  Hits          3058     3058           
  Misses        3682     3682           
  Partials       251      251           
Flag Coverage Δ
bridge 62.11% <75.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
bridge/internal/orm/batch.go 56.80% <66.66%> (ø)
bridge/internal/controller/relayer/l2_relayer.go 44.00% <100.00%> (ø)

@@ -332,7 +332,7 @@ func (r *Layer2Relayer) ProcessGasPriceOracle() {
// ProcessPendingBatches processes the pending batches by sending commitBatch transactions to layer 1.
func (r *Layer2Relayer) ProcessPendingBatches() {
// get pending batches from database in ascending order by their index.
pendingBatches, err := r.batchOrm.GetPendingBatches(r.ctx, 5)
pendingBatches, err := r.batchOrm.GetPendingAndCommitFailedBatches(r.ctx, 5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not keep retrying forever. If it's a gas limit issue, then it should succeed the 2nd time. If it's a bug (e.g. Golang batch hash does not match Solidity batch hash), then retrying again and again will just burn ETH, in this case we need human intervention to fix the bug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

@0xmountaintop 0xmountaintop deleted the retry_commitFailedBatch branch September 1, 2023 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bump-version Bump the version tag for deployment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants