Beacond restart loop #2107

Open
aditya-manit opened this issue Oct 28, 2024 · 5 comments

@aditya-manit

Our validator node started throwing this in a loop:

time.Sleep(0x77359400)
        runtime/time.go:285 +0xf2
github.com/cometbft/cometbft/internal/consensus.(*Reactor).queryMaj23Routine(0xc00571e240, {0x294e6b0, 0xc0055fc9c0}, 0xc0055fca90)
        github.com/cometbft/cometbft@v1.0.0-rc1.0.20240806094948-2c4293ef36c4/internal/consensus/reactor.go:799 +0x5b
created by github.com/cometbft/cometbft/internal/consensus.(*Reactor).AddPeer in goroutine 2514495328
        github.com/cometbft/cometbft@v1.0.0-rc1.0.20240806094948-2c4293ef36c4/internal/consensus/reactor.go:214 +0x1bb
goroutine 2749043931 [select]:
github.com/cometbft/cometbft/p2p.(*peer).metricsReporter(0xc014c94dd0)
        github.com/cometbft/cometbft@v1.0.0-rc1.0.20240806094948-2c4293ef36c4/p2p/peer.go:364 +0x12c
created by github.com/cometbft/cometbft/p2p.(*peer).OnStart in goroutine 2749043915
        github.com/cometbft/cometbft@v1.0.0-rc1.0.20240806094948-2c4293ef36c4/p2p/peer.go:198 +0x66
goroutine 6422858742 [select]:
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).startTicker.func1()
        github.com/cockroachdb/pebble@v1.1.1/vfs/disk_health.go:171 +0xc5
created by github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).startTicker in goroutine 396263
        github.com/cockroachdb/pebble@v1.1.1/vfs/disk_health.go:166 +0x58
goroutine 2128295989 [select]:
github.com/cometbft/cometbft/p2p.(*peer).metricsReporter(0xc012213a00)
        github.com/cometbft/cometbft@v1.0.0-rc1.0.20240806094948-2c4293ef36c4/p2p/peer.go:364 +0x12c
created by github.com/cometbft/cometbft/p2p.(*peer).OnStart in goroutine 2128293927
        github.com/cometbft/cometbft@v1.0.0-rc1.0.20240806094948-2c4293ef36c4/p2p/peer.go:198 +0x66
goroutine 142791532 [sleep]:
time.Sleep(0x5f5e100)
        runtime/time.go:285 +0xf2
github.com/cometbft/cometbft/internal/consensus.(*Reactor).gossipDataRoutine(0xc00571e240, {0x294e6b0, 0xc010fd4f70}, 0xc010fd5040)
        github.com/cometbft/cometbft@v1.0.0-rc1.0.20240806094948-2c4293ef36c4/internal/consensus/reactor.go:642 +0x24e
created by github.com/cometbft/cometbft/internal/consensus.(*Reactor).AddPeer in goroutine 6801
        github.com/cometbft/cometbft@v1.0.0-rc1.0.20240806094948-2c4293ef36c4/internal/consensus/reactor.go:212 +0xe7
goroutine 1452414648 [sleep]:
time.Sleep(0x77359400)
        runtime/time.go:285 +0xf2
github.com/cometbft/cometbft/internal/consensus.(*Reactor).queryMaj23Routine(0xc00571e240, {0x294e6b0, 0xc00b242a90}, 0xc00b242b60)
        github.com/cometbft/cometbft@v1.0.0-rc1.0.20240806094948-2c4293ef36c4/internal/consensus/reactor.go:799 +0x5b
created by github.com/cometbft/cometbft/internal/consensus.(*Reactor).AddPeer in goroutine 6801
        github.com/cometbft/cometbft@v1.0.0-rc1.0.20240806094948-2c4293ef36c4/internal/consensus/reactor.go:214 +0x1bb
goroutine 2128295799 [sleep]:
time.Sleep(0x5f5e100)

Beacond commit: dd024c5
Reth commit: 1ba631ba9581973e7c6cadeea92cfe1802aceb4a

Beacond version: v0.2.0-alpha.8
Reth version: 1.1.0

@abi87
Collaborator

abi87 commented Oct 28, 2024

Hello @aditya-manit, thanks for filing the issue.
Any chance we can get the full log? I would like to check what caused the issue in the first place!
Thanks

@abi87 abi87 self-assigned this Oct 28, 2024
@aditya-manit
Author

Oct 27 04:20:10 berachain-testnetv2 beacond[1178]: 2024-10-27T04:20:10+01:00 INFO Finalizing commit of block module=consensus height=6595637 hash=C3F9FBD5EA8E4C1F39D9CA79B603D42FF0DF55802D43716DEB61DA97B1802240 root=19F82FB75EF526E72C1B333A212170F79073D64E6876DD6C9A4E8F5591EAC7F1 num_txs=2
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]: 2024-10-27T04:20:10+01:00 INFO Inserted new payload into execution chain service=execution-engine payload_block_hash=0x537eb7b342455946863c152e7d70cbdfbdb6ba6a9f67302500119c079129e572 payload_parent_block_hash=0x390033bb3b320963581746560cf6496f88d7fcb5528b48a272fff2ede6382219 is_optimistic=true
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]: fatal error: concurrent map iteration and map write
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]: goroutine 4420 [running]:
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]: net/http.validateHeaders(0x52e3ad?)
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]:         net/http/transport.go:514 +0x4a
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]: net/http.(*Transport).roundTrip(0x3d08380, 0xc0001f8dc0)
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]:         net/http/transport.go:547 +0x16e
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]: net/http.(*Transport).RoundTrip(0x3128010?, 0x28fc680?)
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]:         net/http/roundtrip.go:30 +0x13
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]: net/http.send(0xc0001f8dc0, {0x28fc680, 0x3d08380}, {0xc005963601?, 0x41824b?, 0x0?})
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]:         net/http/client.go:259 +0x5e4
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]: net/http.(*Client).send(0x3e5c780, 0xc0001f8dc0, {0x0?, 0xc0059636c8?, 0x0?})
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]:         net/http/client.go:180 +0x98
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]: net/http.(*Client).do(0x3e5c780, 0xc0001f8dc0)
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]:         net/http/client.go:725 +0x8bc
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]: net/http.(*Client).Do(...)
Oct 27 04:20:10 berachain-testnetv2 beacond[1178]:         net/http/client.go:590

Here is the log file with the logs related to this issue:
issue.log
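
A note on the fatal error in the log above: the crash surfaces in net/http.validateHeaders, which ranges over a request's header map, so the Go runtime is reporting that some other goroutine wrote to that same map during the iteration. The log does not show which map or which writer in beacond is involved. The sketch below is not beacond's code; it only illustrates one common way to avoid this class of error, assuming the cause is a header map shared between requests. The names baseHeader and newRequest, the URL, and the token values are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// baseHeader is a hypothetical header set shared by every outgoing request.
// It is treated as read-only after initialization.
var baseHeader = http.Header{
	"Content-Type": {"application/json"},
}

// newRequest builds a request whose header map is private to the caller.
// Header.Clone makes a deep copy, so later writes to req.Header never touch
// baseHeader or the header map of another goroutine's request.
func newRequest(url, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header = baseHeader.Clone()
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	var wg sync.WaitGroup
	results := make([]string, 8)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Placeholder URL; no network call is made in this sketch.
			req, err := newRequest("http://localhost:8551", fmt.Sprintf("token-%d", i))
			if err != nil {
				return
			}
			// Each goroutine writes only to its own slice element.
			results[i] = req.Header.Get("Authorization")
		}(i)
	}
	wg.Wait()
	fmt.Println(results[0], len(baseHeader))
}
```

Cloning costs one small allocation per request. If a header map genuinely has to be mutated after requests are in flight, guarding it with a sync.Mutex is the alternative.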

@aditya-manit
Author

We were running beacond as a systemd service and had to manually stop and start the process to resolve it.

@gummybera
Contributor

Could it be related to #2057?

@sbond14

sbond14 commented Oct 29, 2024

I had the same issue running with Docker and a container orchestrator. It resolved itself when the container failed with "fatal error: concurrent map read and map write" and restarted.
