-
Notifications
You must be signed in to change notification settings - Fork 408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent block processing #931
base: master
Are you sure you want to change the base?
Conversation
src/p2p.rs
Outdated
} | ||
Ok(()) | ||
let x: Vec<_> = result.into_par_iter().map(|(a, b)| func(a, b)).collect(); // TODO restore observe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should instead start processing as soon as the data arrives, instead of starting when all the data arrives
73eae57
to
b2fcb27
Compare
This comment still applies but could be addressed in a subsequent PR. |
if self.height < other.height { | ||
self.tip_row = other.tip_row | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic could bring to intermediate invalid states (with middle heights still missing).
However, since all batches are always merged should be okay at the end
b2fcb27
to
93e2161
Compare
Is something needed on my end? |
self.spending_rows.extend(other.spending_rows.into_iter()); | ||
self.txid_rows.extend(other.txid_rows.into_iter()); | ||
if self.height < other.height { | ||
self.tip_row = other.tip_row |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also update self.height
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so -> #931 (comment)
Which is not ideal at all, but I am not sure if there are other better options, or if I should keep this and add a comment
IIUC, it seems to be related to this PR... could you please take a look? |
src/p2p.rs
Outdated
let _ = send.send(r); | ||
}); | ||
} | ||
let result: Result<Vec<_>, std::sync::mpsc::RecvError> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK to block here (from rayon
point of view)?
Asking, since when using this patch, it seems that a task is being spawned, but seems to not run:
[2023-11-20T19:44:54.785Z DEBUG electrs::p2p] peer inventory: [Block(0x423cfe9f541e1425b02b93d8db56f8fecdcb0ce8f58d33102d205c73f6bb037e)]
[2023-11-20T19:44:54.786Z DEBUG electrs::p2p] got 1 new headers
[2023-11-20T19:44:54.786Z INFO electrs::index] indexing 1 blocks: [111..111]
[2023-11-20T19:44:54.786Z DEBUG electrs::p2p] loading 1 blocks
[2023-11-20T19:44:54.786Z DEBUG electrs::p2p] spawning task for 423cfe9f541e1425b02b93d8db56f8fecdcb0ce8f58d33102d205c73f6bb037e
[2023-11-20T19:44:54.786Z DEBUG electrs::p2p] waiting for 1 blocks
[2023-11-20T19:44:54.786Z DEBUG electrs::p2p] processing 423cfe9f541e1425b02b93d8db56f8fecdcb0ce8f58d33102d205c73f6bb037e
[2023-11-20T19:44:54.786Z DEBUG electrs::index] writing 3 funding and 11 spending rows from 2 transactions, 1 blocks
[2023-11-20T19:44:54.843Z INFO electrs::signals] notified via SIG10
[2023-11-20T19:44:54.866Z INFO electrs::chain] chain updated: tip=423cfe9f541e1425b02b93d8db56f8fecdcb0ce8f58d33102d205c73f6bb037e, height=111
[2023-11-20T19:44:54.866Z DEBUG electrs::p2p] loading 1 blocks
[2023-11-20T19:44:54.866Z DEBUG electrs::p2p] spawning task for 423cfe9f541e1425b02b93d8db56f8fecdcb0ce8f58d33102d205c73f6bb037e
[2023-11-20T19:44:54.866Z DEBUG electrs::p2p] waiting for 1 blocks
<it seems to get stuck here>
[2023-11-20T19:45:48.880Z DEBUG electrs::server] 0: recv {"jsonrpc":"2.0","method":"blockchain.estimatefee","id":283,"params":[25]}
[2023-11-20T19:45:48.880Z DEBUG electrs::server] 0: recv {"jsonrpc":"2.0","method":"blockchain.estimatefee","id":284,"params":[10]}
[2023-11-20T19:45:48.880Z DEBUG electrs::server] 0: recv {"jsonrpc":"2.0","method":"blockchain.estimatefee","id":285,"params":[5]}
[2023-11-20T19:45:48.880Z DEBUG electrs::server] 0: recv {"jsonrpc":"2.0","method":"blockchain.estimatefee","id":286,"params":[2]}
[2023-11-20T19:45:48.889Z DEBUG electrs::server] 0: recv {"jsonrpc":"2.0","method":"mempool.get_fee_histogram","id":287}
Not sure, but it may be related to rayon-rs/rayon#835... WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a good catch! Did you have a chance to try the patch with rayon::in_place_scope
instead of rayon::scope
as suggested in the issue? I'll try tomorrow anyway :)
Another alternative maybe could be to use Arc<Mutex<Vec<R>>>
instead of the channel (so the blocking happens inside the spawn)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I can't emulate the deadlock locally (I can see the output of another debug print after recv), maybe it depends on the number of threads used by rayon.
I am going to push a temporary commit to use the CI for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, the CI is not happy with in_place_scope
and neither with the Mutex version https://github.com/RCasatta/electrs/actions/runs/6941492681/job/18882464601
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't emulate the deadlock locally (I can see the output of another debug print after recv)
With locally I mean without the docker...
How can I see the logs with the docker command as it is done (docker run -v $PWD/contrib/:/contrib -v $PWD/tests/:/tests --rm electrs:tests bash /tests/run.sh
) in the CI ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can I see the logs with the docker command as it is done (
docker run -v $PWD/contrib/:/contrib -v $PWD/tests/:/tests --rm electrs:tests bash /tests/run.sh
) in the CI ?
electrs logs are stored under /data/electrs
(inside the container):
Line 47 in a01d58c
2> data/electrs/regtest-debug.log & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the dumb questions, but if the container dies after an error how to check inside the container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, the logs are indeed unavailable when the container fails in CI...
I have added two more commits (8326249 & acaebd0), so it should be possible to reproduce the issue locally by:
$ docker build -f Dockerfile.ci . --rm -t electrs:tests-bug
$ docker run -v $PWD/contrib/:/contrib -v /tmp/data:/data -v $PWD/tests/:/tests --rm electrs:tests-bug bash -x /tests/run.sh
The logs should be under /tmp/data/
after the container exits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, in the meantime I launched a couple of jobs on the CI.
The bad news is that the in_place_scope
seems to deadlock the same https://github.com/RCasatta/electrs/actions/runs/6947160675
The good news is that the Mutex version isn't deadlocking https://github.com/RCasatta/electrs/actions/runs/6946996655
But before cleaning up the Mutex version I think it could be possible a mutex-free solution...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But before cleaning up the Mutex version I think it could be possible a mutex-free solution...
I was considering to use a Vec<OnceLock<R>>
for thread results, however OnceLock
has MSRV 1.70
So, I changed the Mutex<Vec<R>>
to Vec<Mutex<Option<R>>>
so that there are no blocking call excluding the end of the rayon scope
requires changing the signature of the function from FnMut -> Fn and return data instead of relying on mutating the context
7e693de
to
f7c8cca
Compare
Thanks @RCasatta! Will review at the weekend :) |
})?; | ||
self.blocks_duration | ||
.observe_duration("process", || func(hash, block)); | ||
let mut result = Vec::with_capacity(blockhashes_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, can we use 5aba2a1 to simplify this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! It is much better I didn't think about that
18a31e3
to
f7c8cca
Compare
})?; | ||
for (b, h) in inputs_filtering { | ||
let e = result.entry(b).or_default(); | ||
e.extend(h); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may happen that the same transaction both funds and spends the same scripthash - so we probably need to "merge" both funding and spending TxEntry
instances -> f43de8f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, sorry to introduce so many bugs in the proposed PR, but it's pretty hard to remember all these cases, ideally some testing should enforce these...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, I definitely should add more tests :)
By changing the signature of the
for_blocks
function, avoiding theFnMut
and returning data instead, we can process blocks in parallel.This requires changing the code using the function because at the moment closures are using mutability in the context.
The status of the PR is very early, there are bugs sometimes with TODO, reviewers should avoid in-depth review at the moment, just want to demonstrate feasibility and have early feedback on the concept.
On my machine, with a node with an empty mempool the time taken by
./history.py 1FeexV6bAHb8ybZjqQMjJrcCrHGW9sb6uF --no-merkle-proofs
is reduced from about 3.5s to 2.5s