-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Asyncify the Tree Building #975
base: master
Are you sure you want to change the base?
Conversation
Image from this branch passes upload/download dist-tests. This means that the cause of failing downloads cannot be explained by long-blocking operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this looks fine, with the caveats that:
- it's hackish, and IMHO we should revert it once we can run those easily on separate threads, which would be the "proper" solution;
- I am not so sure about the new/init API. It doesn't seem to comply with the expected semantics;
- we're doing some behavior-changing transformations by catching the exceptions and returning results.
Would be wise to get another pair of eyes on this as we've discussed earlier today, I'm nevertheless approving it.
let | ||
mhash = ? mcodec.mhash() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What the heck did this = ?
even do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 OK this raises an exception, so this change is actually modifying behavior. I suppose this is OK if it doesn't break expectations at callers, but we could simply preserve this behavior and propagate the exception using a Chronos raises clause. I see you've done this change in several places, and I'm assuming you did the due diligence to ensure this doesn't cause surprises, so I won't block your PR for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could simply preserve this behavior and propagate the exception using a Chronos raises clause
That would actually change the behaviour. = ?
will either assign a value to mhash
or it will return a Result
with an error to the closure. By raising an exception, and specifying that exception type in {.async: (raises: [...]).}
, we're not returning a Result, but raising an exception.
Imo, the proposed change here does not alter the behaviour.
@@ -305,6 +304,29 @@ proc buildManifest*[T, H](self: SlotsBuilder[T, H]): Future[?!Manifest] {.async. | |||
self.cellSize, | |||
self.strategy.strategyType) | |||
|
|||
proc init*[T, H](self: SlotsBuilder[T, H]): Future[?!void] {.async.} = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Hm... I think the fact that you need to call new
and then init
is not expected. According to the Status style guide and to some of our own code (e.g. here), init
is supposed to return a constructed object (and not void), so it'd actually replace new
in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that an init
call can be confusing when not used for construction. In general, Codex follows the convention of using init
for constructing object types and new
for constructing ref
types, as per @dryajov's comment: https://discord.com/channels/895609329053474826/895622391349272637/1086322836001534072
Maybe an easy fix is to call this initialise
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of this seems like it was replacing let x = ? someResultFunc()...
with without x =? someResultFunc()
, which doesn't seem necessary to me. In other words, it feels like we should make the switch to async
without changing any other semantics if possible. It's easy to get into trouble modifying lines "while we're here" as there can be unintended consequences, especially in nim and especially with this many changes in a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't have changed them if I didn't have to. = ?
doesn't compile in async procs.
@@ -91,6 +91,8 @@ asyncchecksuite "Test Node - Host contracts": | |||
|
|||
protected = (await erasure.encode(manifest, 3, 2)).tryGet() | |||
builder = Poseidon2Builder.new(localStore, protected).tryGet() | |||
(await builder.init()).tryGet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... this looks quite weird (doing a get on a void) but OK, I don't have better ideas. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's odd... but it's a way to force an exception in case there is no value.
Another way (not sure if it's "better"):
if (await builder.init()).isErr: fail()
But this doesn't give much output if it does fail, so an even more explicit way is:
if error =? (await builder.init()).errorOption:
echo error.msg
fail()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(await ...).tryGet() calls are all over the testing code. It fails the test when the result is an error. How would you like to see this changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've hit some significant issues using waitFor
in the past, and we need to avoid it like the plague (cc @gmega). I outlined a pattern that might help avoid the usage.
@@ -145,16 +145,18 @@ func compress*( | |||
mhash.coder(@x & @y & @[ key.byte ], digest) | |||
success digest | |||
|
|||
func init*( | |||
proc init*( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might need to avoid making this init
async so we can avoid using waitFor
later on... instead this is a good pattern to follow:
https://github.com/codex-storage/nim-ethers/blob/80b2ead97ce32cca74cef4a4501ab97106cb578b/ethers/providers/jsonrpc.nim#L50-L86
Then later, we can use CodexTree.init
synchronously, and when needed, we can let layers = await tree.layers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this pattern. If I applied it, SlotBuilder verifyTree and verifyRoot funcs would have to become async procs to await for the internal field. This would propagate the asycing through the codebase further. Is this desirable? Or is there a way around this?
without mhash =? leaves[0].mhash.mapFailure, error: | ||
return failure error | ||
|
||
var hashes = newSeq[seq[byte]]() | ||
for leaf in leaves: | ||
without hash =? leaf.mhash.mapFailure, error: | ||
return failure error | ||
hashes.add(hash.digestBytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this really need a rewrite? In other words, was there a compilation issue that forced this re-write?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't figure out a cleaner way to unroll the results. If you've got one, I'm eager to learn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove my requested changes as I realise the call sites using waitFor
are in the tests only. However, that is not to say that this pattern of using async init
won't bite us later on. I just don't want to stand in the way of progress.
My overall comment for these changes is that because we are making so many changes here and it feels like we're trying to rush this in, we really should limit the scope of changes by changing as little as possible. If i understood correctly, the main idea of this PR was the make some things asynchronous. However, we've also replaced let x = ? someResultFunc()...
with without x =? someResultFunc()
in quite a few places. This is one of those things that seems innocuous, but can also bite us later on (especially in nim).
I am also not a big fan of "let's do this for now and revert later" kind of changes, mostly because there may be patches affecting this code in the meantime which are hard to merge, and also sometimes reversion are forgotten or partially done.
Ok, so I didn't approve these changes. I merely wanted to remove my changes requested 🤦 |
All right. Let's put a break on this one. Here's why: The change I made is to make the tree-building async by adding a small async-sleep in the loop so the process can be interrupted for, for example, the handling of network traffic. That's it. All other changes I made are required downstream changes to make stuff compile and work again. Yes, there are a lot of those. There's not much we can do about that unless we are considering restructuring a lot more code, which is the opposite of making small changes. The new |
Investigation into node stability shows that tree-building is blocking the main execution loop for long periods of time in some use-cases. This PR breaks up the execution time by asyncifying the tree-building call chain.