Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweaks and small fixes to the prototype #310

Merged
merged 10 commits into from
Aug 20, 2024
Merged

Conversation

mheinzel
Copy link
Collaborator

@mheinzel mheinzel commented Jul 25, 2024

Probably easiest to review commit by commit.

I managed to find a few invariants that too tight, or looser than necessary. However, this required adding handwritten tests, as the lockstep test coverage is not great.

The fix for incorrect assumptions violated by 5-way merges (holding back an underfull run) are simply addressed by providing more merge credits, but there are alternatives we can think about afterwards (e.g. #311)

@mheinzel mheinzel force-pushed the mheinzel/prototype-tweaks branch 2 times, most recently from 962767a to 199466d Compare July 25, 2024 12:51
Copy link
Collaborator

@jorisdral jorisdral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks promising, I've left some comments but haven't looked into all changes

Comment on lines 281 to 282
-- level could need. This overestimation means that merges will only
-- complete at the last possible moment.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that depends: the merge can finish earlier, but it should never finish too late. Maybe reword this a little bit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a real implementation that's true. But in the prototype merges will never finish early, as their completion is not based on the any "real" work to be performed (which would depend on the number of entries), but on the debt calculated here, which is the maximum a merge at this level could require.

I can add that explanation to the comment, though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reasoning for always requiring the maximum amount of work is that it makes it more likely to find issues related to supplying too few credits somewhere. We could base everything on cost (sum of input run sizes), but not sure whether it's an improvement or not.

Comment on lines +93 to +111
-- get something to 3rd level (so 2nd level is not levelling)
-- (needs 5 runs to go to level 2 so the resulting run becomes too big)
traverse_ ins [101..100+(5*16)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's record assumptions like this about the structure in an assertion. i.e., check that there is a run on the third level. This should ensure that the test can't become outdated if we change the prototype. And let's do that in other places too where it makes sense, not just this line

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optionally, you could also make this test size factor independent

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided not to make it size factor independent for now, but I think I found a decent way of making assertions about the LSM shape. It's a bit annoying that merges always get created a little later than one (at least I) would think, e.g. flushing 4 write buffers does not yet result in a merge, but that's just how the algorithm currently works (and also the reason for losing some potential sharing).

prototypes/ScheduledMergesTestQLS.hs Outdated Show resolved Hide resolved
prototypes/ScheduledMergesTestQLS.hs Outdated Show resolved Hide resolved
prototypes/ScheduledMergesTestQLS.hs Outdated Show resolved Hide resolved
Comment on lines 212 to 217
-- | Without this, the prototype only completes tiering merges when the next
-- merging run on this level is created, so a level would never contain a
-- completed merge.
ASupply :: ModelVar Model (LSM RealWorld)
-> Int
-> Action (Lockstep Model) ()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why this is necessary. Shouldn't the prototype be able to maintain it's invariants without supplying credits explicitly? If we add this action, we're not letting the prototype "do its thing"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this, the whole part of the invariant that checks a CompletedMerge present in a tiering level is dead code. Runs stay Ongoing until they are replaced by the next incoming merge.

Maybe that's something I could change in the prototype, related to #310 (comment).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without ASupply, I can do

--- a/prototypes/ScheduledMerges.hs
+++ b/prototypes/ScheduledMerges.hs
@@ -247,6 +247,7 @@ invariant = go 1
             -- Note that tiering on the last level only occurs when there is
             -- a single level only.
             (_, CompletedMerge r, MergeLastLevel) -> do
+              assertST $ False
               assertST $ ln == 1
               assertST $ tieringRunSizeToLevel r <= ln+1
 
@@ -254,7 +255,8 @@ invariant = go 1
             -- level it is entering, but can also be one smaller (in which case
             -- it'll be held back and merged again) or one larger (because it
             -- includes a run that has been held back before).
-            (_, CompletedMerge r, MergeMidLevel) ->
+            (_, CompletedMerge r, MergeMidLevel) -> do
+              assertST $ False
               assertST $ tieringRunSizeToLevel r `elem` [ln-1, ln, ln+1]

And even increase the generator size a lot and don't get any failures:

prototype
  ScheduledMerges
    ScheduledMerges vs model: OK (88.31s)
      +++ OK, passed 100 tests.
      
      Action polarity (83563 in total):
      100.000% +
      
      Actions (83563 in total):
      33.441% +AInsert
      32.943% +ALookup
      16.808% +ADelete
       8.383% +ADump
       8.308% +ADuplicate
       0.118% +ANew

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although arguably this just putting on a bandaid. The lockstep tests just don't reach proper coverage. For the first tiering merge to even be created, ~100 elements need to be inserted, which means a lot of actions are required before anything interesting happens.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove ASupply and instead make sure the invariant gets checked inbetween supplying credits and doing the other work (e.g. requiring a merge to be completed), so it has a chance to look at every merge once it completes. The need for better generators remains, though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could add an action that performs multiple inserts (instead of just 1). Would that hit the corner case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it, but it didn't catch any of the incorrect invariants I found with the handwritten tests. My implementation was not ideal, though, it can probably be improved. I only return one of the inserted keys (so it gets added to the available variables), which means key re-use is less likely and it's harder to get a lot of compaction. Returning multiple keys from a single action turned out to be tricky, so I abandoned that.

prototypes/ScheduledMerges.hs Outdated Show resolved Hide resolved
@jorisdral
Copy link
Collaborator

Posting this here so I don't forget: I was reading the Monkey paper and found this quote:

If multiple runs that are being sort-merged contain entries with the same key, only the entry
from the most recently-created (youngest) run is kept because it is the most up-to-date. Thus,
the resulting run may be smaller than the cumulative sizes of the original runs. When a merge
operation is finished, the resulting run moves to Level i + 1 if Level i is at capacity.

Specially the last part we do differently: we look only at the size of the run to determine whether it should stay, and we don't look at the overall level capacity. Would it be a reasonable alternative to promote runs based on the level capacity, not just their own size?

@mheinzel
Copy link
Collaborator Author

mheinzel commented Aug 9, 2024

Would it be a reasonable alternative to promote runs based on the level capacity, not just their own size?

Ah interesting. Not sure I fully understand the context (maybe it's just talking about levelling, where this is pretty much what we do already?), but should be possible for tiering as well. However, it weakens the invariants on run sizes quite a bit.

Copy link
Collaborator

@dcoutts dcoutts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this isn't horrible, but does show some things that are a bit horrible (the possibility of ending up with too-big runs as well as too-small).

I also endorse the comments from Joris.

Comment on lines +255 to +280
-- 'callStack' just ensures that the 'HasCallStack' constraint is not redundant
-- when compiling with debug assertions disabled.
assertST :: HasCallStack => Bool -> ST s ()
assertST p = assert p $ return (const () callStack)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it really warn? Surely no, since the assertion is eliminated after type checking.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prototypes/ScheduledMerges.hs Outdated Show resolved Hide resolved
prototypes/ScheduledMerges.hs Outdated Show resolved Hide resolved
@jorisdral
Copy link
Collaborator

jorisdral commented Aug 12, 2024

Would it be a reasonable alternative to promote runs based on the level capacity, not just their own size?

Ah interesting. Not sure I fully understand the context (maybe it's just talking about levelling, where this is pretty much what we do already?), but should be possible for tiering as well. However, it weakens the invariants on run sizes quite a bit.

Indeed, it seems we already do this for levelling, but for tiering we only look at the run size without looking at the overall level capacity. In theory this can lead to holding back a run even when the level would then be at capacity (the level being the previous one, since the incoming run still counts as being on the previous level technically)

-- If r is still too small for this level then keep it and merge again
-- with the incoming runs.
MergePolicyTiering | tieringRunSizeToLevel r < ln -> do

Maybe taking into account level capacity would remove the need for supplying >1 credits in creditsForMerge, but I haven't thought it through

Copy link
Collaborator

@jorisdral jorisdral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be good to go once you've resolve the comments I left before. Maybe, if you can fit in #310 (comment), that would be nice, otherwise could you create an issue for it that links to my comment?

@mheinzel
Copy link
Collaborator Author

Maybe, if you can fit in #310 (comment), that would be nice, otherwise could you create an issue for it that links to my comment?

I can open an issue and quickly outline various ideas for changing when to start a merge. Thinking about that again, "if Level i is at capacity" might just mean "if there are 4 runs", it doesn't say whether capacity refers to number of entries or number of runs. But I should have another look at the paper at some point.

@mheinzel
Copy link
Collaborator Author

I adressed the feedback and rebased. What do you think?

@jorisdral
Copy link
Collaborator

Maybe, if you can fit in #310 (comment), that would be nice, otherwise could you create an issue for it that links to my comment?

I can open an issue and quickly outline various ideas for changing when to start a merge.

Yeah, let's open an issue

Thinking about that again, "if Level i is at capacity" might just mean "if there are 4 runs", it doesn't say whether capacity refers to number of entries or number of runs. But I should have another look at the paper at some point.

No, capacity talks about the entries in the run, not the number of runs

@jorisdral jorisdral added this pull request to the merge queue Aug 20, 2024
Merged via the queue into main with commit a9e4271 Aug 20, 2024
18 checks passed
@jorisdral jorisdral deleted the mheinzel/prototype-tweaks branch August 20, 2024 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants