Tweaks and small fixes to the prototype #310

mheinzel · 2024-07-25T10:53:07Z

Probably easiest to review commit by commit.

I managed to find a few invariants that too tight, or looser than necessary. However, this required adding handwritten tests, as the lockstep test coverage is not great.

The fix for incorrect assumptions violated by 5-way merges (holding back an underfull run) are simply addressed by providing more merge credits, but there are alternatives we can think about afterwards (e.g. #311)

prototypes/ScheduledMerges.hs

jorisdral

Looks promising, I've left some comments but haven't looked into all changes

jorisdral · 2024-07-30T08:14:12Z

prototypes/ScheduledMerges.hs

+    -- level could need. This overestimation means that merges will only
+    -- complete at the last possible moment.


Well, that depends: the merge can finish earlier, but it should never finish too late. Maybe reword this a little bit?

In a real implementation that's true. But in the prototype merges will never finish early, as their completion is not based on the any "real" work to be performed (which would depend on the number of entries), but on the debt calculated here, which is the maximum a merge at this level could require.

I can add that explanation to the comment, though.

I think the reasoning for always requiring the maximum amount of work is that it makes it more likely to find issues related to supplying too few credits somewhere. We could base everything on cost (sum of input run sizes), but not sure whether it's an improvement or not.

jorisdral · 2024-07-30T08:26:36Z

prototypes/ScheduledMergesTestQLS.hs

+        -- get something to 3rd level (so 2nd level is not levelling)
+        -- (needs 5 runs to go to level 2 so the resulting run becomes too big)
+        traverse_ ins [101..100+(5*16)]


Let's record assumptions like this about the structure in an assertion. i.e., check that there is a run on the third level. This should ensure that the test can't become outdated if we change the prototype. And let's do that in other places too where it makes sense, not just this line

Optionally, you could also make this test size factor independent

I decided not to make it size factor independent for now, but I think I found a decent way of making assertions about the LSM shape. It's a bit annoying that merges always get created a little later than one (at least I) would think, e.g. flushing 4 write buffers does not yet result in a merge, but that's just how the algorithm currently works (and also the reason for losing some potential sharing).

prototypes/ScheduledMergesTestQLS.hs

jorisdral · 2024-07-30T08:38:32Z

prototypes/ScheduledMergesTestQLS.hs

+    -- | Without this, the prototype only completes tiering merges when the next
+    -- merging run on this level is created, so a level would never contain a
+    -- completed merge.
+    ASupply :: ModelVar Model (LSM RealWorld)
+            -> Int
+            -> Action (Lockstep Model) ()


It's not clear to me why this is necessary. Shouldn't the prototype be able to maintain it's invariants without supplying credits explicitly? If we add this action, we're not letting the prototype "do its thing"

Without this, the whole part of the invariant that checks a CompletedMerge present in a tiering level is dead code. Runs stay Ongoing until they are replaced by the next incoming merge.

Maybe that's something I could change in the prototype, related to #310 (comment).

Without ASupply, I can do

--- a/prototypes/ScheduledMerges.hs +++ b/prototypes/ScheduledMerges.hs @@ -247,6 +247,7 @@ invariant = go 1 -- Note that tiering on the last level only occurs when there is -- a single level only. (_, CompletedMerge r, MergeLastLevel) -> do + assertST $ False assertST $ ln == 1 assertST $ tieringRunSizeToLevel r <= ln+1 @@ -254,7 +255,8 @@ invariant = go 1 -- level it is entering, but can also be one smaller (in which case -- it'll be held back and merged again) or one larger (because it -- includes a run that has been held back before). - (_, CompletedMerge r, MergeMidLevel) -> + (_, CompletedMerge r, MergeMidLevel) -> do + assertST $ False assertST $ tieringRunSizeToLevel r `elem` [ln-1, ln, ln+1]

And even increase the generator size a lot and don't get any failures:

prototype ScheduledMerges ScheduledMerges vs model: OK (88.31s) +++ OK, passed 100 tests. Action polarity (83563 in total): 100.000% + Actions (83563 in total): 33.441% +AInsert 32.943% +ALookup 16.808% +ADelete 8.383% +ADump 8.308% +ADuplicate 0.118% +ANew

Although arguably this just putting on a bandaid. The lockstep tests just don't reach proper coverage. For the first tiering merge to even be created, ~100 elements need to be inserted, which means a lot of actions are required before anything interesting happens.

I'll remove ASupply and instead make sure the invariant gets checked inbetween supplying credits and doing the other work (e.g. requiring a merge to be completed), so it has a chance to look at every merge once it completes. The need for better generators remains, though.

One could add an action that performs multiple inserts (instead of just 1). Would that hit the corner case?

I tried it, but it didn't catch any of the incorrect invariants I found with the handwritten tests. My implementation was not ideal, though, it can probably be improved. I only return one of the inserted keys (so it gets added to the available variables), which means key re-use is less likely and it's harder to get a lot of compaction. Returning multiple keys from a single action turned out to be tricky, so I abandoned that.

prototypes/ScheduledMerges.hs

jorisdral · 2024-08-05T18:28:44Z

Posting this here so I don't forget: I was reading the Monkey paper and found this quote:

If multiple runs that are being sort-merged contain entries with the same key, only the entry
from the most recently-created (youngest) run is kept because it is the most up-to-date. Thus,
the resulting run may be smaller than the cumulative sizes of the original runs. When a merge
operation is finished, the resulting run moves to Level i + 1 if Level i is at capacity.

Specially the last part we do differently: we look only at the size of the run to determine whether it should stay, and we don't look at the overall level capacity. Would it be a reasonable alternative to promote runs based on the level capacity, not just their own size?

mheinzel · 2024-08-09T14:30:43Z

Would it be a reasonable alternative to promote runs based on the level capacity, not just their own size?

Ah interesting. Not sure I fully understand the context (maybe it's just talking about levelling, where this is pretty much what we do already?), but should be possible for tiering as well. However, it weakens the invariants on run sizes quite a bit.

dcoutts

Yes, this isn't horrible, but does show some things that are a bit horrible (the possibility of ending up with too-big runs as well as too-small).

I also endorse the comments from Joris.

dcoutts · 2024-08-09T14:51:50Z

prototypes/ScheduledMerges.hs

+-- 'callStack' just ensures that the 'HasCallStack' constraint is not redundant
+-- when compiling with debug assertions disabled.
+assertST :: HasCallStack => Bool -> ST s ()
+assertST p = assert p $ return (const () callStack)


Does it really warn? Surely no, since the assertion is eliminated after type checking.

With assertST p = assert p $ return () I got this: https://github.com/IntersectMBO/lsm-tree/actions/runs/10092810012/job/27907230850#step:13:201

prototypes/ScheduledMerges.hs

jorisdral · 2024-08-12T14:06:11Z

Would it be a reasonable alternative to promote runs based on the level capacity, not just their own size?

Ah interesting. Not sure I fully understand the context (maybe it's just talking about levelling, where this is pretty much what we do already?), but should be possible for tiering as well. However, it weakens the invariants on run sizes quite a bit.

Indeed, it seems we already do this for levelling, but for tiering we only look at the run size without looking at the overall level capacity. In theory this can lead to holding back a run even when the level would then be at capacity (the level being the previous one, since the incoming run still counts as being on the previous level technically)

lsm-tree/prototypes/ScheduledMerges.hs

Lines 515 to 517 in 639bfbf

    
           -- If r is still too small for this level then keep it and merge again 
        
           -- with the incoming runs. 
        
           MergePolicyTiering | tieringRunSizeToLevel r < ln -> do

Maybe taking into account level capacity would remove the need for supplying >1 credits in creditsForMerge, but I haven't thought it through

jorisdral

Should be good to go once you've resolve the comments I left before. Maybe, if you can fit in #310 (comment), that would be nice, otherwise could you create an issue for it that links to my comment?

This way we always know the exact assertion that failed.

The adjustment of the calculation is necessary to avoid assertion failures for the new assertion.

This is necessary when holding back runs that are too small.

mheinzel · 2024-08-20T09:51:51Z

Maybe, if you can fit in #310 (comment), that would be nice, otherwise could you create an issue for it that links to my comment?

I can open an issue and quickly outline various ideas for changing when to start a merge. Thinking about that again, "if Level i is at capacity" might just mean "if there are 4 runs", it doesn't say whether capacity refers to number of entries or number of runs. But I should have another look at the paper at some point.

mheinzel · 2024-08-20T09:53:27Z

I adressed the feedback and rebased. What do you think?

jorisdral · 2024-08-20T13:11:54Z

Maybe, if you can fit in #310 (comment), that would be nice, otherwise could you create an issue for it that links to my comment?

I can open an issue and quickly outline various ideas for changing when to start a merge.

Yeah, let's open an issue

Thinking about that again, "if Level i is at capacity" might just mean "if there are 4 runs", it doesn't say whether capacity refers to number of entries or number of runs. But I should have another look at the paper at some point.

No, capacity talks about the entries in the run, not the number of runs

mheinzel requested review from dcoutts and jorisdral as code owners July 25, 2024 10:53

mheinzel force-pushed the mheinzel/prototype-tweaks branch 2 times, most recently from 962767a to 199466d Compare July 25, 2024 12:51

mheinzel commented Jul 29, 2024

View reviewed changes

prototypes/ScheduledMerges.hs Show resolved Hide resolved

mheinzel force-pushed the mheinzel/prototype-tweaks branch from 199466d to 68e44f7 Compare July 29, 2024 10:55

jorisdral reviewed Jul 30, 2024

View reviewed changes

dcoutts approved these changes Aug 9, 2024

View reviewed changes

jorisdral requested changes Aug 12, 2024

View reviewed changes

mheinzel added 10 commits August 20, 2024 10:00

refactor prototype

a414034

unify style of invariant

ab10233

This way we always know the exact assertion that failed.

add test for merging underfull run again with incoming ones

6fa93c4

increase size for prototype lockstep test

22f88b4

fix run size invariants that were too tight

c76b0a9

tighten invariants

8071d62

fix calculation of merge debt

ef61d4f

The adjustment of the calculation is necessary to avoid assertion failures for the new assertion.

prototype: supply 1.25 credits for 5-way tiering merges

6f15601

This is necessary when holding back runs that are too small.

check invariant after supplying credits

77c4bb4

make assertions on LSM shape in prototype tests

44a8cd4

mheinzel force-pushed the mheinzel/prototype-tweaks branch from 68e44f7 to 44a8cd4 Compare August 20, 2024 08:07

mheinzel requested a review from recursion-ninja as a code owner August 20, 2024 08:07

jorisdral approved these changes Aug 20, 2024

View reviewed changes

jorisdral added this pull request to the merge queue Aug 20, 2024

Merged via the queue into main with commit a9e4271 Aug 20, 2024
18 checks passed

jorisdral deleted the mheinzel/prototype-tweaks branch August 20, 2024 13:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tweaks and small fixes to the prototype #310

Tweaks and small fixes to the prototype #310

mheinzel commented Jul 25, 2024 •

edited

Loading

jorisdral left a comment

jorisdral Jul 30, 2024

mheinzel Jul 30, 2024

mheinzel Jul 30, 2024

jorisdral Jul 30, 2024

jorisdral Jul 30, 2024

mheinzel Aug 20, 2024

jorisdral Jul 30, 2024

mheinzel Jul 30, 2024

mheinzel Jul 30, 2024

mheinzel Jul 30, 2024

mheinzel Jul 30, 2024

jorisdral Aug 5, 2024

mheinzel Aug 20, 2024

jorisdral commented Aug 5, 2024

mheinzel commented Aug 9, 2024 •

edited

Loading

dcoutts left a comment

dcoutts Aug 9, 2024

mheinzel Aug 9, 2024

jorisdral commented Aug 12, 2024 •

edited

Loading

jorisdral left a comment

mheinzel commented Aug 20, 2024

mheinzel commented Aug 20, 2024

jorisdral commented Aug 20, 2024

		-- level could need. This overestimation means that merges will only
		-- complete at the last possible moment.

Tweaks and small fixes to the prototype #310

Tweaks and small fixes to the prototype #310

Conversation

mheinzel commented Jul 25, 2024 • edited Loading

jorisdral left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jorisdral commented Aug 5, 2024

mheinzel commented Aug 9, 2024 • edited Loading

dcoutts left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jorisdral commented Aug 12, 2024 • edited Loading

jorisdral left a comment

Choose a reason for hiding this comment

mheinzel commented Aug 20, 2024

mheinzel commented Aug 20, 2024

jorisdral commented Aug 20, 2024

mheinzel commented Jul 25, 2024 •

edited

Loading

mheinzel commented Aug 9, 2024 •

edited

Loading

jorisdral commented Aug 12, 2024 •

edited

Loading