Refactor Documents and Views to better utilize Nebari #250
base: main
Commits on May 9, 2022
Compat with nebari's new TransactionId
This also makes one small change: documents are now queried inside the transaction. This should have no actual effect, since the integrity scanner must now run for all views before a document transaction is applied.
Commit c72ebe3
Commits on May 11, 2022
Rewrote view and document storage
Both khonsulabs#76 and khonsulabs#225 ended up being heavily intertwined. This is not yet in its final form, but it's complete enough that unit tests are passing (aside from the backwards-compatibility ones).

Document Storage: Documents are no longer serialized in a wrapper document type. Instead, the documents tree is now a versioned tree with an embedded index that stores the document's hash. The Revision's id is now the versioned tree's sequence_id. This means that instead of simply pulling a document out of the database and deserializing it, we must pull the value and index out for a key and combine them with the key to create our document.

The other major change is introduced by the constraints of working within Nebari's modification system. Because we don't have access to the index for a key we're about to set, most of the logic for creating the OperationResult has been moved outside of the CompareSwap operation.

View Storage: Views have been refactored to store the reduced value in Nebari through use of an embedded index. Instead of storing the entire ViewEntry structure in the view, we now only store the serialized `Vec<Entrymapping>`. The major change here is that Nebari will now reduce the stored index via the new `ViewIndexer`. The reduce/reduce_grouped functions haven't yet been changed to use Nebari's native reduce function, but that is the inspiration for these changes. When retrieving a view entry, we reconstruct the ViewEntry using the stored index to maintain compatibility with the existing code that worked with the ViewEntry structure.

There are a lot of remaining tasks:
- Update reduce/reduce_grouped() to use Nebari's built-in reduction.
- Remove the invalidated_entries map and make the view mapper sequence based.
- Embed the DocumentMap tree in the ViewEntries tree by creating a custom Root.
- Once all the above are done, when the view indexer is running outside of a transaction (lazy views), the view can be persisted without fsync and be 100% safe to use due to the append-only file format.
Commit a239fce
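A minimal sketch of the new document read path, assuming a simplified model: the documents tree hands back a value, its embedded index (just the content hash here), and the entry's sequence id, and the document is rebuilt by combining those with the key. `VersionedEntry`, `DocumentIndex`, `Revision`, `Document`, `index_for`, and `reconstruct` are hypothetical names, not Nebari's or BonsaiDb's actual API, and `DefaultHasher` is only a placeholder for whatever hash BonsaiDb really stores.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical embedded index stored next to each value in the
/// versioned documents tree: only the document's content hash.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DocumentIndex {
    hash: u64,
}

/// Hypothetical shape of what the versioned tree returns for one key.
struct VersionedEntry {
    sequence_id: u64,
    value: Vec<u8>,
    index: DocumentIndex,
}

#[derive(Debug, PartialEq)]
struct Revision {
    /// The revision id is the versioned tree's sequence id.
    id: u64,
    hash: u64,
}

#[derive(Debug, PartialEq)]
struct Document {
    id: Vec<u8>,
    revision: Revision,
    contents: Vec<u8>,
}

/// Placeholder hash (assumption); not the hash BonsaiDb actually uses.
fn hash_contents(contents: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    contents.hash(&mut hasher);
    hasher.finish()
}

/// Build the embedded index for a value that is about to be written.
fn index_for(contents: &[u8]) -> DocumentIndex {
    DocumentIndex { hash: hash_contents(contents) }
}

/// Combine the key with the stored value and embedded index to recreate
/// the document, instead of deserializing a wrapper type.
fn reconstruct(key: &[u8], entry: VersionedEntry) -> Document {
    Document {
        id: key.to_vec(),
        revision: Revision {
            id: entry.sequence_id,
            hash: entry.index.hash,
        },
        contents: entry.value,
    }
}
```

Because the hash lives in the index rather than in a serialized wrapper, the stored value can be the raw document bytes.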
Commits on May 12, 2022
Refactored view mapping to be sequence based
This commit removes the invalidated entries tree and uses the sequence index of the documents tree to drive the indexing. The mapping operation is batched and performed in such a way that if new data is added to the documents tree while the operation is in progress, the indexing uses the sequence data as of when the mapping job was kicked off. This guarantee allows us to track the latest indexed sequence ID in the ViewEntries embedded index. Each map job starts from the ViewEntries tree's latest sequence ID + 1.
Commit 8d97953
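The sequence-based mapping described above can be sketched with in-memory maps. This is an illustration under simplifying assumptions, not BonsaiDb's code: `DocumentsTree`, `ViewEntries`, and `run_map_job` are hypothetical, batching is omitted, and the "map" step just uppercases the document contents. The key point is that the job captures its upper bound before it starts and records that bound as the last indexed sequence only after the range has been mapped.

```rust
use std::collections::BTreeMap;

/// Hypothetical documents tree keyed by sequence id. Each entry records
/// which document changed and its new contents (`None` for a deletion).
struct DocumentsTree {
    by_sequence: BTreeMap<u64, (u64, Option<String>)>,
    next_sequence: u64,
}

impl DocumentsTree {
    fn new() -> Self {
        Self { by_sequence: BTreeMap::new(), next_sequence: 1 }
    }

    /// Every write, including a deletion, gets a fresh sequence id.
    fn write(&mut self, doc_id: u64, contents: Option<String>) -> u64 {
        let seq = self.next_sequence;
        self.next_sequence += 1;
        self.by_sequence.insert(seq, (doc_id, contents));
        seq
    }

    fn latest_sequence(&self) -> u64 {
        self.next_sequence - 1
    }
}

/// Hypothetical view state. `last_indexed` plays the role of the latest
/// sequence id recorded in the ViewEntries embedded index.
struct ViewEntries {
    mapped: BTreeMap<u64, String>, // document id -> mapped value
    last_indexed: u64,
}

/// Run one map job and return how many sequence entries it processed.
/// The upper bound is captured up front, so writes that land mid-job are
/// left for the next job instead of being partially indexed.
fn run_map_job(docs: &DocumentsTree, view: &mut ViewEntries) -> usize {
    let upper = docs.latest_sequence();
    let start = view.last_indexed + 1;
    if start > upper {
        // Nothing new; BTreeMap::range would panic on start > end.
        return 0;
    }
    let mut processed = 0;
    for (_seq, (doc_id, contents)) in docs.by_sequence.range(start..=upper) {
        match contents {
            Some(text) => {
                view.mapped.insert(*doc_id, text.to_uppercase());
            }
            None => {
                view.mapped.remove(doc_id);
            }
        }
        processed += 1;
    }
    view.last_indexed = upper;
    processed
}
```

Since the next job always resumes at `last_indexed + 1`, no separate invalidated-entries structure is needed to know what still has to be mapped.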
Fixed reindexing removed documents
This was a weird one to debug, as it only showed up in the simultaneous-connections test, yet the bug was unrelated to multiprocessing. Eager views are meant to always be up-to-date. This contract was broken when multiprocessing was involved because of a logic bug: the index returned from TransactionTree::remove is the existing index, which means its sequence id is that of the removed entry, not the newly written sequence (document entries aren't actually removed, for history preservation). The fix is to retrieve the new sequence value and map it instead. This ensures we're actually mapping the deleted version of the entry. The reason this didn't cause issues outside of multithreading is that most tests are written without specifying an access policy, which means all the queries use AccessPolicy::UpdateBefore. As a result, the preparation for each query would still index the entry, since the view wasn't actually up-to-date.
Commit 0c62f70
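A small model of the bug, assuming a hypothetical `VersionedTree` whose `remove` behaves as described above: it returns the existing entry's sequence while writing a tombstone with a new one. None of these names are Nebari's real API; the sketch only shows why the view must map the sequence looked up after the removal rather than the one `remove` returns.

```rust
use std::collections::BTreeMap;

/// Hypothetical versioned tree: every write, including a removal, uses a
/// new sequence id. Removals store a tombstone so history is preserved.
#[derive(Default)]
struct VersionedTree {
    latest: BTreeMap<String, (u64, Option<String>)>, // key -> (sequence, value)
    next_sequence: u64,
}

impl VersionedTree {
    fn set(&mut self, key: &str, value: &str) -> u64 {
        self.next_sequence += 1;
        self.latest
            .insert(key.to_string(), (self.next_sequence, Some(value.to_string())));
        self.next_sequence
    }

    /// Returns the sequence of the *existing* (removed) entry, not the
    /// sequence of the tombstone that was just written.
    fn remove(&mut self, key: &str) -> Option<u64> {
        let existing = self.latest.get(key)?.0;
        self.next_sequence += 1;
        self.latest.insert(key.to_string(), (self.next_sequence, None));
        Some(existing)
    }

    fn current_sequence(&self, key: &str) -> Option<u64> {
        self.latest.get(key).map(|(seq, _)| *seq)
    }
}

/// Pick the sequence an eager view should map after a removal.
fn sequence_to_map_after_remove(tree: &mut VersionedTree, key: &str) -> Option<u64> {
    // The buggy version mapped this value: the removed entry's sequence.
    let _previous = tree.remove(key);
    // The fix: look up and map the newly written tombstone's sequence.
    tree.current_sequence(key)
}
```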
Commits on May 16, 2022
This change turns ViewEntries into a new Root implementor for Nebari that stores the view entries in one B+Tree and the document map in another B+Tree. This pull request does not yet add the ability to query from the document map. Once that is implemented, I can remove the external document map tree, which will conclude the final format changes.
Commit 5632e68
Commits on May 29, 2022
This removes the document_map tree and stores it inline in a new custom Nebari Root. This custom tree supports querying which keys a document id emitted, as well as which mappings were emitted for any given key. This branch also contains several other changes: the integrity scanner can spawn a mapping job, and that mapping job must use transactions if the view is eager. This set of changes addresses that, but it is also lumped in with a refactor from easy_parallel to rayon. While rayon is a heavier dependency, I was noticing a *lot* of traffic on profiles for spinning up new threads. Rayon uses a persistent thread pool for work, and by embracing it here, we can start using it in other locations as well.
Commit 2a910d4
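To illustrate the two-tree layout and the two queries it enables, here is an in-memory sketch where two `BTreeMap`s stand in for the custom Root's B+Trees. `ViewRoot`, `update_document`, `keys_for_document`, and `mappings_for_key` are hypothetical names, and mapped values are simplified to `i64`. Keeping the document map next to the entries is what lets an update find and clear a document's stale mappings without scanning every key.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the custom root: two ordered maps that are
/// always updated together, playing the roles of the two B+Trees.
#[derive(Default)]
struct ViewRoot {
    /// Emitted key -> (source document id -> mapped value).
    entries: BTreeMap<String, BTreeMap<u64, i64>>,
    /// Document id -> keys that document emitted.
    document_map: BTreeMap<u64, BTreeSet<String>>,
}

impl ViewRoot {
    /// Replace everything `doc_id` previously emitted with `mappings`.
    fn update_document(&mut self, doc_id: u64, mappings: &[(&str, i64)]) {
        // Use the document map to find and clear this document's stale entries.
        if let Some(old_keys) = self.document_map.remove(&doc_id) {
            for key in old_keys {
                let now_empty = match self.entries.get_mut(&key) {
                    Some(sources) => {
                        sources.remove(&doc_id);
                        sources.is_empty()
                    }
                    None => false,
                };
                if now_empty {
                    self.entries.remove(&key);
                }
            }
        }
        // Insert the new mappings and remember which keys were emitted.
        let mut emitted = BTreeSet::new();
        for (key, value) in mappings {
            self.entries
                .entry(key.to_string())
                .or_default()
                .insert(doc_id, *value);
            emitted.insert(key.to_string());
        }
        if !emitted.is_empty() {
            self.document_map.insert(doc_id, emitted);
        }
    }

    /// Which keys did a document emit?
    fn keys_for_document(&self, doc_id: u64) -> Vec<String> {
        self.document_map
            .get(&doc_id)
            .map(|keys| keys.iter().cloned().collect::<Vec<_>>())
            .unwrap_or_default()
    }

    /// Which mappings were emitted for a key, as (document id, value)?
    fn mappings_for_key(&self, key: &str) -> Vec<(u64, i64)> {
        self.entries
            .get(key)
            .map(|sources| sources.iter().map(|(id, v)| (*id, *v)).collect::<Vec<_>>())
            .unwrap_or_default()
    }
}
```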
Commits on May 30, 2022
Convert Checkout into a transaction
This is meant to be an atomic operation, and is implemented in SQL as a single query.
Commit c7a2657
Fixed edge case in sequence mapping logic
The collection sequence tracking I introduced as part of the sequence-based-mapping refactor was done incorrectly: the sequence IDs can't be published to shared state until the transaction is confirmed. The edge case was that a lazy view could start mapping while a collection had a pending transaction being applied. The collection's sequence could report a higher number than a database query would return, because the transaction had not been written yet. This was partially a Nebari bug as well: Tree::current_transaction_id was implemented incorrectly, while TransactionTree/TreeFile were correct.
Commit 4a5407c
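The fix amounts to publishing a sequence only after its transaction is confirmed. A minimal sketch, assuming a hypothetical `CollectionSequence` that lazy view mappers read and a hypothetical `PendingTransaction` that publishes via `AtomicU64::fetch_max` only in `confirm`; this is not BonsaiDb's actual type layout. A transaction that is dropped without confirmation never advances the visible sequence.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical shared state for one collection. Lazy view mappers read
/// `committed` to decide how far they may index.
struct CollectionSequence {
    committed: AtomicU64,
}

/// Hypothetical pending transaction that has been assigned a sequence id
/// but has not been confirmed as written yet.
struct PendingTransaction<'a> {
    shared: &'a CollectionSequence,
    sequence: u64,
}

impl CollectionSequence {
    fn begin(&self, next_sequence: u64) -> PendingTransaction<'_> {
        // Deliberately leave `committed` alone: publishing here would let a
        // lazy view try to map a sequence that queries cannot see yet.
        PendingTransaction { shared: self, sequence: next_sequence }
    }

    fn visible_sequence(&self) -> u64 {
        self.committed.load(Ordering::Acquire)
    }
}

impl PendingTransaction<'_> {
    /// Call only after the transaction is confirmed; consumes the handle.
    fn confirm(self) {
        // fetch_max keeps the published value monotonic.
        self.shared.committed.fetch_max(self.sequence, Ordering::Release);
    }
}
```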
Commits on Jul 10, 2022
Updated for Nebari on Sediment
This isn't completely functional, but I was ready to merge in the clippy fixes from main. Still, only 2 tests that are expected to be working are currently broken in bonsaidb-local.
Commit bdf3d98
Commit fd71578