Refactor Documents and Views to better utilize Nebari #250

Draft: wants to merge 10 commits into main

Commits on May 9, 2022

  1. Compat with nebari's new TransactionId

    This also has one small change of moving when the documents are queried
    into the transaction. This should have no actual effect, since the
    integrity scanner must run for all views now before a document
    transaction is applied.
    ecton committed May 9, 2022
    c72ebe3

Commits on May 11, 2022

  1. Rewrote view and document storage

    Both khonsulabs#76 and khonsulabs#225 ended up being heavily intertwined. This is not
    yet in its final form, but it's complete enough that unit tests are
    passing (aside from backwards compatibility ones).
    
    Document Storage:
    
    Documents are no longer serialized in a wrapper document type. Instead,
    the documents tree is now a versioned tree with an embedded index that
    stores the document's hash. The Revision's id is now the versioned
    tree's sequence_id.
    
    This means that instead of simply pulling a document out of the database
    and deserializing it, we must pull the value and index out for a key and
    combine it with the key to create our document.
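
As a rough illustration of the read path described above, here is a minimal sketch. `VersionedTree`, `DocumentIndex`, `Revision`, and `Document` are hypothetical stand-ins, not Nebari's or BonsaiDb's actual types, and `DefaultHasher` is only a placeholder for whatever hash the real index stores.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical embedded index stored next to each value in the tree.
#[derive(Debug, Clone, PartialEq)]
struct DocumentIndex {
    hash: u64,
}

/// Hypothetical revision: its id is the tree's sequence id for the write.
#[derive(Debug, Clone, PartialEq)]
struct Revision {
    id: u64,
    hash: u64,
}

#[derive(Debug, Clone, PartialEq)]
struct Document {
    id: Vec<u8>,
    revision: Revision,
    contents: Vec<u8>,
}

/// Stand-in for a versioned tree: key -> (sequence id, index, raw value).
#[derive(Default)]
struct VersionedTree {
    next_sequence: u64,
    entries: BTreeMap<Vec<u8>, (u64, DocumentIndex, Vec<u8>)>,
}

/// Placeholder hash; the real hash function is not specified here.
fn hash_contents(contents: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    contents.hash(&mut hasher);
    hasher.finish()
}

impl VersionedTree {
    /// Stores the raw contents (no wrapper type) and returns the new sequence id.
    fn set(&mut self, key: &[u8], contents: &[u8]) -> u64 {
        self.next_sequence += 1;
        let index = DocumentIndex { hash: hash_contents(contents) };
        self.entries
            .insert(key.to_vec(), (self.next_sequence, index, contents.to_vec()));
        self.next_sequence
    }

    /// Rebuilds a document from the key, the stored value, and the index.
    fn get_document(&self, key: &[u8]) -> Option<Document> {
        let (sequence, index, contents) = self.entries.get(key)?;
        Some(Document {
            id: key.to_vec(),
            revision: Revision { id: *sequence, hash: index.hash },
            contents: contents.clone(),
        })
    }
}
```

The point of the sketch is that nothing in the stored value carries the document header: the id comes from the key, and the revision comes from the sequence id plus the indexed hash.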
    
    The other major change is introduced by the constraints of working
    within Nebari's modification system. Because we don't have access to the
    index for a key we're about to set, most of the logic for creating the
    OperationResult has been moved outside of the CompareSwap operation.
    
    View Storage:
    
    Views have been refactored to store the reduced value in Nebari through
    use of an embedded index. Instead of storing the entire ViewEntry
    structure in the view, we now only store the serialized
    `Vec<EntryMapping>`. The major change here is that Nebari will now
    reduce the stored index via the new `ViewIndexer`. The changes haven't
    been made to reduce/reduce_grouped yet to use Nebari's native reduce
    function -- but that is the inspiration for these changes.
    
    When retrieving a view entry, we reconstruct the ViewEntry using the
    stored index to maintain compatibility with the existing code that
    worked with the ViewEntry structure.
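
A minimal sketch of that reconstruction, under the assumption that the reduced value lives in the embedded index and only the mappings are stored as the value. The field names, the `ViewIndex` type, and the summing reducer are illustrative only and are not taken from the actual code.

```rust
/// Hypothetical mapping emitted by one source document for a view key.
#[derive(Debug, Clone, PartialEq)]
struct EntryMapping {
    source_id: u64,
    value: u64,
}

/// Hypothetical embedded index kept by the tree alongside the stored mappings.
#[derive(Debug, Clone, PartialEq)]
struct ViewIndex {
    reduced_value: u64,
}

/// The legacy structure that existing code still expects to receive.
#[derive(Debug, Clone, PartialEq)]
struct ViewEntry {
    key: String,
    mappings: Vec<EntryMapping>,
    reduced_value: u64,
}

/// Toy stand-in for an indexer: reduces the mappings by summing their values.
fn index_mappings(mappings: &[EntryMapping]) -> ViewIndex {
    ViewIndex {
        reduced_value: mappings.iter().map(|mapping| mapping.value).sum(),
    }
}

/// Rebuilds the legacy ViewEntry from the key, stored mappings, and index.
fn reconstruct_entry(key: &str, mappings: Vec<EntryMapping>, index: &ViewIndex) -> ViewEntry {
    ViewEntry {
        key: key.to_string(),
        mappings,
        reduced_value: index.reduced_value,
    }
}
```

Callers that only need the reduced value could read the index alone and skip deserializing the mappings, which is presumably part of the appeal of this layout.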
    
    There are a lot of remaining tasks:
    
    - Update reduce/reduce_grouped() to use Nebari's built-in reduction.
    - Remove the invalidated_entries map and make the view mapper sequence
      based.
    - Embed the DocumentMap tree in the ViewEntries tree by creating a
      custom Root.
    - Once all the above are done, when the view indexer is running outside
      of a transaction (lazy views), the view can be persisted without fsync
      and be 100% safe to use due to the append-only file format.
    ecton committed May 11, 2022
    a239fce

Commits on May 12, 2022

  1. Refactored view mapping to be sequence based

    This commit removes the invalidated entries tree, and uses the sequence
    index of the documents tree to drive the indexing. The mapping operation
    is batched and performed in such a way that if new data is added to the
    documents tree while the operation is being performed, the indexing is
    performed using the sequence data at the time of the mapping job being
    kicked off.
    
    This guarantee allows us to track the latest indexed sequence ID in the
    ViewEntries embedded index. The map job starts from the ViewEntries
    tree's latest sequence id + 1.
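
The batching scheme can be sketched roughly as follows. `SequenceIndex` and `run_map_job` are hypothetical names, and a `BTreeMap` stands in for the documents tree's by-sequence index; the sketch only shows how a snapshot taken at job start bounds the work.

```rust
use std::collections::BTreeMap;

/// Stand-in for the documents tree's by-sequence index: sequence id -> document id.
type SequenceIndex = BTreeMap<u64, String>;

/// Maps every sequence in `last_indexed + 1 ..= snapshot` in batches and
/// returns the new last-indexed sequence id. Entries written after the
/// snapshot was taken are left for the next job.
fn run_map_job(
    documents: &SequenceIndex,
    last_indexed: u64,
    snapshot: u64,
    batch_size: usize,
    mut map_batch: impl FnMut(&[(u64, &str)]),
) -> u64 {
    let batch_size = batch_size.max(1);
    let mut next = last_indexed + 1;
    while next <= snapshot {
        let batch: Vec<(u64, &str)> = documents
            .range(next..=snapshot)
            .take(batch_size)
            .map(|(sequence, id)| (*sequence, id.as_str()))
            .collect();
        let last_in_batch = match batch.last() {
            Some(&(sequence, _)) => sequence,
            None => break, // nothing left inside the snapshot window
        };
        map_batch(&batch);
        next = last_in_batch + 1;
    }
    snapshot
}
```

Because the upper bound is fixed when the job starts, the returned value can be recorded as the view's latest indexed sequence without racing against concurrent writers.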
    ecton committed May 12, 2022
    8d97953
  2. Fixed reindexing removed documents

    This was a weird one to debug, as it only showed up on the
    simultaneous-connections test. Yet, the bug was unrelated to
    multiprocessing.
    
    Eager views are meant to always be up-to-date. This contract was broken
    when multiprocessing was involved, because there was a logic bug: the
    index being returned from TransactionTree::remove is the existing index,
    which means its sequence id is of the removed sequence, not the newly
    written sequence (document entries aren't actually removed, for history
    preservation).
    
    The fix is to retrieve the new sequence value and map it instead. This
    ensures we're actually mapping the deleted version of the entry.
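
A toy model of the bug and the fix, assuming removal appends a tombstone under a new sequence id. `Documents` and its methods are hypothetical and only mimic the described behavior of `TransactionTree::remove` returning the existing index.

```rust
use std::collections::BTreeMap;

/// Stand-in for a versioned documents tree where removal appends a tombstone.
#[derive(Default)]
struct Documents {
    current_sequence: u64,
    /// key -> (sequence of the latest write, contents; `None` marks a tombstone)
    entries: BTreeMap<String, (u64, Option<String>)>,
}

impl Documents {
    fn set(&mut self, key: &str, contents: &str) -> u64 {
        self.current_sequence += 1;
        self.entries.insert(
            key.to_string(),
            (self.current_sequence, Some(contents.to_string())),
        );
        self.current_sequence
    }

    /// Mimics the described behavior: returns the sequence of the revision
    /// that was removed, not the sequence of the newly written tombstone.
    fn remove(&mut self, key: &str) -> Option<u64> {
        let previous = self.entries.get(key)?.0;
        self.current_sequence += 1;
        self.entries
            .insert(key.to_string(), (self.current_sequence, None));
        Some(previous)
    }

    fn latest_sequence(&self, key: &str) -> Option<u64> {
        self.entries.get(key).map(|entry| entry.0)
    }
}

/// Returns the sequence an eager view should map after a removal.
fn sequence_to_map_after_remove(documents: &mut Documents, key: &str) -> Option<u64> {
    // The returned value is the removed revision's sequence; mapping it
    // would re-index the old contents.
    let _removed_sequence = documents.remove(key)?;
    // The fix: look up the newly written sequence and map that instead.
    documents.latest_sequence(key)
}
```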
    
    The reason this didn't cause issues outside of multithreading is most
    tests are written without specifying an access policy, which means all
    the queries are AccessPolicy::UpdateBefore. This meant that the
    preparation for queries would still index it, as it wasn't actually
    up-to-date.
    ecton committed May 12, 2022
    0c62f70

Commits on May 16, 2022

  1. Embedded document map

    This change turns ViewEntries into a new Root implementor for Nebari
    that stores the view entries in one B+Tree, and stores the document map
    in another B+Tree.
    
    This pull request does not yet add the ability to query from the
    document map. Once that is implemented, I can remove the external
    document map tree which will conclude the final format changes.
    ecton committed May 16, 2022
    5632e68

Commits on May 29, 2022

  1. Removed the document map

    This removes the document_map tree, and stores it inline in a new custom
    Nebari Root. This custom tree supports querying what keys a document id
    emitted as well as what mappings were emitted for any given key.
    
    This branch also contains several other changes:
    
    The integrity scanner can spawn a mapping job, and that mapping job must
    use transactions if the view is eager. This set of changes addresses
    that, but it is also lumped in with a refactor from easy_parallel to
    rayon.
    
    While rayon is a heavier dependency, I was noticing a *lot* of traffic on
    profiles for spinning up new threads. Rayon uses a persistent thread
    pool for work, and by embracing it here, we can start using it in other
    locations as well.
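
To illustrate why a persistent pool avoids thread start-up cost, here is a small standard-library-only sketch. It is not rayon and does not reflect rayon's work-stealing design; `Pool` is a hypothetical type that simply keeps a fixed set of workers alive and feeds them jobs over a channel.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// A fixed set of long-lived workers; threads are spawned once, not per job.
struct Pool {
    sender: Option<mpsc::Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl Pool {
    fn new(size: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..size.max(1))
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock is held only while receiving, not while running the job.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // sender dropped: shut the worker down
                    }
                })
            })
            .collect();
        Pool { sender: Some(sender), workers }
    }

    /// Queues a job for whichever worker becomes free next.
    fn execute(&self, job: impl FnOnce() + Send + 'static) {
        self.sender
            .as_ref()
            .expect("pool is running")
            .send(Box::new(job))
            .expect("workers are alive");
    }

    /// Drops the sender and waits for all queued jobs to finish.
    fn join(mut self) {
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            worker.join().unwrap();
        }
    }
}
```

With this shape, a hundred small jobs run on at most `size` threads, whereas spawning a thread per job pays the creation cost a hundred times.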
    ecton committed May 29, 2022
    2a910d4

Commits on May 30, 2022

  1. Convert Checkout into a transaction

    This is meant to be an atomic operation, and is implemented in SQL as a
    single query.
    ecton committed May 30, 2022
    c7a2657
  2. Fixed edge case in sequence mapping logic

    The collection sequence tracking I introduced as part of the
    sequence-based-mapping refactor was done incorrectly -- the sequence IDs
    can't be published to shared state until the transaction is confirmed.
    
    The edge case was that a lazy view could start mapping while a
    collection had a pending transaction being applied. The collection's
    sequence could report a higher number than the database would return via
    a query due to the transaction not being written yet.
    
    This was partially a Nebari bug as well -- Tree::current_transaction_id
    was implemented incorrectly, while TransactionTree/TreeFile were
    correct.
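
The ordering requirement can be sketched like this: sequence ids stay private to the pending transaction and are published to shared state only on commit. `CollectionState` and `PendingTransaction` are hypothetical names used for illustration and do not correspond to actual BonsaiDb or Nebari types.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Shared state that readers, such as a lazy view mapper, consult.
#[derive(Default)]
struct CollectionState {
    published_sequence: AtomicU64,
}

/// A pending transaction keeps its sequence ids private until commit.
struct PendingTransaction {
    state: Arc<CollectionState>,
    pending_sequence: u64,
}

impl PendingTransaction {
    fn begin(state: Arc<CollectionState>) -> Self {
        let pending_sequence = state.published_sequence.load(Ordering::Acquire);
        Self { state, pending_sequence }
    }

    /// Allocates the next sequence id without touching shared state.
    fn write(&mut self) -> u64 {
        self.pending_sequence += 1;
        self.pending_sequence
    }

    /// Publishes the highest sequence only once the transaction is confirmed.
    fn commit(self) {
        self.state
            .published_sequence
            .fetch_max(self.pending_sequence, Ordering::Release);
    }
    // Dropping without calling `commit` models a rollback: nothing is published.
}
```

A mapper that reads `published_sequence` while a transaction is still pending sees only sequences that a query can actually return, which avoids the edge case described above.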
    ecton committed May 30, 2022
    4a5407c

Commits on Jul 10, 2022

  1. Updated for Nebari on Sediment

    This isn't completely functional, but I was ready to merge in changes
    from main for clippy fixes. Still, only 2 tests that are expected to
    pass are currently broken in bonsaidb-local.
    ecton committed Jul 10, 2022
    bdf3d98
  2. fd71578