feat: memory parquet reader #4962

Closed
wants to merge 6 commits into from

Commits on Nov 7, 2024

  1. wip: add memory row group

    v0y4g3r committed Nov 7, 2024
    1627cfe
  2. bace13a

Commits on Nov 8, 2024

  1. add some tests

    v0y4g3r committed Nov 8, 2024
    e0187af
  2. feat/memory-decoder: Refactor decoding in BulkPartEncoder and simplify ParquetReaderBuilder in tests
    
     • Remove unnecessary copy in decode_to_batches method of BulkPartEncoder.
     • Streamline ParquetReaderBuilder instantiation in tests by chaining calls and removing intermediate variable.
     • Update documentation comment from "file handle" to "file id" in FileRange.
     • Remove redundant comments and reformat code in ParquetReaderBuilder and Location enums, focusing on functionality changes like enabling inverted and fulltext index checks and
       exposing file path and region id methods.
    v0y4g3r committed Nov 8, 2024
    74c0f2d
  3. feat/memory-decoder:

     Refactor Parquet reader and row group handling in mito2
    
     - Make `Location` enum public and add async byte fetching method
     - Introduce `RowGroupLocation` to manage row group specific operations
     - Move byte fetching logic from `Sst` to `Location` and `RowGroupLocation`
     - Change `Location` fields and methods to `pub(crate)` for internal use
     - Update `RowGroupReaderBuilder` to use `file_location` instead of `location`
     - Add `fetch_bytes` and cache management methods to `RowGroupLocation`
     - Simplify `InMemoryRowGroup` creation with new `create` method
     - Adjust tests to reflect changes in Parquet reader and row group creation
    v0y4g3r committed Nov 8, 2024
    d7fb332
  4. feat/memory-decoder: Updated decoding and logging in Parquet reader and row group modules
    
     • Modified decode_to_batches in bulk/part.rs to use placeholders for unused parameters.
     • Enhanced logging in ParquetReader's Drop implementation to include region ID and row group metrics.
     • Added #[allow(dead_code)] to new_memory function in ParquetReaderBuilder.
     • Implemented region_id method in RowGroupReaderBuilder.
     • Added comments to RowGroupLocation methods for clarity.
     • Removed unnecessary comments and unused code.
    v0y4g3r committed Nov 8, 2024
    6d26697
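
The commit "Refactor Parquet reader and row group handling in mito2" (d7fb332) moves byte fetching from `Sst` onto `Location`. The sketch below only illustrates that idea and is not the code from this PR. The variant names `Memory` and `File`, the `fetch_bytes` signature, and the error handling are all assumptions. The commit message describes the real method as async; this version is synchronous and uses only the standard library so that it runs without an async runtime.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;
use std::path::PathBuf;

/// Hypothetical stand-in for the PR's `Location`: where the Parquet bytes live.
/// Both variants are assumptions made for illustration.
#[allow(dead_code)] // `File` is not constructed in the small demo below.
pub enum Location {
    /// Encoded Parquet data held entirely in memory.
    Memory(Vec<u8>),
    /// Encoded Parquet data stored in a local file.
    File(PathBuf),
}

impl Location {
    /// Returns one buffer per requested byte range, in request order.
    /// Synchronous on purpose; the method in the PR is described as async.
    pub fn fetch_bytes(&self, ranges: &[Range<u64>]) -> io::Result<Vec<Vec<u8>>> {
        match self {
            Location::Memory(data) => ranges
                .iter()
                .map(|r| -> io::Result<Vec<u8>> {
                    let start = usize::try_from(r.start).ok();
                    let end = usize::try_from(r.end).ok();
                    start
                        .zip(end)
                        // `get` yields None for out-of-bounds or inverted ranges.
                        .and_then(|(s, e)| data.get(s..e))
                        .map(|bytes| bytes.to_vec())
                        .ok_or_else(|| {
                            io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds")
                        })
                })
                .collect(),
            Location::File(path) => {
                let mut file = File::open(path)?;
                ranges
                    .iter()
                    .map(|r| -> io::Result<Vec<u8>> {
                        let len = r.end.checked_sub(r.start).ok_or_else(|| {
                            io::Error::new(io::ErrorKind::InvalidInput, "inverted range")
                        })?;
                        // Seek to the start of the range, then read exactly `len` bytes.
                        file.seek(SeekFrom::Start(r.start))?;
                        let mut buf = vec![0u8; len as usize];
                        file.read_exact(&mut buf)?;
                        Ok(buf)
                    })
                    .collect()
            }
        }
    }
}

fn main() -> io::Result<()> {
    // Placeholder bytes, not a valid Parquet file.
    let location = Location::Memory(b"PAR1-row-group-bytes".to_vec());
    let chunks = location.fetch_bytes(&[0..4, 5..8])?;
    for chunk in &chunks {
        println!("{}", String::from_utf8_lossy(chunk));
    }
    Ok(())
}
```

With the fetch logic on the location type, a caller such as the `RowGroupLocation` named in the commit message could delegate to it without knowing whether the bytes come from memory or from a file. How the PR actually wires this together, including the cache management methods, is not shown on this page.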