feat: memory parquet reader #4962
…y ParquetReaderBuilder in tests
• Remove unnecessary copy in decode_to_batches method of BulkPartEncoder.
• Streamline ParquetReaderBuilder instantiation in tests by chaining calls and removing intermediate variable.
• Update documentation comment from "file handle" to "file id" in FileRange.
• Remove redundant comments and reformat code in ParquetReaderBuilder and Location enums, focusing on functionality changes like enabling inverted and fulltext index checks and exposing file path and region id methods.
Refactor Parquet reader and row group handling in mito2
- Make `Location` enum public and add async byte fetching method
- Introduce `RowGroupLocation` to manage row group specific operations
- Move byte fetching logic from `Sst` to `Location` and `RowGroupLocation`
- Change `Location` fields and methods to `pub(crate)` for internal use
- Update `RowGroupReaderBuilder` to use `file_location` instead of `location`
- Add `fetch_bytes` and cache management methods to `RowGroupLocation`
- Simplify `InMemoryRowGroup` creation with new `create` method
- Adjust tests to reflect changes in Parquet reader and row group creation
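The idea of moving byte fetching onto `Location` can be sketched roughly as below. This is a minimal, synchronous, std-only illustration, not the code in this PR: the variant names, field names, and the `fetch_bytes` signature are assumptions made for the example, and the real implementation is described above as async and cache-aware.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;
use std::path::PathBuf;

/// Hypothetical stand-in for the PR's `Location`: where the Parquet bytes live.
pub enum Location {
    /// An SST file on local disk (assumption: the real code reads through a storage layer).
    Sst { path: PathBuf },
    /// An encoded Parquet file held entirely in memory.
    Memory { data: Vec<u8> },
}

impl Location {
    /// Fetches each requested byte range, in order.
    /// Hypothetical signature; the PR's method is async.
    pub fn fetch_bytes(&self, ranges: &[Range<u64>]) -> io::Result<Vec<Vec<u8>>> {
        let mut out = Vec::with_capacity(ranges.len());
        match self {
            Location::Sst { path } => {
                let mut file = File::open(path)?;
                for r in ranges {
                    let len = r
                        .end
                        .checked_sub(r.start)
                        .ok_or_else(|| invalid("reversed range"))?;
                    let mut buf = vec![0u8; len as usize];
                    file.seek(SeekFrom::Start(r.start))?;
                    file.read_exact(&mut buf)?;
                    out.push(buf);
                }
            }
            Location::Memory { data } => {
                for r in ranges {
                    // `get` returns None for reversed or out-of-bounds ranges.
                    let slice = data
                        .get(r.start as usize..r.end as usize)
                        .ok_or_else(|| invalid("range out of bounds"))?;
                    out.push(slice.to_vec());
                }
            }
        }
        Ok(out)
    }
}

/// Builds an `InvalidInput` I/O error with the given message.
fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidInput, msg)
}
```

With this shape, a row group reader only asks a `Location` for byte ranges and does not need to know whether they come from a file or from a memory buffer.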
…nd row group modules
• Modified decode_to_batches in bulk/part.rs to use placeholders for unused parameters.
• Enhanced logging in ParquetReader's Drop implementation to include region ID and row group metrics.
• Added #[allow(dead_code)] to new_memory function in ParquetReaderBuilder.
• Implemented region_id method in RowGroupReaderBuilder.
• Added comments to RowGroupLocation methods for clarity.
• Removed unnecessary comments and unused code.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4962 +/- ##
==========================================
- Coverage 84.06% 83.83% -0.23%
==========================================
Files 1142 1143 +1
Lines 211510 212114 +604
==========================================
+ Hits 177801 177826 +25
- Misses 33709 34288 +579
Due to the lack of keyword generics, and since we cannot currently find a workaround, this proposal should be abandoned. We can only reuse the reader code through composition.
I hereby agree to the terms of the GreptimeDB CLA.
What's changed and what's your intention?
Enables the Parquet reader to read from in-memory files, as preparatory work for a Parquet-based memtable.