Galexie: sub-command to report data quality, statistics of remote data storage #5512

Open
chowbao opened this issue Nov 5, 2024 · 0 comments

Provide a sub-command, such as ledgerexporter stats, that inspects a remote data store and prints a summary of the quality of the data on that store to CLI output. We first need to settle the schema of the summary output; an initial suggestion is shown here. Detecting ledger gaps is a key outcome that should be achieved:

=== Final Summary ===
Analysis took 300 seconds
DataStore: Type=GCS
DataStore Config: destination_bucket_path = "exporter-test/ledgers"

Total Ledgers: 34909
Total Partitions: 35
Oldest Ledger Sequence: 2
Newest Ledger Sequence: 50342
Schema:
  Ledgers Per File: 10
  Files Per Partition: 100

Ledger Gaps:
   400 - 550
   12320 - 12330

Invalid Partitions:
   300-399
   12300-12400

Used lighthorizon's design for stats as a starting point for this feature.
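
Since gap detection is the key outcome, here is a rough sketch of how the "Ledger Gaps" section could be computed. It assumes the scan has already produced the inclusive ledger range covered by each file found on the store. The names LedgerRange and FindGaps are hypothetical and not part of the existing exporter code.

```go
package main

import (
	"fmt"
	"sort"
)

// LedgerRange is an inclusive range of ledger sequence numbers.
// Hypothetical type, used only for this sketch.
type LedgerRange struct {
	Start, End uint32
}

// FindGaps returns the ledger ranges that lie between the lowest and the
// highest sequence seen in files but are not covered by any file.
func FindGaps(files []LedgerRange) []LedgerRange {
	if len(files) == 0 {
		return nil
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := make([]LedgerRange, len(files))
	copy(sorted, files)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Start < sorted[j].Start })

	var gaps []LedgerRange
	covered := sorted[0].End // highest sequence covered so far
	for _, f := range sorted[1:] {
		// A gap exists when the next file starts more than one ledger
		// past the covered range (written to avoid uint32 overflow).
		if f.Start > covered && f.Start-covered > 1 {
			gaps = append(gaps, LedgerRange{Start: covered + 1, End: f.Start - 1})
		}
		if f.End > covered {
			covered = f.End
		}
	}
	return gaps
}

func main() {
	// Made-up file ranges, not taken from a real data store.
	files := []LedgerRange{{2, 9}, {10, 19}, {30, 39}, {40, 49}, {70, 79}}
	fmt.Println("Ledger Gaps:")
	for _, g := range FindGaps(files) {
		fmt.Printf("   %d - %d\n", g.Start, g.End)
	}
}
```

Tracking the highest covered sequence, rather than comparing each file only with its predecessor, keeps overlapping or duplicated files from being reported as gaps.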

Consider an additional parameter that narrows the remote data scanned by stats to a specific partition prefix:

ledgerexporter stats --partition 300-399
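
A minimal sketch of how the value passed to --partition might be validated before scanning. It assumes the value is an inclusive "start-end" ledger range, as in the example above; the actual format of a partition prefix is still to be decided, and parsePartition is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePartition parses a "start-end" argument into inclusive bounds.
// Hypothetical helper; the real flag format has not been settled.
func parsePartition(arg string) (uint32, uint32, error) {
	lo, hi, ok := strings.Cut(arg, "-")
	if !ok {
		return 0, 0, fmt.Errorf("partition %q: expected the form start-end", arg)
	}
	start, err := strconv.ParseUint(lo, 10, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("partition %q: invalid start: %w", arg, err)
	}
	end, err := strconv.ParseUint(hi, 10, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("partition %q: invalid end: %w", arg, err)
	}
	if start > end {
		return 0, 0, fmt.Errorf("partition %q: start is greater than end", arg)
	}
	return uint32(start), uint32(end), nil
}

func main() {
	start, end, err := parsePartition("300-399")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("scanning ledgers %d through %d\n", start, end)
}
```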
Projects
Status: Backlog