Galexie: co-locate schema settings on datastore #5507

Open
chowbao opened this issue Nov 5, 2024 · 0 comments




@chowbao
Contributor

chowbao commented Nov 5, 2024

Problem: There is no mechanism to ensure that the distributed CDP processes, such as ledgerexporter and the backend processes, use the same schema settings when accessing a remote datastore. Currently, each process must maintain its own copy of the ledgers-per-file and files-per-partition settings.

Acceptance Criteria: Make the datastore itself the source of truth (SoT) for its schema settings. ledgerexporter and the backend processes defer to the datastore for schema settings at runtime instead of using local config values.

  • Add DataStore.InitSchema(SchemaConfig) and DataStore.GetSchema() SchemaConfig methods to the DataStore interface, and implement them on all existing DataStore implementations.

    • GCSDataStore stores the schema as a new object, schema.json. We need to determine its full object key within the bucket so that it is not included in ledger traversal operations.
  • Initialize the schema settings on the datastore. Decide on and implement one of the following options:

    • option1 - A new sub-command that initializes the schema settings on the datastore: ledgerexporter datastore initschema --ledgers-per-file --files-per-partition --config-file config.toml.

      • depends on ledgerexporter's existing config.toml to provide datastore connection settings.

      • shows a failure status if schema settings already exist on the datastore

      • shows success output if it publishes the schema settings to the datastore via DataStore.InitSchema(SchemaConfig)

    • option2 - ledgerexporter automatically detects whether schema settings are absent on the datastore and, if so, pushes the settings via DataStore.InitSchema(SchemaConfig).

      • uses the ledgers-per-file and files-per-partition settings in the local config.toml

      • if schema settings are already present on the datastore, ledgerexporter should validate that they are consistent with the local config values before proceeding; otherwise, it should raise an error.

  • Change the ledgerexporter startup routine to read schema.json from the datastore via DataStore.GetFile.

    • remove the --ledgers-per-file and --files-per-partition CLI flags, as those values should be sourced dynamically from DataStore.GetSchema().

    • If the datastore schema is absent, fail fast and print a console error stating that the datastore is not initialized yet, with instructions to run ledgerexporter datastore initschema.

  • Change all backend implementations to use DataStore.GetSchema() and remove any code that obtains ledgers-per-file or files-per-partition from local config settings. Backends should emit fatal errors at the SDK level to inform callers that the datastore schema is not initialized yet.
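For option2, the ledgerexporter startup decision can be isolated in a small pure function. This is a sketch assuming the caller has already fetched the remote schema, with nil meaning "absent on the datastore"; reconcileSchema is a hypothetical name, not an existing function.

```go
package main

import "fmt"

// SchemaConfig mirrors the two settings named in the issue.
type SchemaConfig struct {
	LedgersPerFile    uint32
	FilesPerPartition uint32
}

// reconcileSchema (hypothetical) implements the option2 startup check.
// remote is nil when the datastore has no schema yet; local comes from config.toml.
// It returns the schema to use and whether the caller must publish it via InitSchema.
func reconcileSchema(remote *SchemaConfig, local SchemaConfig) (SchemaConfig, bool, error) {
	if remote == nil {
		// Absent on the datastore: adopt the local values and ask the caller to publish them.
		return local, true, nil
	}
	if *remote != local {
		// Present but inconsistent with local config: refuse to proceed.
		return SchemaConfig{}, false, fmt.Errorf(
			"datastore schema %+v does not match local config %+v", *remote, local)
	}
	// Present and consistent: use the datastore's copy, nothing to publish.
	return *remote, false, nil
}
```

Keeping the comparison separate from I/O makes the mismatch rule easy to unit test independently of any particular DataStore implementation.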

Projects
Status: To Do
Development

No branches or pull requests

3 participants