-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add getSchema() to SourceClient interface #294
base: main
Are you sure you want to change the base?
Conversation
@@ -48,6 +49,14 @@ public interface SourceClient<COMMIT> extends Closeable { | |||
*/ | |||
SchemaCatalog getSchemaCatalog(OneTable table, COMMIT commit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we can eliminate the SchemaCatalog
class with this change?
It was one of the objectives of #77
@@ -48,6 +49,14 @@ public interface SourceClient<COMMIT> extends Closeable { | |||
*/ | |||
SchemaCatalog getSchemaCatalog(OneTable table, COMMIT commit); | |||
|
|||
/** | |||
* Extracts the {@link OneSchema} as of the latest state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method requires a OneTable
object as an argument, which corresponds to the table format at a certain point in time. The method should return the schema that matches the OneTable
object, not the most recent one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@the-other-tim-brown, could you please confirm this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, for the point-in-time schema, we would need the commit/snapshot param. I will make the changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coded9 the changes required for this are quite straightforward. You can copy the implementation for the getSchemaCatalog
that already handles this but simply return the schema and not the schema catalog object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually wonder if this getSchema method is even required when the schema itself is already present in the OneTable object.
/** | ||
* Extracts the {@link OneSchema} as of the latest state. | ||
* | ||
* @param table the current state of the table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment might require an update, depending on Tim's response.
@@ -82,6 +83,11 @@ public SchemaCatalog getSchemaCatalog(OneTable table, HoodieInstant commit) { | |||
return HudiSchemaCatalogExtractor.catalogWithTableSchema(table); | |||
} | |||
|
|||
@Override | |||
public OneSchema getSchema(OneTable table) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add unit tests for Hudi and Delta sources also.
Important Read
Addresses #77
What is the purpose of the pull request
This change adds getSchema() method to SourceClient interface
Brief change log
Verify this pull request
Run getSchema test in TestIcebergSourceClient
cc: @ashvina