Expand procedure architecture for distributed execution, and support iceberg procedure rewrite_data_files
#22659
Thanks for the draft doc! Some nits about punctuation, formatting, and some suggested rephrasing for readability and conciseness, but the content looks good.
@steveburnett Thanks a lot for your suggestions; all of them have been addressed. Please take a look when convenient!
LGTM! (docs)
Pull updated branch, new local doc build, looks good. Thanks!
Description
This PR expands the current procedure architecture in Presto to support defining, registering, and calling procedures that need to be executed in a distributed way. It then adds distributed procedure support to the Iceberg connector and implements a specific procedure, `rewrite_data_files`, for it. Refer to: prestodb/rfcs#12
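As a rough illustration of the registration and dispatch idea, the following is a minimal sketch and not the PR's actual code. It assumes a registry keyed by qualified procedure name and a marker subtype for distributed procedures. The class `ProcedureSketch`, its nested types, the `classifyCall` helper, the `REGULAR_CALL` constant, and the procedure `system.example_procedure` are all hypothetical names introduced for this example; only `CALL_DISTRIBUTED_PROCEDURE`, `ProcedureRegistry`, `Procedure`, `DistributedProcedure`, and `rewrite_data_files` are names taken from this description, and their real definitions in Presto differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model. These are NOT Presto's real classes.
public class ProcedureSketch {
    // REGULAR_CALL is a made-up placeholder for "any non-distributed call".
    enum QueryType { REGULAR_CALL, CALL_DISTRIBUTED_PROCEDURE }

    // A procedure identified by schema and name.
    static class Procedure {
        final String schema;
        final String name;

        Procedure(String schema, String name) {
            this.schema = schema;
            this.name = name;
        }

        String qualifiedName() {
            return schema + "." + name;
        }
    }

    // Marker subtype (assumption): a procedure that must run as a distributed query.
    static class DistributedProcedure extends Procedure {
        DistributedProcedure(String schema, String name) {
            super(schema, name);
        }
    }

    // Registry that holds both kinds of procedure and resolves them by name.
    static class ProcedureRegistry {
        private final Map<String, Procedure> procedures = new HashMap<>();

        void register(Procedure procedure) {
            if (procedures.putIfAbsent(procedure.qualifiedName(), procedure) != null) {
                throw new IllegalArgumentException(
                        "Procedure already registered: " + procedure.qualifiedName());
            }
        }

        Optional<Procedure> resolve(String qualifiedName) {
            return Optional.ofNullable(procedures.get(qualifiedName));
        }
    }

    // Decide, before planning, which execution path a call statement should take.
    static QueryType classifyCall(ProcedureRegistry registry, String qualifiedName) {
        Procedure procedure = registry.resolve(qualifiedName)
                .orElseThrow(() -> new IllegalArgumentException(
                        "Procedure not registered: " + qualifiedName));
        return procedure instanceof DistributedProcedure
                ? QueryType.CALL_DISTRIBUTED_PROCEDURE
                : QueryType.REGULAR_CALL;
    }

    public static void main(String[] args) {
        ProcedureRegistry registry = new ProcedureRegistry();
        registry.register(new DistributedProcedure("system", "rewrite_data_files"));
        registry.register(new Procedure("system", "example_procedure"));

        System.out.println(classifyCall(registry, "system.rewrite_data_files"));
        System.out.println(classifyCall(registry, "system.example_procedure"));
    }
}
```

The point of the sketch is only that the procedure's type has to be resolvable from the registry early, before planning, so that the call can be routed to the distributed execution path.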
The whole PR is separated into 6 parts:
1. Refactor the `ProcedureRegistry`/`Procedure` data structures to support creating and registering a `DistributedProcedure`, and make sure `ProcedureRegistry` is available in the presto-analyzer module, so that distributed procedures can be recognized in call statements during the prepare and analyze stages.
2. Handle call statements on distributed procedures in the preparer stage. In this stage, we determine the type of the procedure named in the call statement, and define a new query type `CALL_DISTRIBUTED_PROCEDURE` for `call distributed procedure` in `BuiltInPreparedQuery`. In this way, a `call distributed procedure` statement is handled by `SqlQueryExecutionFactory`, then created and executed as a `SqlQueryExecution`.
3. Analyze and plan the `call distributed procedure` statement, and finally generate a logical plan for it.
4. Optimization, segmentation, grouped tagging, and local planning of the logical plan generated above. The handling logic for `CallDistributedProcedureNode` is similar to that for `TableWriterNode`. In addition, a new optimizer, `RewriteWriterTarget`, is added and placed after all optimization rules. It updates the `TableHandle` held in `TableFinishNode` and `CallDistributedProcedureNode` based on the underlying `TableScanNode` after the entire optimization is complete, to account for possible filter pushdown.
5. Refactor the Iceberg connector to support `call distributed procedure`. Introduce Iceberg's transaction context and extend `IcebergSplitManager` to support a split source planned by `IcebergAbstractMetadata.beginCallDistributedProcedure(...)`. This split source is set on the transaction context, which also holds all the files to be rewritten.
6. Support the Iceberg `rewrite_data_files` procedure. It builds a customized split source and sets it on the transaction context so that it can be used in `IcebergSplitManager`. It also registers a file scan task consumer to collect all the scanned files and hold them in the transaction context. Finally, in the commit stage, it gets all the data files and delete files that have been rewritten and all the newly generated files, then changes and commits their metadata through the Iceberg table's `RewriteFiles` transaction.
Motivation and Context
N/A
Impact
N/A
Test Plan
N/A
Contributor checklist
Release Notes