-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-46689][SPARK-46690][PYTHON][CONNECT] Support v2 profiling in group/cogroup applyInPandas/applyInArrow #45050
Conversation
306a427
to
877056b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, pending tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
./python/pyspark/tests/test_memory_profiler.py:548:23: E741 ambiguous variable name 'l'
def summarize(l, r):
^
./python/pyspark/sql/tests/test_udf_profiler.py:501:23: E741 ambiguous variable name 'l'
def summarize(l, r):
^
df1 = self.spark.createDataFrame([(1, 1.0), (2, 2.0), (1, 3.0), (2, 4.0)], ("id", "v1")) | ||
df2 = self.spark.createDataFrame([(1, "x"), (2, "y")], ("id", "v2")) | ||
|
||
def summarize(l, r): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we rename to left
and right
according to the lint error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering the reason and saw "In some fonts, these characters are indistinguishable from the numerals one and zero. When tempted to use 'l', use 'L' instead." That's good to learn :p
df1 = self.spark.createDataFrame([(1, 1.0), (2, 2.0), (1, 3.0), (2, 4.0)], ("id", "v1")) | ||
df2 = self.spark.createDataFrame([(1, "x"), (2, "y")], ("id", "v2")) | ||
|
||
def summarize(l, r): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
Merged to master, thank you! |
What changes were proposed in this pull request?
Support v2 (perf, memory) profiling in group/cogroup applyInPandas/applyInArrow, which rely on physical plan nodes FlatMapGroupsInBatchExec and FlatMapCoGroupsInBatchExec.
Why are the changes needed?
Complete v2 profiling support.
Does this PR introduce any user-facing change?
Yes. V2 profiling in group/cogroup applyInPandas/applyInArrow is supported.
How was this patch tested?
Unit tests.
Was this patch authored or co-authored using generative AI tooling?
No.