[SPARK-46689][SPARK-46690][PYTHON][CONNECT] Support v2 profiling in group/cogroup applyInPandas/applyInArrow #45050

xinrong-meng · 2024-02-06T22:07:51Z

What changes were proposed in this pull request?

Support v2 (perf, memory) profiling in group/cogroup applyInPandas/applyInArrow, which rely on the physical plan nodes FlatMapGroupsInBatchExec and FlatMapCoGroupsInBatchExec.
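
As a rough illustration of what perf profiling of a per-group function involves, the sketch below groups rows in plain Python and profiles only the calls to the user function with the standard-library cProfile module. It is not Spark's implementation and does not touch the plan nodes named above. The names group_mean and apply_per_group_with_profile are hypothetical, and the grouping is a stand-in for what applyInPandas does on a cluster.

```python
import cProfile
import pstats
from collections import defaultdict


def group_mean(key, rows):
    # Hypothetical per-group function: mean of the second column.
    values = [value for _, value in rows]
    return key, sum(values) / len(values)


def apply_per_group_with_profile(rows, func):
    # Group rows by their first field (a plain-Python stand-in for groupby).
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)

    # One profiler accumulates statistics across all per-group calls.
    profiler = cProfile.Profile()
    results = []
    for key in sorted(groups):
        # runcall profiles only the user function, not the grouping code.
        results.append(profiler.runcall(func, key, groups[key]))
    return results, pstats.Stats(profiler)


rows = [(1, 1.0), (2, 2.0), (1, 3.0), (2, 4.0)]
results, stats = apply_per_group_with_profile(rows, group_mean)
```

Calling stats.sort_stats("cumulative").print_stats() would then print the accumulated timings for the per-group calls.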

Why are the changes needed?

Complete v2 profiling support.

Does this PR introduce any user-facing change?

Yes. V2 profiling in group/cogroup applyInPandas/applyInArrow is now supported.

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

python/pyspark/sql/tests/test_udf_profiler.py

ueshin

LGTM, pending tests.

ueshin

./python/pyspark/tests/test_memory_profiler.py:548:23: E741 ambiguous variable name 'l'
        def summarize(l, r):
                      ^
./python/pyspark/sql/tests/test_udf_profiler.py:501:23: E741 ambiguous variable name 'l'
        def summarize(l, r):
                      ^

ueshin · 2024-02-08T01:39:29Z

python/pyspark/sql/tests/test_udf_profiler.py

+        df1 = self.spark.createDataFrame([(1, 1.0), (2, 2.0), (1, 3.0), (2, 4.0)], ("id", "v1"))
+        df2 = self.spark.createDataFrame([(1, "x"), (2, "y")], ("id", "v2"))
+
+        def summarize(l, r):


Shall we rename l and r to left and right to address the lint error?

I was wondering about the reason and found this: "In some fonts, these characters are indistinguishable from the numerals one and zero. When tempted to use 'l', use 'L' instead." That's good to learn :p
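
For illustration, a cogroup-style function with the suggested parameter names might look like the sketch below. The body of summarize is hypothetical, since the test's actual body is not shown in this thread, and cogroup_apply is a plain-Python stand-in for Spark's cogroup that pairs up rows sharing the same id. The input rows mirror the df1 and df2 data in the diff above.

```python
from collections import defaultdict


def summarize(left, right):
    # `left` and `right` replace the ambiguous names `l` and `r` (E741).
    # Hypothetical body: sum the left values and collect the right labels.
    total = sum(v1 for _, v1 in left)
    labels = [v2 for _, v2 in right]
    return {"left_total": total, "right_labels": labels}


def cogroup_apply(rows1, rows2, func):
    # Bucket each side by its first field, then call func once per key.
    left_groups = defaultdict(list)
    right_groups = defaultdict(list)
    for row in rows1:
        left_groups[row[0]].append(row)
    for row in rows2:
        right_groups[row[0]].append(row)
    keys = sorted(set(left_groups) | set(right_groups))
    return {key: func(left_groups[key], right_groups[key]) for key in keys}


rows1 = [(1, 1.0), (2, 2.0), (1, 3.0), (2, 4.0)]
rows2 = [(1, "x"), (2, "y")]
cogrouped = cogroup_apply(rows1, rows2, summarize)
```

With descriptive names, the linter no longer flags the signature and the role of each argument is clear at the call site.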

ueshin · 2024-02-08T01:39:45Z

python/pyspark/tests/test_memory_profiler.py

+        df1 = self.spark.createDataFrame([(1, 1.0), (2, 2.0), (1, 3.0), (2, 4.0)], ("id", "v1"))
+        df2 = self.spark.createDataFrame([(1, "x"), (2, "y")], ("id", "v2"))
+
+        def summarize(l, r):


xinrong-meng · 2024-02-08T18:56:44Z

Merged to master, thank you!

github-actions bot added the SQL, CORE, and PYTHON labels on Feb 6, 2024

ueshin reviewed Feb 7, 2024

xinrong-meng changed the title ~~[SPARK-46689][SPARK-46690][PYTHON][CONNECT] Support v2 profiling in group/cogroup applyInPandas~~ [SPARK-46689][SPARK-46690][PYTHON][CONNECT] Support v2 profiling in group/cogroup applyInPandas/applyInArrow Feb 7, 2024

xinrong-meng added 4 commits February 7, 2024 13:56

enable

e3103b6

test

077b7b1

test apply in arrow

6cfa9cf

reformat

877056b

xinrong-meng force-pushed the other_p2 branch from 306a427 to 877056b Compare February 7, 2024 21:58

xinrong-meng added 3 commits February 7, 2024 14:00

reformat

4e5c964

reformat

c59c6d8

reformat

4299953

xinrong-meng marked this pull request as ready for review February 7, 2024 22:06

ueshin approved these changes Feb 7, 2024

ueshin reviewed Feb 8, 2024

lint

78fd8a1

xinrong-meng closed this in 1a66c8c Feb 8, 2024
