[SPARK-50243][SQL][Connect] Cached classloader for ArtifactManager #49007
+86
−9
What changes were proposed in this pull request?
This PR implements a caching mechanism for the classloader of ArtifactManager, to avoid generating a new one every time a SQL query runs. This change also fixes a longstanding bug where the codegen cache was broken for Spark Connect, because each newly created classloader caused cache misses:
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
Lines 1487 to 1490 in 05728e4
The approach we use is to cache the generated classloader until a new artifact is added.
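As a rough illustration of the idea only, and not the code in this PR, the caching pattern could look like the sketch below. The class CachedLoaderSketch and its methods addArtifact and classLoader are hypothetical names chosen for this example; Spark's actual ArtifactManager is written in Scala and has a different API.

```java
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; this is not Spark's ArtifactManager.
class CachedLoaderSketch {
    private final List<URL> artifacts = new ArrayList<>();
    // null means no classloader is cached and one must be built on next use.
    private ClassLoader cached;

    // Registering a new artifact invalidates the cached classloader.
    synchronized void addArtifact(Path jar) {
        try {
            artifacts.add(jar.toUri().toURL());
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
        cached = null;
    }

    // Returns the cached classloader, building a new one only when none is cached.
    synchronized ClassLoader classLoader() {
        if (cached == null) {
            cached = new URLClassLoader(
                artifacts.toArray(new URL[0]),
                CachedLoaderSketch.class.getClassLoader());
        }
        return cached;
    }
}
```

Under these assumptions, repeated calls to classLoader() return the same instance until addArtifact is called, so any cache that depends on classloader identity is not invalidated between queries.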
Why are the changes needed?
To improve performance and fix codegen caching.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added two new test cases.
Also, before this PR, some existing tests fail when the config ARTIFACTS_SESSION_ISOLATION_ALWAYS_APPLY_CLASSLOADER is enabled:
SPARK-37753: Inhibit broadcast in left outer join when there are many empty partitions on outer/left side
SPARK-27871: Dataset encoder should benefit from codegen cache
These tests no longer fail after this PR.
Was this patch authored or co-authored using generative AI tooling?
No.