Commit
[SPARK-50258][SQL] Fix output column order changed issue after AQE optimization

### What changes were proposed in this pull request?

The root cause of this issue is that the planner turns `Limit` + `Sort` into `TakeOrderedAndProjectExec`, which adds an additional `Project` that does not exist in the logical plan. We shouldn't use this additional `Project` to optimize out other `Project`s; otherwise, when AQE turns the physical plan back into a logical plan, we lose the `Project` and may mess up the output column order.

With this PR, redundant projects are no longer removed if AQE is enabled and `projectList` is the same as the child output in `TakeOrderedAndProjectExec`.

### Why are the changes needed?

Fix a potential data issue and avoid a Spark Driver crash:

```
# more hs_err_pid193136.log
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f9d14841bc0, pid=193136, tid=223205
#
# JRE version: OpenJDK Runtime Environment Zulu17.36+18-SA (17.0.4.1+1) (build 17.0.4.1+1-LTS)
# Java VM: OpenJDK 64-Bit Server VM Zulu17.36+18-SA (17.0.4.1+1-LTS, mixed mode, sharing, tiered, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# v ~StubRoutines::jint_disjoint_arraycopy_avx3
#
# Core dump will be written. Default location: /apache/spark-release/3.5.0-20241105/spark/core.193136
...
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48789 from wangyum/SPARK-50258.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
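
### Illustration (toy model, not Spark code)

The column-order problem described above can be sketched with a small toy model. This is only an illustration under stated assumptions, not Spark's implementation: the classes `Scan`, `Project`, and `TakeOrderedAndProject`, the helpers `is_redundant`, `remove_redundant_projects`, and `logical_output`, and the `keep_order_under_aqe` flag are all hypothetical names invented here. In particular, `logical_output` assumes that the logical view simply ignores the project list embedded in `TakeOrderedAndProject`, which is a simplification of what the description says about AQE.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Scan:
    """Leaf node of the toy plan; emits its columns in a fixed order."""
    columns: List[str]

    @property
    def output(self) -> List[str]:
        return self.columns


@dataclass
class Project:
    """Explicit projection that also exists in the toy logical plan."""
    project_list: List[str]
    child: "Plan"

    @property
    def output(self) -> List[str]:
        return self.project_list


@dataclass
class TakeOrderedAndProject:
    """Toy stand-in for Limit + Sort fused with an implicit projection."""
    project_list: List[str]
    child: "Plan"

    @property
    def output(self) -> List[str]:
        return self.project_list


Plan = Union[Scan, Project, TakeOrderedAndProject]


def is_redundant(project: Project, require_order: bool) -> bool:
    """A Project is redundant if it only repeats its child's columns."""
    if require_order:
        return project.output == project.child.output
    return sorted(project.output) == sorted(project.child.output)


def remove_redundant_projects(plan: Plan, aqe_enabled: bool,
                              keep_order_under_aqe: bool) -> Plan:
    """Drop a redundant Project directly under TakeOrderedAndProject.

    keep_order_under_aqe=True mimics the rule described in this PR:
    column order must be preserved when AQE is enabled and the project
    list equals the child output.
    """
    if not isinstance(plan, TakeOrderedAndProject):
        return plan
    require_order = (keep_order_under_aqe and aqe_enabled
                     and plan.project_list == plan.child.output)
    child = plan.child
    if isinstance(child, Project) and is_redundant(child, require_order):
        child = child.child
    return TakeOrderedAndProject(plan.project_list, child)


def logical_output(plan: Plan) -> List[str]:
    """Assumption: the implicit projection has no logical counterpart,
    so the logical view only sees the child's output."""
    if isinstance(plan, TakeOrderedAndProject):
        return plan.child.output
    return plan.output


physical = TakeOrderedAndProject(["b", "a"],
                                 Project(["b", "a"], Scan(["a", "b"])))

without_rule = remove_redundant_projects(physical, True, False)
with_rule = remove_redundant_projects(physical, True, True)

print(logical_output(without_rule))  # column order of the Scan leaks through
print(logical_output(with_rule))     # the explicit Project is kept
```

In this toy model, removing the explicit `Project` leaves the logical view with the scan's column order, while keeping it preserves the projected order. That mirrors the reasoning in the description, not the actual planner code.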