Backend
VL (Velox)
Bug description
The ShuffleWriter.default_leaf(velox::memory::MemoryPool) allocated too much memory in VeloxHashShuffleWriter, causing an off-heap OOM.
24/11/26 21:31:42 ERROR Executor task launch worker for task 1559 ManagedReservationListener: Error reserving memory from target
org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 8.0 MiB, granted: 0.0 B. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled).
Current config settings:
spark.gluten.memory.offHeap.size.in.bytes=13690208256
spark.gluten.memory.task.offHeap.size.in.bytes=6845104128
spark.gluten.memory.conservative.task.offHeap.size.in.bytes=3422552064
spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
Memory consumer stats:
Task.1559: Current used bytes: 8.4 GiB, peak bytes: N/A
\- Gluten.Tree.0: Current used bytes: 8.4 GiB, peak bytes: 11.9 GiB
\- root.0: Current used bytes: 8.4 GiB, peak bytes: 11.9 GiB
+- ShuffleWriter.0: Current used bytes: 8.3 GiB, peak bytes: 8.8 GiB
| \- single: Current used bytes: 8.3 GiB, peak bytes: 8.8 GiB
| +- root: Current used bytes: 8.2 GiB, peak bytes: 8.2 GiB
| | \- default_leaf: Current used bytes: 8.2 GiB, peak bytes: 8.2 GiB
| \- gluten::MemoryAllocator: Current used bytes: 62.9 MiB, peak bytes: 1436.4 MiB
+- VeloxBatchAppender.0: Current used bytes: 104.0 MiB, peak bytes: 224.0 MiB
| \- single: Current used bytes: 104.0 MiB, peak bytes: 224.0 MiB
| +- root: Current used bytes: 100.2 MiB, peak bytes: 224.0 MiB
| | \- default_leaf: Current used bytes: 100.2 MiB, peak bytes: 216.8 MiB
| \- gluten::MemoryAllocator: Current used bytes: 0.0 B, peak bytes: 0.0 B
+- NativePlanEvaluator-1.0: Current used bytes: 25.0 MiB, peak bytes: 176.0 MiB
| \- single: Current used bytes: 25.0 MiB, peak bytes: 176.0 MiB
| +- root: Current used bytes: 22.6 MiB, peak bytes: 169.0 MiB
| | +- task.Gluten_Stage_2_TID_1559_VTID_0: Current used bytes: 22.6 MiB, peak bytes: 169.0 MiB
| | | +- node.0: Current used bytes: 22.1 MiB, peak bytes: 168.0 MiB
| | | | +- op.0.0.0.TableScan: Current used bytes: 22.1 MiB, peak bytes: 162.8 MiB
| | | | \- op.0.0.0.TableScan.test-hive: Current used bytes: 0.0 B, peak bytes: 0.0 B
| | | \- node.1: Current used bytes: 528.2 KiB, peak bytes: 1024.0 KiB
| | | \- op.1.0.0.FilterProject: Current used bytes: 528.2 KiB, peak bytes: 849.5 KiB
| | \- default_leaf: Current used bytes: 0.0 B, peak bytes: 0.0 B
| \- gluten::MemoryAllocator: Current used bytes: 0.0 B, peak bytes: 0.0 B
+- ArrowContextInstance.0: Current used bytes: 0.0 B, peak bytes: 0.0 B
+- VeloxBatchAppender.0.OverAcquire.0: Current used bytes: 0.0 B, peak bytes: 67.2 MiB
+- IndicatorVectorBase#init.0.OverAcquire.0: Current used bytes: 0.0 B, peak bytes: 2.4 MiB
+- NativePlanEvaluator-1.0.OverAcquire.0: Current used bytes: 0.0 B, peak bytes: 52.8 MiB
+- ShuffleWriter.0.OverAcquire.0: Current used bytes: 0.0 B, peak bytes: 2.6 GiB
\- IndicatorVectorBase#init.0: Current used bytes: 0.0 B, peak bytes: 8.0 MiB
\- single: Current used bytes: 0.0 B, peak bytes: 8.0 MiB
+- root: Current used bytes: 0.0 B, peak bytes: 0.0 B
| \- default_leaf: Current used bytes: 0.0 B, peak bytes: 0.0 B
\- gluten::MemoryAllocator: Current used bytes: 0.0 B, peak bytes: 0.0 B
at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:66)
at org.apache.gluten.memory.listener.ManagedReservationListener.reserve(ManagedReservationListener.java:49)
at org.apache.gluten.vectorized.ShuffleWriterJniWrapper.write(Native Method)
at org.apache.spark.shuffle.ColumnarShuffleWriter.internalWrite(ColumnarShuffleWriter.scala:177)
at org.apache.spark.shuffle.ColumnarShuffleWriter.write(ColumnarShuffleWriter.scala:231)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
at org.apache.spark.scheduler.Task.run(Task.scala:134)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:479)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1448)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:482)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Where is VeloxMemoryPool used in VeloxHashShuffleWriter?
When splitComplexType() is called, the complex-type vector is first serialized by PrestoVectorSerde and later flushed to the payload cache by evictPartitionBuffers(). The memory held by arenas_ is freed only after that flush.
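To make the lifetime concrete, here is a minimal, self-contained C++ sketch (hypothetical stand-in types, not the real Gluten/Velox classes) of how bytes serialized by splitComplexType() pile up in the arena and are only released when evictPartitionBuffers() flushes them:

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Stand-in for the serialization arena behind PrestoVectorSerde (arenas_).
struct Arena {
  std::vector<std::uint8_t> bytes;     // grows with every serialized row
  std::size_t size() const { return bytes.size(); }
  void reset() { bytes.clear(); bytes.shrink_to_fit(); }
};

struct ShuffleWriterModel {
  Arena arena;                          // plays the role of arenas_
  std::size_t cachedPayloadBytes = 0;   // plays the role of the payload cache

  // splitComplexType(): serialize the batch's complex columns into the arena.
  void splitComplexType(std::size_t complexBytesInBatch) {
    arena.bytes.resize(arena.size() + complexBytesInBatch);
  }

  // evictPartitionBuffers(): flush the arena to the payload cache; only now
  // is the arena memory released.
  void evictPartitionBuffers() {
    cachedPayloadBytes += arena.size();
    arena.reset();
  }
};

int main() {
  ShuffleWriterModel writer;
  for (int batch = 0; batch < 200; ++batch) {
    writer.splitComplexType(40u << 20);  // ~40 MiB of map data per batch
  }
  std::cout << "held before flush: " << (writer.arena.size() >> 20) << " MiB\n";
  writer.evictPartitionBuffers();
  std::cout << "held after flush:  " << (writer.arena.size() >> 20) << " MiB\n";
}

With 200 batches of ~40 MiB of map data each, the sketch retains about 8000 MiB before the flush and 0 afterwards, which roughly matches the ~8.2 GiB reported under ShuffleWriter.default_leaf in the log above.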
Why is so much memory used?
When doSplit is called, we estimate how many rows can fit within the current task's available memory and then resize the last partition buffers accordingly. The estimate only accounts for simple columns and ignores complex-type columns, so their memory is not counted. As we iterate batch by batch, we check whether the newly estimated row count is much larger than the existing partition buffers; if so, we evict those buffers to the payloadCache, the cached payload is spilled later, and the memory is then freed. If the complex-type vectors are large, this eviction is typically not triggered until the process has already run out of memory (OOM).
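A rough sketch of that estimate (hypothetical helper names; the real doSplit() code differs) illustrates why complex columns are invisible to it:

#include <cstddef>
#include <cstdint>
#include <iostream>

// Per-row cost the current estimate accounts for: simple columns only. The two
// map<string, string> columns contribute nothing here, so the memory they hold
// in arenas_ never influences when the partition buffers are evicted.
std::size_t simpleBytesPerRow() {
  return sizeof(std::int32_t)  // int column
         + 32;                 // assumed average cost of the string column
}

// Rows that fit in the memory the task can still acquire.
std::size_t estimatePartitionBufferRows(std::size_t availableBytes) {
  return availableBytes / simpleBytesPerRow();
}

int main() {
  const std::size_t available = 6845104128ull;  // task off-heap quota from the log
  // With the map columns ignored, each row looks ~36 bytes wide; the estimate
  // tracks only a small fraction of the real footprint, so the eviction (and
  // the spill that frees memory) comes far too late.
  std::cout << "estimated rows: " << estimatePartitionBufferRows(available) << "\n";
}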
Possible Solutions
One workaround: the default partition buffer size is 4096. In our case the schema is {int, string, map<string, string>, map<string, string>}, and the process runs out of memory after iterating roughly 200+ batches. Lowering this option to 200 lets the job succeed, but it is not a general solution.
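For reference, a hedged example of the workaround (assuming the relevant option is spark.gluten.shuffleWriter.bufferSize; verify the name against the Gluten version in use):

# spark-defaults.conf, or --conf on spark-submit; option name is an assumption
spark.gluten.shuffleWriter.bufferSize  200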
A more general fix: when estimating how many rows fit within the current task's available memory, also account for complex-type columns. The bytes retained by arenas_ can be used for this.
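A sketch of what the adjusted estimate could look like (hypothetical names, not a concrete patch): amortize the bytes retained by arenas_ over the rows split so far and add that to the per-row cost, so the estimate shrinks as serialized complex data accumulates and eviction is triggered earlier.

#include <cstddef>
#include <iostream>

// Proposed adjustment (sketch only): fold the complex-type bytes currently
// retained by arenas_ into the per-row cost, amortized over rows split so far.
std::size_t estimateRowsWithComplexTypes(std::size_t availableBytes,
                                         std::size_t simpleBytesPerRow,
                                         std::size_t arenaRetainedBytes,  // from arenas_
                                         std::size_t rowsSplitSoFar) {
  std::size_t complexBytesPerRow =
      rowsSplitSoFar == 0 ? 0 : arenaRetainedBytes / rowsSplitSoFar;
  std::size_t bytesPerRow = simpleBytesPerRow + complexBytesPerRow;
  return bytesPerRow == 0 ? 0 : availableBytes / bytesPerRow;
}

int main() {
  // Numbers roughly taken from this issue: ~6.4 GiB still grantable to the
  // task, ~36 bytes/row of simple columns, ~8.2 GiB retained by arenas_ after
  // ~200 batches of 4096 rows.
  std::cout << estimateRowsWithComplexTypes(6845104128ull, 36, 8800000000ull,
                                            200 * 4096)
            << " rows\n";
}

With these assumed numbers the estimate drops from hundreds of millions of rows to well under a million, which would trigger the eviction and spill long before the arena reaches 8 GiB.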
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
No response