clpeak's enqueueReadBuffer very slow #775
Comments
Hello, thanks for the feedback. We are isolating an unexpected performance degradation on Debian.
Thanks!
`$ clpeak`
`Platform: Intel(R) OpenCL Graphics`
It never gets beyond the non-blocking enqueueReadBuffer test.
dmesg (clinfo was run twice):
Ubuntu 24.10 behaves the same. Until last week I was running the A770 with an i9-14900K and never saw this issue; the problem surfaced after I swapped the CPU for a Core Ultra 9 285K. I do not know whether that is the cause, though; it could be a coincidence.
I did a little investigation, and the issue has to do with host buffer alignment, for D2H transfers only (the H2D direction does not seem to care). I suppose it is not really related to Arc graphics, the NEO driver, or the operating system, but to the host CPU and how the application allocates memory. Raptor Lake already suffered in clpeak, with D2H slower than H2D, but it got much worse with Arrow Lake:
So, as mentioned earlier, upstream clpeak with a 16-byte-aligned host buffer does 1.6 GB/s D2H on Arrow Lake:
The L0 tests show healthy D2H transfers matching H2D (the L0 tests seem to use page-aligned buffers) when using the blitter (I believe that is what OpenCL is using):
and (somewhat worse) when using the compute engine (not relevant):
Forcing 64-byte host buffer alignment in clpeak fixes Arrow Lake performance, which then matches the L0 tests.
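For illustration, forcing the alignment could look roughly like the sketch below. This is not the actual clpeak change: `allocAlignedHostBuffer`, `isAligned`, `HostBuffer` and `FreeDeleter` are hypothetical names, and it assumes a C++17 toolchain where `std::aligned_alloc` is available (e.g. glibc on Linux; MSVC does not provide it).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Hypothetical helpers, not part of clpeak.
// Memory obtained from std::aligned_alloc is released with std::free.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

using HostBuffer = std::unique_ptr<float[], FreeDeleter>;

// Allocate room for numFloats floats at an address that is a multiple of
// `alignment` bytes (64 by default, the value that helped in this thread).
inline HostBuffer allocAlignedHostBuffer(std::size_t numFloats,
                                         std::size_t alignment = 64) {
    // The alignment must be a non-zero power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    const std::size_t bytes = numFloats * sizeof(float);
    const std::size_t rounded = (bytes + alignment - 1) / alignment * alignment;
    return HostBuffer(static_cast<float*>(std::aligned_alloc(alignment, rounded)));
}

// True if p is a multiple of `alignment` bytes.
inline bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

The pointer from `buf.get()` would then be used as the host destination of the read (for example the `ptr` argument of `enqueueReadBuffer`) in place of a default-aligned allocation.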
Setup: Intel Arc A770 (56a0)
MSI PRO Z890-P WIFI, Intel Core Ultra 9 285K
Debian Trixie (6.11.5 kernel):
clpeak's enqueueReadBuffer performance is very poor, and the non-blocking version actually hangs.
dmesg shows some fence issues:
[ 352.610742] Fence expiration time out i915-0000:04:00.0:clpeak[5952]:a8!
[ 352.729366] Fence expiration time out i915-0000:04:00.0:clpeak[5952]:a6!
It used to report over 8 GB/s (Z690 / i9-14900K).