intel-opencl-icd - 6-7% perf regression in OpenCL-Benchmark - WSL Ubuntu 22.04 vs 24.04 and debian bookworm (i9-12900H) #730

Open
fanoush opened this issue May 6, 2024 · 10 comments


@fanoush

fanoush commented May 6, 2024

I use OpenCL-Benchmark-Linux to verify that OpenCL is working in WSL. I just installed the new Ubuntu 24.04, and it looks slower than 22.04. I then tried Debian 12 bookworm, and it is slower too, so only Ubuntu 22.04 is faster. These are three WSL instances on the same Windows 11 computer, running the same version of the benchmark binary from https://github.com/ProjectPhysX/OpenCL-Benchmark

Ubuntu 22.04

22.04:~$ ./OpenCL-Benchmark-Linux
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | Intel(R) Graphics [0x46a6]                                 |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Intel(R) Graphics [0x46a6]                                 |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 1.0.0 (Linux)                                              |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 96 at 1450 MHz (768 cores, 2.227 TFLOPs/s)                 |
| Memory, Cache  | 26082 MB, 1024 KB global / 64 KB local                     |
| Buffer Limits  | 1024 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                          not supported        |
| FP32  compute                                         2.014 TFLOPs/s ( 1x ) |
| FP16  compute                                         3.693 TFLOPs/s ( 2x ) |
| INT64 compute                                         0.146  TIOPs/s (1/16) |
| INT32 compute                                         0.697  TIOPs/s (1/3 ) |
| INT16 compute                                         7.207  TIOPs/s ( 4x ) |
| INT8  compute                                         1.415  TIOPs/s (2/3 ) |
| Memory Bandwidth ( coalesced read      )                         65.86 GB/s |
| Memory Bandwidth ( coalesced      write)                         60.58 GB/s |
| Memory Bandwidth (misaligned read      )                         65.51 GB/s |
| Memory Bandwidth (misaligned      write)                         32.39 GB/s |
| PCIe   Bandwidth (send                 )                         21.94 GB/s |
| PCIe   Bandwidth (   receive           )                         21.78 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   11.96 GB/s |
|-----------------------------------------------------------------------------|

Ubuntu 24.04 (and Debian bookworm)

24.04:~$ ./OpenCL-Benchmark-Linux
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | Intel(R) Graphics [0x46a6]                                 |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Intel(R) Graphics [0x46a6]                                 |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 23.43.027642 (Linux)                                       |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 96 at 1450 MHz (768 cores, 2.227 TFLOPs/s)                 |
| Memory, Cache  | 30197 MB, 3840 KB global / 64 KB local                     |
| Buffer Limits  | 1024 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                          not supported        |
| FP32  compute                                         1.884 TFLOPs/s ( 1x ) |
| FP16  compute                                         3.473 TFLOPs/s ( 2x ) |
| INT64 compute                                         0.156  TIOPs/s (1/16) |
| INT32 compute                                         0.681  TIOPs/s (1/3 ) |
| INT16 compute                                         7.310  TIOPs/s ( 4x ) |
| INT8  compute                                         1.379  TIOPs/s (2/3 ) |
| Memory Bandwidth ( coalesced read      )                         65.88 GB/s |
| Memory Bandwidth ( coalesced      write)                         60.36 GB/s |
| Memory Bandwidth (misaligned read      )                         65.55 GB/s |
| Memory Bandwidth (misaligned      write)                         32.58 GB/s |
| PCIe   Bandwidth (send                 )                         21.32 GB/s |
| PCIe   Bandwidth (   receive           )                         21.50 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   11.86 GB/s |
|-----------------------------------------------------------------------------|

bookworm:~$ ./OpenCL-Benchmark-Linux
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | Intel(R) Graphics [0x46a6]                                 |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Intel(R) Graphics [0x46a6]                                 |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 22.43.24595 (Linux)                                        |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 96 at 1450 MHz (768 cores, 2.227 TFLOPs/s)                 |
| Memory, Cache  | 26082 MB, 3840 KB global / 64 KB local                     |
| Buffer Limits  | 1024 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                          not supported        |
| FP32  compute                                         1.879 TFLOPs/s ( 1x ) |
| FP16  compute                                         3.478 TFLOPs/s ( 2x ) |
| INT64 compute                                         0.144  TIOPs/s (1/16) |
| INT32 compute                                         0.681  TIOPs/s (1/3 ) |
| INT16 compute                                         7.241  TIOPs/s ( 4x ) |
| INT8  compute                                         1.375  TIOPs/s (2/3 ) |
| Memory Bandwidth ( coalesced read      )                         65.78 GB/s |
| Memory Bandwidth ( coalesced      write)                         60.78 GB/s |
| Memory Bandwidth (misaligned read      )                         66.12 GB/s |
| Memory Bandwidth (misaligned      write)                         32.21 GB/s |
| PCIe   Bandwidth (send                 )                         22.05 GB/s |
| PCIe   Bandwidth (   receive           )                         22.16 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   12.09 GB/s |
|-----------------------------------------------------------------------------|

FP32 and FP16 are faster in the 22.04 release. There the version is printed as Device Driver | 1.0.0 (Linux), while the package installed in 22.04 is in fact:

22.04:~$ dpkg -s intel-opencl-icd
Package: intel-opencl-icd
Status: install ok installed
Priority: optional
Section: libs
Installed-Size: 10591
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Source: intel-compute-runtime
Version: 22.14.22890-1

The Ubuntu 24.04 intel-opencl-icd package is Version: 23.43.27642.40-1ubuntu3 and the bookworm one is Version: 22.43.24595.41-1.

If you need any other info (like the output of clinfo), let me know. These numbers are pretty consistent across multiple runs.
Is this expected, or is the benchmark meaningless? Should I run some other test to verify or get more details?
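
For reference, the size of the gap can be worked out directly from the tables above. The following is a minimal sketch, not part of the benchmark itself; the helper name `relative_drop` is made up for illustration, and the numbers are the FP32 and FP16 rows of the 22.04 and 24.04 runs.

```python
def relative_drop(baseline: float, candidate: float) -> float:
    """Return how much lower `candidate` is than `baseline`, in percent."""
    return (baseline - candidate) / baseline * 100.0


# (22.04 result, 24.04 result) in TFLOPs/s, copied from the tables above.
results = {
    "FP32": (2.014, 1.884),
    "FP16": (3.693, 3.473),
}

for name, (old, new) in results.items():
    print(f"{name}: {relative_drop(old, new):.1f}% slower on 24.04")
```

Under these numbers the drop is roughly 6% for both precisions, while the integer and bandwidth rows move by much less.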

@eero-t

eero-t commented May 10, 2024

Normal Ubuntu 22.04 and 24.04 also have different kernel versions, but I do not know whether that is the case with WSL Ubuntu installations. If the kernel versions do differ, it would be interesting to know whether the kernel or user space has more impact on performance, so that this regression can be filed against the correct project.

Could you try the 22.04 kernel on 24.04, or vice versa?

If not, what about testing the 22.04 compute driver version on 24.04, or vice versa?

@fanoush
Author

fanoush commented May 10, 2024

There is a single kernel, provided by Microsoft:

$ cat /proc/version
Linux version 5.15.146.1-microsoft-standard-WSL2 (root@65c757a075e2) (gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37) #1 SMP Thu Jan 11 04:09:03 UTC 2024

All three installations are running on the same computer, so all of them use this same Microsoft kernel, the same WSL and WSLg version, and the same Windows Intel driver. I also tried starting only one of them for the test, just to be sure.

BTW, the kernel source is at https://github.com/microsoft/WSL2-Linux-Kernel and one can build a custom one from source, but this is the default one provided by Microsoft as part of WSL. I think you cannot run two kernels at once for different WSL instances. Also, you cannot run a real Ubuntu kernel, since this one has special WSL drivers.

> If not, what about testing 22.04 compute driver version on 24.04, or vice versa?

How would I do that?

$ dpkg -L intel-opencl-icd
/.
/etc
/etc/OpenCL
/etc/OpenCL/vendors
/etc/OpenCL/vendors/intel.icd
/usr
/usr/bin
/usr/bin/ocloc
/usr/include
/usr/include/ocloc_api.h
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/intel-opencl
/usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
/usr/lib/x86_64-linux-gnu/libocloc.so
/usr/share
/usr/share/doc
/usr/share/doc/intel-opencl-icd
/usr/share/doc/intel-opencl-icd/changelog.Debian.gz
/usr/share/doc/intel-opencl-icd/copyright

Do you mean copying the
/usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
/usr/lib/x86_64-linux-gnu/libocloc.so
files from the older 22.14.22890-1 Ubuntu 22.04 package into 24.04 and/or Debian? I can try that.

@PorcelainMouse

Anybody else seeing this?

|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Intel(R) Arc(TM) A750 Graphics                             |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 24.09.28717.17 (Linux)                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 448 at 2400 MHz (3584 cores, 17.203 TFLOPs/s)              |
| Memory, Cache  | 8127 MB, 16384 KB global / 64 KB local                     |
| Buffer Limits  | 3860 MB global, 3953458 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                          not supported        |
                                 <--- hangs here
$ uname --kernel-release
6.8.8-200.fc39.x86_64

Different issue? Looks like I have a different driver.

@fanoush
Author

fanoush commented May 12, 2024

> If not, what about testing 22.04 compute driver version on 24.04, or vice versa?

OK, the result is interesting. The older version copied to a newer distro becomes slower too, in exactly the same way.

bookworm:~$ ./OpenCL-Benchmark-Linux
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | Intel(R) Graphics [0x46a6]                                 |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Intel(R) Graphics [0x46a6]                                 |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 1.0.0 (Linux)                                              |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 96 at 1450 MHz (768 cores, 2.227 TFLOPs/s)                 |
| Memory, Cache  | 26082 MB, 1024 KB global / 64 KB local                     |
| Buffer Limits  | 1024 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                          not supported        |
| FP32  compute                                         1.881 TFLOPs/s ( 1x ) |
| FP16  compute                                         3.478 TFLOPs/s ( 2x ) |
| INT64 compute                                         0.148  TIOPs/s (1/16) |
| INT32 compute                                         0.681  TIOPs/s (1/3 ) |
| INT16 compute                                         7.286  TIOPs/s ( 4x ) |
| INT8  compute                                         1.375  TIOPs/s (2/3 ) |
| Memory Bandwidth ( coalesced read      )                         66.43 GB/s |
| Memory Bandwidth ( coalesced      write)                         61.57 GB/s |
| Memory Bandwidth (misaligned read      )                         66.10 GB/s |
| Memory Bandwidth (misaligned      write)                         32.45 GB/s |
| PCIe   Bandwidth (send                 )                         21.57 GB/s |
| PCIe   Bandwidth (   receive           )                         21.65 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   11.93 GB/s |
|-----------------------------------------------------------------------------|

The newer driver does not run in the older Ubuntu, so I cannot test the other way round:

22.04:~$ ./OpenCL-Benchmark-Linux
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | Intel(R) Graphics [0x46a6]                                 |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Intel(R) Graphics [0x46a6]                                 |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 22.43.24595 (Linux)                                        |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 96 at 1450 MHz (768 cores, 2.227 TFLOPs/s)                 |
| Memory, Cache  | 26082 MB, 3840 KB global / 64 KB local                     |
| Buffer Limits  | 1024 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Warning:                                                                    |
| Error: OpenCL C code compilation failed with error code -6. Make sure there |
|        are no errors in kernel.cpp.                                         |
'-----------------------------------------------------------------------------'

So could it be related to system libraries like libc, or even the C compiler? How is the OpenCL C code compiled for the GPU?
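
As a side note on the failed run above: assuming the benchmark prints the raw OpenCL status code returned by the program build, the -6 can be decoded against the constants in the Khronos `CL/cl.h` header. The sketch below lists only a subset of those codes; the `describe` helper is a made-up name for illustration.

```python
# Subset of the status codes defined in the Khronos OpenCL header (CL/cl.h).
CL_STATUS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
}


def describe(code: int) -> str:
    """Map a numeric OpenCL status to its symbolic name, if it is in the table."""
    return CL_STATUS.get(code, f"unknown OpenCL status {code}")


print(describe(-6))
```

So the failure is reported as CL_OUT_OF_HOST_MEMORY rather than as an ordinary kernel build error (-11), which may point at the mixed library versions rather than at the kernel source.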

@eero-t

eero-t commented May 13, 2024

> All those three installations are running on same computer so all are using this same microsoft kernel, same WSL and WSLg version and same windows intel driver. I also tried to only start one of them for running test just to be sure.

24.04 should be using the 6.8 kernel, not the 5.15 one in 22.04... Are you sure the 22.04 and 24.04 Ubuntu WSL versions are really using the same kernel version?

Note: In normal Ubuntu LTS installs, one can install newer, so-called "HW Enabling" kernels a few months after they have first been tested in Ubuntu devel versions. The latest HWE kernel available for 22.04 is 6.5: https://packages.ubuntu.com/jammy/linux-generic-hwe-22.04

> Older version copied to newer distro becomes slower too in exactly the same way.

Did you copy just the older version of the compute runtime (intel-opencl-icd), or also the rest of the compute stack [1]?

(IGC, LLVM and the kernel are the components in the stack most likely to affect these numbers.)

> How is the OpenCL C code compiled for the gpu?

Using IGC: https://github.com/intel/intel-graphics-compiler/

Which uses LLVM, opencl-clang and SPIRV-translator.

AFAIK IGC packages in distros (like Ubuntu) use distro-specific versions of those dependencies, linked dynamically, whereas IGC packages from Intel package repos, and the releases here, include a statically linked LLVM version.

[1] I assume you're using distro versions of everything. The Ubuntu 22.04 => 24.04 upgrade implies the following version changes:

  • intel-opencl-icd:
    • 22.14.22890 => 23.43.27642.40
  • libigc / libigdfcl1:
    • 1.0.10840 => 1.0.15468.25
  • libigdgmm12:
    • 22.3.9 => 22.3.17
  • LLVM + SPIRV + OpenCL-Clang libs:
    • v12 => v14
  • Kernel:
    • 5.15 => 6.8

See: https://packages.ubuntu.com/noble/intel-opencl-icd (and what it links).
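
To check which of these versions a given WSL instance actually has, the installed packages can be queried with `dpkg-query`. This is a rough sketch, assuming a Debian or Ubuntu system; the package list is only a starting point taken from the names in this thread, and the function names are made up for illustration.

```python
import shutil
import subprocess

# Package names as they appear in this thread; extend as needed.
PACKAGES = ["intel-opencl-icd", "libigc1", "libigdfcl1", "libigdgmm12"]


def parse_versions(text: str) -> dict[str, str]:
    """Turn 'name version' lines into a {name: version} mapping."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank lines and entries without a version
            versions[parts[0]] = parts[1]
    return versions


def installed_versions(packages=PACKAGES) -> dict[str, str]:
    """Query dpkg-query; returns an empty dict when dpkg-query is not available."""
    if shutil.which("dpkg-query") is None:
        return {}
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Package} ${Version}\\n", *packages],
        capture_output=True, text=True, check=False,
    )
    # Packages that are not installed only produce a message on stderr.
    return parse_versions(result.stdout)


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:20} {version}")
```

Running it in each instance and comparing the output shows quickly whether only the runtime or the whole stack differs.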

PS. You could add to title something like "6-7% perf regression in OpenCL-Benchmark".

@fanoush fanoush changed the title intel-opencl-icd - performance regression - WSL Ubuntu 22.04 vs 24.04 and debian bookworm (i9-12900H) intel-opencl-icd - 6-7% perf regression in OpenCL-Benchmark - WSL Ubuntu 22.04 vs 24.04 and debian bookworm (i9-12900H) May 13, 2024
@fanoush
Author

fanoush commented May 13, 2024

> 24.04 should be using 6.8 kernel, not the 5.15 one in 22.04... Are you sure 22.04 and 24.04 Ubuntu WSL versions are really using same kernel version?

Not sure we are talking about the same thing: WSL = Windows Subsystem for Linux. As mentioned previously, there is only one kernel:

Microsoft Windows [Version 10.0.22631.3447]
(c) Microsoft Corporation. All rights reserved.

C:\>wsl -v
WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.60
MSRDC version: 1.2.5105
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22631.3447
C:\>wsl --update
Checking for updates.
The most recent version of Windows Subsystem for Linux is already installed.

and the Windows driver is
[image: screenshot of the Windows driver version, not preserved]

> Did you copy just the older version of compute runtime (intel-opencl-icd),

Yes, just the libraries listed in dpkg -L intel-opencl-icd; I did not know about the rest.

Also, is that benchmark a good reference, or is there a better way to check whether this regression is real?

And BTW, I just followed the readme here
https://github.com/intel/compute-runtime?tab=readme-ov-file#via-system-package-manager
so I installed packages from the Ubuntu/Debian repos. I did not follow
https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html (which is maybe outdated?) to add the Intel repos.

@eero-t

eero-t commented May 13, 2024

> Also is that benchmark good reference or is there better way to check if this regression is real?

Unfortunately I have no idea.

(While I work for Intel and know a bit about the Linux drivers, I'm not a driver developer, or otherwise related to this project. I'm just another user of this driver, mainly for its Level-Zero Sysman API, not its OpenCL API.)

> And BTW I just followed the readme here https://github.com/intel/compute-runtime?tab=readme-ov-file#via-system-package-manager so I installed packages from ubuntu/debian repo. I did not follow https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html

I think the WSL docs recommend using the driver versions from Intel repos because distro driver versions are compiled only with support for the upstream Linux kernel, which is missing some things that are in the Intel out-of-tree kernel driver and, I assume, in the WSL / Windows kernel drivers: https://dgpu-docs.intel.com/driver/kernel-driver-types.html#differences-between-the-out-of-tree-driver-and-the-upstream-kernel

> (which is maybe outdated?) to add intel repos.

While those repo names should AFAIK still work, the recommended repo names have changed a bit since then. The latest Intel driver repo info is here: https://dgpu-docs.intel.com/driver/client/overview.html

(That page is for client GPUs, like iGPUs.)

@JablonskiMateusz
Contributor

Hi @fanoush

I see you have different drivers on the systems:

22.04
| Device Driver | 1.0.0 (Linux) |

24.04
| Device Driver | 23.43.027642 (Linux) |

Could you please retry using packages from our latest github release?

@fanoush
Author

fanoush commented May 16, 2024

> Could you please retry using packages from our latest github release?

Both are what comes from the Ubuntu repos for those versions; the one reporting 1.0.0 is actually the Ubuntu package with version 22.14.22890-1.

I did this in bookworm first: uninstalled the packages from the repo with sudo apt-get purge intel-opencl-icd ; sudo apt-get autoremove

Removing intel-opencl-icd (22.43.24595.41-1) ...
Removing libigdfcl1:amd64 (1.0.12504.6-1+deb12u1) ...
Removing libopencl-clang14:amd64 (14.0.0-4) ...
Removing libclang-cpp14 (1:14.0.6-12) ...
Removing libigc1:amd64 (1.0.12504.6-1+deb12u1) ...
Removing libigdgmm12:amd64 (22.3.3+ds1-1) ...
Removing libllvmspirvlib14:amd64 (14.0.0-5) ...
Removing libllvm14:amd64 (1:14.0.6-12) ...
Removing libz3-4:amd64 (4.8.12-3.1) ...

and installed https://github.com/intel/compute-runtime/releases/tag/24.13.29138.7 via wget/dpkg.
The results are very similar, i.e. slower:

bookworm:~$ ./OpenCL-Benchmark-Linux
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | Intel(R) Graphics [0x46a6]                                 |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Intel(R) Graphics [0x46a6]                                 |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 24.13.29138.7 (Linux)                                      |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 96 at 1450 MHz (768 cores, 2.227 TFLOPs/s)                 |
| Memory, Cache  | 30197 MB, 3840 KB global / 64 KB local                     |
| Buffer Limits  | 1024 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                          not supported        |
| FP32  compute                                         1.887 TFLOPs/s ( 1x ) |
| FP16  compute                                         3.480 TFLOPs/s ( 2x ) |
| INT64 compute                                         0.158  TIOPs/s (1/16) |
| INT32 compute                                         0.682  TIOPs/s (1/3 ) |
| INT16 compute                                         7.353  TIOPs/s ( 4x ) |
| INT8  compute                                         1.381  TIOPs/s (2/3 ) |
| Memory Bandwidth ( coalesced read      )                         64.83 GB/s |
| Memory Bandwidth ( coalesced      write)                         57.96 GB/s |
| Memory Bandwidth (misaligned read      )                         64.61 GB/s |
| Memory Bandwidth (misaligned      write)                         31.98 GB/s |
| PCIe   Bandwidth (send                 )                         20.88 GB/s |
| PCIe   Bandwidth (   receive           )                         21.15 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   10.58 GB/s |
|-----------------------------------------------------------------------------|

I did the same in Ubuntu 22.04:

Removing intel-opencl-icd (22.14.22890-1) ...
Removing libigdfcl1:amd64 (1.0.10840-1) ...
Removing libopencl-clang12:amd64 (12.0.0-3) ...
Removing libclang-cpp12 (1:12.0.1-19ubuntu3) ...
Removing libigc1:amd64 (1.0.10840-1) ...
Removing libigdgmm12:amd64 (22.1.2+ds1-1) ...
Removing libllvmspirvlib12:amd64 (12.0.0-3) ...
Removing libllvm12:amd64 (1:12.0.1-19ubuntu3) ...

and the result in 22.04 is now the same, i.e. slower:

22.04:~$ ./OpenCL-Benchmark-Linux
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | Intel(R) Graphics [0x46a6]                                 |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Intel(R) Graphics [0x46a6]                                 |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 24.13.29138.7 (Linux)                                      |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 96 at 1450 MHz (768 cores, 2.227 TFLOPs/s)                 |
| Memory, Cache  | 30197 MB, 3840 KB global / 64 KB local                     |
| Buffer Limits  | 1024 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                          not supported        |
| FP32  compute                                         1.884 TFLOPs/s ( 1x ) |
| FP16  compute                                         3.480 TFLOPs/s ( 2x ) |
| INT64 compute                                         0.160  TIOPs/s (1/16) |
| INT32 compute                                         0.682  TIOPs/s (1/3 ) |
| INT16 compute                                         7.205  TIOPs/s ( 4x ) |
| INT8  compute                                         1.379  TIOPs/s (2/3 ) |
| Memory Bandwidth ( coalesced read      )                         66.30 GB/s |
| Memory Bandwidth ( coalesced      write)                         61.70 GB/s |
| Memory Bandwidth (misaligned read      )                         65.16 GB/s |
| Memory Bandwidth (misaligned      write)                         32.56 GB/s |
| PCIe   Bandwidth (send                 )                         21.44 GB/s |
| PCIe   Bandwidth (   receive           )                         21.61 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   11.83 GB/s |
|-----------------------------------------------------------------------------|

So only the old 22.14.22890-1 from Ubuntu 22.04 gives better numbers. Reinstalling it from the Ubuntu repo brings them back:

sudo dpkg --purge intel-igc-core intel-igc-opencl intel-opencl-icd libigdgmm12 intel-level-zero-gpu
....
sudo apt-get install intel-opencl-icd
...
Unpacking intel-opencl-icd (22.14.22890-1) ...
...
Setting up libigdgmm12:amd64 (22.1.2+ds1-1) ...
Setting up libllvm12:amd64 (1:12.0.1-19ubuntu3) ...
Setting up libllvmspirvlib12:amd64 (12.0.0-3) ...
Setting up libclang-cpp12 (1:12.0.1-19ubuntu3) ...
Setting up libopencl-clang12:amd64 (12.0.0-3) ...
Setting up libigc1:amd64 (1.0.10840-1) ...
Setting up libigdfcl1:amd64 (1.0.10840-1) ...
Setting up intel-opencl-icd (22.14.22890-1) ...
...
22.04:~$ ./OpenCL-Benchmark-Linux
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | Intel(R) Graphics [0x46a6]                                 |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Intel(R) Graphics [0x46a6]                                 |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 1.0.0 (Linux)                                              |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 96 at 1450 MHz (768 cores, 2.227 TFLOPs/s)                 |
| Memory, Cache  | 26082 MB, 1024 KB global / 64 KB local                     |
| Buffer Limits  | 1024 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                          not supported        |
| FP32  compute                                         2.020 TFLOPs/s ( 1x ) |
| FP16  compute                                         3.699 TFLOPs/s ( 2x ) |
| INT64 compute                                         0.147  TIOPs/s (1/16) |
| INT32 compute                                         0.693  TIOPs/s (1/3 ) |
| INT16 compute                                         7.245  TIOPs/s ( 4x ) |
| INT8  compute                                         1.415  TIOPs/s (2/3 ) |
| Memory Bandwidth ( coalesced read      )                         66.18 GB/s |
| Memory Bandwidth ( coalesced      write)                         60.98 GB/s |
| Memory Bandwidth (misaligned read      )                         65.39 GB/s |
| Memory Bandwidth (misaligned      write)                         32.60 GB/s |
| PCIe   Bandwidth (send                 )                         21.86 GB/s |
| PCIe   Bandwidth (   receive           )                         21.93 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   12.00 GB/s |
|-----------------------------------------------------------------------------|
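
One rough way to judge whether the gap is real is to compare it with the run-to-run spread inside each group. The sketch below uses only the FP32 figures posted in this thread; grouping them into "fast" and "slow" and the `spread` helper are illustrative choices, not part of the benchmark.

```python
from statistics import mean

# FP32 results in TFLOPs/s, copied from the runs in this thread.
# fast: Ubuntu 22.04 with the distro's 22.14.22890-1 stack (two runs).
fast = [2.014, 2.020]
# slow: 24.04 distro, bookworm distro, old runtime libraries copied into
# bookworm, and the 24.13.29138.7 release on bookworm and on 22.04.
slow = [1.884, 1.879, 1.881, 1.887, 1.884]


def spread(values):
    """Difference between the best and worst run in a group."""
    return max(values) - min(values)


gap = mean(fast) - mean(slow)
noise = max(spread(fast), spread(slow))

print(f"gap between groups:      {gap:.3f} TFLOPs/s")
print(f"largest in-group spread: {noise:.3f} TFLOPs/s")
print(f"relative drop:           {gap / mean(fast) * 100:.1f}%")
```

With only two runs in the fast group this is no statistical test, but the gap between the groups is more than ten times the largest spread within either group, so measurement noise alone looks like an unlikely explanation.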

@fanoush
Author

fanoush commented May 16, 2024

And BTW, the benchmark code running the FP32 and FP16 kernels is here:
https://github.com/ProjectPhysX/OpenCL-Benchmark/blob/master/src/main.cpp#L53
and the source of the OpenCL kernels starts here: https://github.com/ProjectPhysX/OpenCL-Benchmark/blob/master/src/kernel.cpp#L18
