diff --git a/doc/developer-guide.md b/doc/developer-guide.md index 7e6a75d67..9adc7807c 100644 --- a/doc/developer-guide.md +++ b/doc/developer-guide.md @@ -53,7 +53,8 @@ Current directory layout: The two long running branches are: -* `stable`: This branch is merged to on a successful nightly run and should not be merged into directly. +* `stable`: This branch is merged to on a successful nightly run and should not + be merged into directly. * `main`: This is the main branch for on-going development. > No force pushes are allowed on these two branches. @@ -92,13 +93,13 @@ configuration file: To run `cmakelint` using the `.cmakelintrc` configuration file in the root of the oneAPI Construction Kit repository: -```console -$ cmakelint --config=.cmakelintrc [] ... +```sh +cmakelint --config=.cmakelintrc [] ... ``` ### Python Style -All the Python code in the oneAPI Construction Kit *must* be formatted using +All the Python code in the oneAPI Construction Kit _must_ be formatted using [`yapf`][yapf] set to the [pep8][pep8] style (default). As with `clang-format` it's not perfect in all situations and occasionally does something baffling, but the consistency mostly keeps the holy warriors at bay. @@ -116,7 +117,6 @@ issues and suggested solutions. [yapf]: https://pypi.org/project/yapf/ [pep8]: https://www.python.org/dev/peps/pep-0008/ -[isort]: https://pypi.org/project/isort/ [pylint]: https://pypi.org/project/pylint/ [flake8]: https://pypi.org/project/flake8/ [future]: https://pypi.org/project/future/ @@ -126,9 +126,9 @@ issues and suggested solutions. The Khronos ICD allows multiple OpenCL implementations to coexist in the same system, these implementations will usually be exposed to the OpenCL user as individual `cl_platform_id`'s. To inform the system's OpenCL ICD where to find -the oneAPI Construction Kit OpenCL driver it needs to be registered. 
Note that we also -support fetching and building an ICD within the toolkit through cmake options as described -[here](/source/cl/icd-loader.rst). +the oneAPI Construction Kit OpenCL driver it needs to be registered. Note that +we also support fetching and building an ICD within the toolkit through cmake +options as described [here](/source/cl/icd-loader.rst). ### Linux Registration @@ -148,8 +148,8 @@ For Windows the ICD inspects the registry so to register the oneAPI Construction Kit OpenCL driver a new registry entry must be added. Add a `REG_DWORD` value to the appropriate registry path. -* 32-bit - `HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Khronos\OpenCL\Vendors` -* 64-bit - `HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors` +* 32-bit - `HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Khronos\OpenCL\Vendors` +* 64-bit - `HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors` The `REG_DWORD` value's name should be the path to the oneAPI Construction Kit OpenCL driver and its data should be `0`. @@ -203,113 +203,130 @@ code itself see our [CMake Development](cmake) documentation. The flags used when invoking CMake on the command line which are used in the examples shown later in this document. -- `-B`: An undocumented option which creates a build directory `` if +* `-B`: An undocumented option which creates a build directory `` if it does not already exist then configures the build system in that directory. It is important to specify the source directory position argument otherwise you will see unexpected behaviour. -- `-G`: Specifies the build system generator to use, when not +* `-G`: Specifies the build system generator to use, when not specified the platform specific default generator is used. -- `-D=`: Defines a CMake option stored in `CMakeCache.txt` to +* `-D=`: Defines a CMake option stored in `CMakeCache.txt` to control how CMake configures the build directory. ### CMake Options The builtin CMake options used when invoking CMake on the command line. 
-- `CMAKE_BUILD_TYPE`: CMake provides a default set of build types: - - `Debug`: Enable debug symbols and disable optimizations. - - `Release`: Enable optimizations and disable assertions. - - `RelWithDebInfo`: Enable debug symbols, optimizations, and disable +* `CMAKE_BUILD_TYPE`: CMake provides a default set of build types: + * `Debug`: Enable debug symbols and disable optimizations. + * `Release`: Enable optimizations and disable assertions. + * `RelWithDebInfo`: Enable debug symbols, optimizations, and disable assertions. - - `MinSizeRel`: Enable size optimizations and disable assertions. -- `CMAKE_INSTALL_PREFIX`: Path to write files produced by the `install` target. -- `CMAKE_TOOLCHAIN_FILE`: Path to a CMake script, used to cross-compile a + * `MinSizeRel`: Enable size optimizations and disable assertions. +* `CMAKE_INSTALL_PREFIX`: Path to write files produced by the `install` target. +* `CMAKE_TOOLCHAIN_FILE`: Path to a CMake script, used to cross-compile a project, which defines variables that inform CMake where the compiler, assembler, linker, etc. for the target platform reside. #### oneAPI Construction Kit CMake Options -- `CMAKE_BUILD_TYPE`: In addition to the defaults provided by CMake the oneAPI +* `CMAKE_BUILD_TYPE`: In addition to the defaults provided by CMake the oneAPI Construction Kit extends the builtin build types: - - `ReleaseAssert`: Enable assertions is a Release build. -- `CA_USE_SANITIZER`: Enable support for dynamic analysis sanitizers: - - `Address`: Enable [AddressSanitizer][asan] dynamic analysis for + + * `ReleaseAssert`: Enable assertions in a Release build. + +* `CA_USE_SANITIZER`: Enable support for dynamic analysis sanitizers: + + * `Address`: Enable [AddressSanitizer][asan] dynamic analysis for memory errors. - - `Thread`: Enable [ThreadSanitizer][tsan] dynamic analysis for data + * `Thread`: Enable [ThreadSanitizer][tsan] dynamic analysis for data races. 
- - `Undefined`: Enable [UndefinedBehaviourSanitizer][ubsan] dynamic + * `Undefined`: Enable [UndefinedBehaviourSanitizer][ubsan] dynamic analysis for undefined behaviour. This is currently broken with gcc; use clang for working ubsan support (CA-4237). - - `Address,Undefined`: Enable combined [AddressSanitizer][asan] and + * `Address,Undefined`: Enable combined [AddressSanitizer][asan] and [UndefinedBehaviourSanitizer][ubsan] dynamic analysis. - - `Fuzzer`: Enable [libFuzzer][libfuzzer] instrumentation. -- `CA_LLVM_INSTALL_DIR`: Tells the oneAPI Construction Kit to use the LLVM + * `Fuzzer`: Enable [libFuzzer][libfuzzer] instrumentation. + +* `CA_LLVM_INSTALL_DIR`: Tells the oneAPI Construction Kit to use the LLVM and Clang installation that can be found at this prefix. The LLVM and Clang installations must be development installations i.e. they must contain the relevant llvm headers and support tools, and their version must match a supported LLVM version. -- `CA_ENABLE_API`: Semi-colon separated list of APIs to enable. Valid values + +* `CA_ENABLE_API`: Semi-colon separated list of APIs to enable. Valid values are `cl` for OpenCL, and `vk` for Vulkan. Enabling an API when an optional component is not present dependent on license agreement will result in a CMake error. The default is `cl;vk`. -- `CA_BUILD_32_BITS`: Enable compiling in 32-bit mode on Linux, this requires + +* `CA_BUILD_32_BITS`: Enable compiling in 32-bit mode on Linux, this requires to have the proper 32-bit toolchain installed. When used in combination with an external LLVM, the external LLVM also needs to be built in 32-bit mode. -- `CA_EXTERNAL_BUILTINS_DIR` is used to specify the directory + +* `CA_EXTERNAL_BUILTINS_DIR` is used to specify the directory containing pre-generated builtins. This option is mandatory when cross compiling. 
It is usually set to the `modules/builtins` directory in the build directory of a host oneAPI Construction Kit build, but can be set to another directory as long as it contains generated builtins. -- `CA_EXTERNAL_BUILTINS`: This option is used to specify whether or not builtins + +* `CA_EXTERNAL_BUILTINS`: This option is used to specify whether or not builtins should be generated. If it is set to `OFF`, `CA_EXTERNAL_BUILTINS_DIR` must be provided to indicate which builtins to use instead. This option is set to `ON` for cross compile builds. -- `CA_BUILTINS_TOOLS_DIR`: This options makes it possible to specify which tools + +* `CA_BUILTINS_TOOLS_DIR`: This options makes it possible to specify which tools to use in order the build the builtins, executables for the correct versions of `clang` and `llvm-link` must be found in this directory. This can also be used for cross-compile builds in which case the tools must work on the host. -- `CA_RUNTIME_COMPILER_ENABLED`: This option determines whether the oneAPI + +* `CA_RUNTIME_COMPILER_ENABLED`: This option determines whether the oneAPI Construction Kit is built with or without a runtime compiler (LLVM). It defaults to `ON`. Without a runtime compiler, only pre-compiled binaries can be run, and the oneAPI Construction Kit implements an embedded profile. -- `CA_CLANG_TIDY_FLAGS`: This option specifies a semi-colon separated list of + +* `CA_CLANG_TIDY_FLAGS`: This option specifies a semi-colon separated list of additional flags which are passed to `clang-tidy` when invoking `tidy` targets. -- `CA_HOST_ENABLE_BUILTIN_KERNEL`: This option enables builtin kernel support + +* `CA_HOST_ENABLE_BUILTIN_KERNEL`: This option enables builtin kernel support within the host target. By default, it is set to `OFF`. If enabled this will report that host supports builtin kernels and will also enable two test kernels that are used by UnitMux and UnitCL to verify functionality. 
-- `CA_HOST_ENABLE_FP64`: This option determines whether host is built with or + +* `CA_HOST_ENABLE_FP64`: This option determines whether host is built with or without double support. By default, it is only enabled on non-Windows platforms. -- `CA_HOST_ENABLE_FP16`: This option determines whether host is built with or + +* `CA_HOST_ENABLE_FP16`: This option determines whether host is built with or without half support. It is disabled by default since we can't detect if this feature is natively supported by hardware, which is a requirement. -- `CA_HOST_ENABLE_PAPI_COUNTERS`: This option enables performance counter + +* `CA_HOST_ENABLE_PAPI_COUNTERS`: This option enables performance counter support in host via the Mux `query_pool` API and the PAPI performance counter API. Requires the PAPI library and headers to be installed on the system. Currently this only works on Linux. -- `CA_HOST_CROSS_COMPILERS`: This option specifies a semi-colon separated list + +* `CA_HOST_CROSS_COMPILERS`: This option specifies a semi-colon separated list of compilers registered to enable offline or cross-compilation for non-native host CPU's, e.g. for Linux kernel cross-compile `arm`, `aarch64`, `x86`, `x86_64` may be specified, alternatively set to `all` to enable all backends which were built during the LLVM install. -- `CMAKE_SKIP_RPATH`: On Linux the oneAPI Construction Kit specifies a relative + +* `CMAKE_SKIP_RPATH`: On Linux the oneAPI Construction Kit specifies a relative `RPATH` for all targets when they are installed using `CMAKE_INSTALL_RPATH`, this ensures that when the `install` target is invoked the user does not need to specify `LD_LIBRARY_PATH` to correctly execute a test binary in order to use the installed OpenCL or Vulkan library. Do disable this behaviour set `-DCMAKE_SKIP_RPATH=ON` when configuring CMake in build directory. -- `CA_HOST_TARGET__CPU`: This option is used by the `host` target to - optimize for performance on a given CPU. 
 `arch` should be a capitalized +* `CA_HOST_TARGET__CPU`: This option is used by the `host` target to + optimize for performance on a given CPU. `arch` should be a capitalized version of the `host` target architecture e.g. `X86_64`, `RISCV64` or `AARCH64`. If set to "native" host will optimize for the CPU being used to compile it. Otherwise a CPU name can be provided, for example "skylake", but be warned that this string will be passed directly to the llvm backend so make sure it's a valid CPU name. Information about your host CPU can be found by - running `llc --version`, and a list of host CPUs supported by your installed - version of LLVM can be found by running `llc --march=[your-arch] --mcpu=help`. + running `llc --version`, and a list of host CPUs supported by your installed + version of LLVM can be found by running `llc --march=[your-arch] --mcpu=help`. Be aware that if `host` is compiled with this option set, running it on a different CPU from the one specified (or the one compiled with if "native" was @@ -320,17 +337,21 @@ The builtin CMake options used when invoking CMake on the command line. caveats above apply, and this may result in an illegal instruction crash if your CPU doesn't support the generated instructions. -- `CA_USE_SPLIT_DWARF`: When building with gcc, enable split dwarf debuginfo. +* `CA_USE_SPLIT_DWARF`: When building with gcc, enable split dwarf debuginfo. This significantly reduces binary size (especially when static linking) and speeds up the link step. Requires a non-ancient toolchain. -- `CA_CL_TEST_STATIC_LIB`: Forces all of our CL executable targets to link the + +* `CA_CL_TEST_STATIC_LIB`: Forces all of our CL executable targets to link the static CL library rather than the normal dynamic one, to force testing with the static library. -- `CA_MUX_TARGETS_TO_ENABLE`: A `;` separated list of `mux` targets that should + +* `CA_MUX_TARGETS_TO_ENABLE`: A `;` separated list of `mux` targets that should be enabled. 
By default this is set to the `host` target. -- `CA_EXTERNAL_MUX_TARGET_DIRS`: A `;` separated list of external `mux` targets that + +* `CA_EXTERNAL_MUX_TARGET_DIRS`: A `;` separated list of external `mux` targets that should be built. The base directory name must be that of the target. -- `CA_EXTERNAL_MUX_COMPILER_DIRS`: A `;` separated list of external + +* `CA_EXTERNAL_MUX_COMPILER_DIRS`: A `;` separated list of external `compiler` targets that should be built. The base directory name must be that of the target. @@ -1016,13 +1037,18 @@ gotchas: ### Testing oneapi-construction-kit application examples using official Intel oneAPI Base Toolkit -Download the official Intel OneAPI Base Toolkit following the instructions mentioned [here](../README.md#compiling-oneapi-samples-vector-add-using-official-intel-oneapi-base-toolkit). +Download the official Intel OneAPI Base Toolkit following the instructions +mentioned [here](../README.md#compiling-oneapi-samples-vector-add-using-official-intel-oneapi-base-toolkit). To compile the tests follow the steps below: ```sh mkdir build_tests -cmake -GNinja -Bbuild_tests -DCMAKE_CXX_COMPILER=/path/to/intel_oneapi/bin/clang++ /path/to/oneapi-construction-kit/examples/applications -DOpenCL_LIBRARY=/path/to/build/lib/libCL.so -DOpenCL_INCLUDE_DIR=/path/to/build-riscv/include +cmake -GNinja -Bbuild_tests \ + -DCMAKE_CXX_COMPILER=/path/to/intel_oneapi/bin/clang++ \ + /path/to/oneapi-construction-kit/examples/applications \ + -DOpenCL_LIBRARY=/path/to/build/lib/libCL.so \ + -DOpenCL_INCLUDE_DIR=/path/to/build-riscv/include ninja -C build_tests ``` @@ -1043,13 +1069,15 @@ export SYCL_CONFIG_FILE_NAME="" ``` The tests can be run using `ctest` command. + ```sh cd build_tests ctest ``` The generated output should be as follows: -```sh + +```plain text Test project /path/to/build_tests Start 1: simple_vector_add 1/7 Test #1: simple_vector_add .................. Passed 0.06 sec @@ -1115,7 +1143,7 @@ compiler pass. 
When given a [pass name](#pass-names) or comma-separated list of pass names, it prints the IR before or after every instance of those passes, on every unit of IR: -``` +```llvm > CA_LLVM_OPTIONS=-print-after=early-cse,mem2reg ... *** IR Dump After EarlyCSEPass on foo *** @@ -1160,7 +1188,7 @@ entry: The two options can be combined, e.g., to better inspect the result of a specific pass: -``` +```llvm > CA_LLVM_OPTIONS="-print-before=early-cse -print-after=early-cse" ... *** IR Dump Before EarlyCSEPass on bar *** @@ -1207,7 +1235,7 @@ to the LLVM source code) at `llvm/lib/Passes/PassRegistry.def`. Since LLVM ComputeMux use the same style of pass registration, both contain lines such as: -``` +```c++ MODULE_PASS("always-inline", AlwaysInlinerPass()) MODULE_PASS("add-sched-params", utils::AddSchedulingParametersPass()) @@ -1240,7 +1268,7 @@ This option works like [-print-after-all](#-print-after-all-and--print-before-all) but only when the pass makes a change to the IR. -``` +```llvm *** IR Dump After CoroEarlyPass on [module] omitted because no change *** *** IR Dump After LowerExpectIntrinsicPass on bar omitted because no change *** *** IR Dump After SimplifyCFGPass on bar omitted because no change *** @@ -1283,7 +1311,7 @@ invalid IR. This option gives an overview of each compiler pipeline, listing the passes and analyses running at each point. 
-``` +```plain text > CA_LLVM_OPTIONS=-debug-pass-manager Running pass: ForceFunctionAttrsPass on [module] @@ -1297,15 +1325,18 @@ Running pass: compiler::StripFastMathAttrs on bar (24 instructions) Values can be passed to this option to control its behaviour: * `-debug-pass-manager=quiet` - Skips printing of analyses - ``` + + ```plain text Running pass: ForceFunctionAttrsPass on [module] Running pass: compiler::SoftwareDivisionPass on bar (24 instructions) Running pass: compiler::StripFastMathAttrs on bar (24 instructions) ; ``` + * `-debug-pass-manager=verbose` - Prints additional information about pass managers and adaptors. - ``` + + ```plain text Running pass: ForceFunctionAttrsPass on [module] Running pass: ModuleToFunctionPassAdaptor on [module] Running analysis: InnerAnalysisManagerProxy on [module] @@ -1322,7 +1353,7 @@ Prints timing information summaries at the end of each compiler pipeline, with a breakdown of how long each individual pass took. This is useful for understanding compile-time performance issues. -``` +```plain text > CA_LLVM_OPTIONS=-time-passes ===-------------------------------------------------------------------------=== @@ -1352,13 +1383,13 @@ This can also be supported in release mode if the CMake option On Bash or similar shells environment variables can be set as follows: -``` +```sh export CA_OCL_DEBUG_PRINT_KERNELS=1 ``` On Windows console: -``` +```batch SET CA_OCL_DEBUG_PRINT_KERNELS=1 ``` @@ -1380,7 +1411,7 @@ kernel in different files, if the kernel is compiled multiple times. For Linux hosts which support perf hardware events, we can get various metrics by setting the environment variable `CA_ENABLE_PERF_INTERFACE=1` and then -running the executable with perf. Since the kernel is being *JIT'ed*, on Linux +running the executable with perf. Since the kernel is being _JIT'ed_, on Linux hosts. 1. 
The compiled kernel object will be placed in `/tmp/perf-$\{pid\}.o` @@ -1415,7 +1446,7 @@ CA_ENABLE_PERF_INTERFACE=1 perf record \ After recording the profile, you can view the statistics using perf report -> **Note :** Be aware that if you run *perf report* with the *-a* option to +> **Note :** Be aware that if you run _perf report_ with the _-a_ option to > enable profiling on all the CPUs, all processes running on the OS will be > profiled and percentage calculations will take all of them into account. This > is most likely **Not** what you want. @@ -1506,8 +1537,8 @@ ccache -C > Unfortunately it is not possible to cache all the oneAPI Construction Kit build > steps such as building bitcode for the [builtins](modules/builtins.md) module, -> this is due to `ccache` not being aware of the compiler flags passed to `clang` to -> generate these outputs. +> this is due to `ccache` not being aware of the compiler flags passed to `clang` +> to generate these outputs. ## Enhanced GDB Debugging @@ -1522,7 +1553,7 @@ pretty printers for the types it defines, these can be found in `modules/cargo/scripts/gdb/prettyprinters.py`. To enable them, issue the following command in a [GDB][gdb] session: -``` +```plain text (gdb) source modules/cargo/scripts/gdb/prettyprinters.py ``` @@ -1565,25 +1596,25 @@ tested value. 2) Build the oneAPI Construction Kit and the benchmark you'll be using to benchmark the driver (for example, a benchmark from PerfCL). 
-3) Clone the Flamegraphs repository: https://github.com/brendangregg/FlameGraph +3) Clone the Flamegraphs repository: <https://github.com/brendangregg/FlameGraph> 4) cd to your benchmark, then use the `perf` tool to execute your benchmark and record some stack samples: - `perf record -g --call-graph dwarf ./jacobi1D` (if running the `jacobi1D` + `perf record -g --call-graph dwarf ./jacobi1D` (if running the `jacobi1D` benchmark from PerfCL) - Don't forget to ensure the the oneAPI Construction Kit CL driver is being - loaded correctly by the ICD, or you use `OCL_ICD_FILENAMES` to override the - CL driver: + Don't forget to ensure the oneAPI Construction Kit CL driver is being + loaded correctly by the ICD, or you use `OCL_ICD_FILENAMES` to override the + CL driver: - `export OCL_ICD_FILENAMES=/lib/libCL.so` + `export OCL_ICD_FILENAMES=/lib/libCL.so` - * TIP: `perf script` is extremely slow on Debian and Ubuntu because it relies - on forking to `addr2line` for each stack captured (hundreds of MB's of data). - If you build `perf` from source with `libbfd` installed, it will be up to - 60x faster. See https://eighty-twenty.org/2021/09/09/perf-addr2line-speed-improvement - for more information. + * TIP: `perf script` is extremely slow on Debian and Ubuntu because it relies + on forking to `addr2line` for each stack captured (hundreds of MB's of data). + If you build `perf` from source with `libbfd` installed, it will be up to + 60x faster. See + <https://eighty-twenty.org/2021/09/09/perf-addr2line-speed-improvement> + for more information. 5) Follow the instructions in the FlameGraph repository to generate a nice SVG: