BenchPiCalc

Goal

Benchmark various implementations of calculating pi, focusing on threading, vectorization, and OpenMP. The solutions are inspired by and extended from "Introduction to OpenMP" by Tim Mattson (Intel), which introduces the problem as a great candidate for parallelization with easily observable performance benefits. That series focuses on solutions that leverage OpenMP, but the problem is easily extended to other techniques as well. The goal is not to calculate pi as quickly as possible (you'd be better off just using std::numbers::pi, after all) but to explore the performance implications of various configurations and solutions.

General Solution

All implementations approximate the integral of 4/(1 + x^2) from 0 to 1, which equals pi, using a Riemann sum with various numbers of partitions.

Implementation Notes

SerialPi: Basic serial solution without anything fancy
SSE2Pi: Serial solution using SSE2 vector instructions
AvxPi: Serial solution using AVX vector instructions
Avx512Pi: Serial solution using AVX512 vector instructions
NaiveOmpPi: Parallel solution using basic OpenMP constructs
FalseSharingOmpPi: Parallel solution using basic OpenMP constructs, but causing a false sharing issue
AtomicOmpPi: Parallel solution using basic OpenMP constructs, including an atomic for sum increment
ForOmpPi: Parallel solution using an OpenMP parallel for loop with a reduction clause for sum
ThreadPi: Parallel solution manually using std::thread
SimdThreadPi: Parallel solution manually using std::thread and AVX512 vector instructions
KomputePi: General Purpose GPU (GPGPU) solution using Vulkan compute shaders dispatched using kompute

Results Summary

4026531839 Iterations on an AMD Ryzen 7 7700X (CPU tests) or a Radeon RX 6900 XT (GPU tests)

Solution	threads	-march=native	-ffast-math	time (ms)
SerialPi	1	yes	yes	831.513
SerialPi	1	yes	no	2213.48
SerialPi	1	no	*	3434.23
SSE2Pi	1	*	*	1641.54
AvxPi	1	*	*	820.518
Avx512Pi	1	*	*	827.005
NaiveOmpPi	16	no	*	439.549
FalseSharingOmpPi	16	no	*	6803.25
AtomicOmpPi	16	no	*	446.516
ForOmpPi	16	no	*	447.340
ForOmpPi	16	yes	no	154.168
ForOmpPi	16	yes	yes	103.906
ThreadPi	16	yes	*	413.089
SimdThreadPi	16	*	*	104.933
KomputePi	GPU	*	*	159.680
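One way to read the table is as ratios between rows. The sketch below computes two of them from the timings listed above; the helper name ratio and the constant names are purely illustrative.

```cpp
// Timings copied from the results table (milliseconds).
constexpr double serial_native_fast_ms = 831.513;   // SerialPi, 1 thread, -march=native, -ffast-math
constexpr double for_omp_native_fast_ms = 103.906;  // ForOmpPi, 16 threads, -march=native, -ffast-math
constexpr double naive_omp_ms = 439.549;            // NaiveOmpPi, 16 threads, no -march=native
constexpr double false_sharing_omp_ms = 6803.25;    // FalseSharingOmpPi, 16 threads, no -march=native

// Hypothetical helper: how many times longer the slower run took.
constexpr double ratio(double slower_ms, double faster_ms) {
    return slower_ms / faster_ms;
}

// Roughly 8x: speedup of ForOmpPi over SerialPi with the same compiler flags.
constexpr double omp_speedup = ratio(serial_native_fast_ms, for_omp_native_fast_ms);

// Roughly 15.5x: slowdown of FalseSharingOmpPi relative to NaiveOmpPi.
constexpr double false_sharing_penalty = ratio(false_sharing_omp_ms, naive_omp_ms);
```

Comparing rows with matching flags keeps the ratio about the technique itself rather than about the compiler options.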

Building

The project is set up with CMake, using FetchContent for a few packages.

Catch2 is used for testing and benchmarking.
kompute is used for running GPU workloads with Vulkan compute shaders.

It also requires manually downloading and installing:

The Vulkan SDK for kompute support

Presets are provided for g++, as that is the compiler everything is currently tested with. The "linux-gcc-profile" preset is meant to be just as optimized as the "linux-gcc-release" preset but with debug information. However, on some machines (Intel CPUs?) this causes some performance issues.

$ cmake --list-presets
Available configure presets:

  "linux-gcc-debug"   - Linux GCC Debug
  "linux-gcc-release" - Linux GCC Release
  "linux-gcc-profile" - Linux GCC Release with debug info

The project can be configured with:

$ cmake --preset linux-gcc-profile

The project can then be built with:

$ cmake --build out/build/linux-gcc-profile/

And then tested/run with:

$ ctest --test-dir out/build/linux-gcc-profile/
$ ./out/build/linux-gcc-profile/PiBench/PiBench [common Catch2 args to control what to run]

In particular, the following arguments are a good starting point for running the benchmarks, which are skipped by default:

$ ./out/build/linux-gcc-profile/PiBench/PiBench --benchmark-no-analysis --benchmark-samples 1 "[!benchmark]"
Filters: [!benchmark]
Randomness seeded to: 2212001003

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PiBench is a Catch2 v3.6.0 host application.
Run with -? for options

-------------------------------------------------------------------------------
Pi calculation benchmark
-------------------------------------------------------------------------------
/home/keifer/source/repos/BenchPiCalc/PiBench/PiBench.cpp:15
...............................................................................

benchmark name                            samples    iterations          mean
-------------------------------------------------------------------------------
SerialPi                                         1             1     3.43423 s
