
A high-performance alternative to glibc's memcmp. It also demonstrates the concept of using template metaprogramming (TMP) with intrinsics to produce optimised assembly code.


oschonrock/arrcmp


arrcmp


This approach was presented as a lightning talk at C++ On Sea 2022, titled: "C++ your friendly meta assembler - or how to beat memcmp" https://www.youtube.com/watch?v=v6ENImXcsPE
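To illustrate the general idea of combining TMP with intrinsics, here is a minimal sketch. It is not arrcmp's actual code or API. The function name arrcmp_sketch and the chunking scheme (16-byte SSE2 blocks, then 8-byte words, then single bytes) are hypothetical choices. The sketch assumes x86-64 (SSE2 and little-endian byte order) and a GCC or Clang compiler for __builtin_ctz/__builtin_ctzll. Because the length N is a template parameter, the if constexpr recursion unrolls completely at compile time. The generated code therefore contains no loop and no runtime length checks, only a fixed sequence of loads and compares.

```cpp
#include <immintrin.h> // SSE2 intrinsics (x86-64 assumed)
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace sketch {

// Hypothetical illustration, not the arrcmp library API.
// Compares exactly N bytes. Returns <0, 0 or >0 like memcmp, based on the
// first differing byte interpreted as unsigned char.
template <std::size_t N>
inline int arrcmp_sketch(const unsigned char* a, const unsigned char* b) {
  if constexpr (N >= 16) {
    // 16-byte chunk: byte-wise equality test with SSE2
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    unsigned mask =
        static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)));
    if (mask != 0xFFFFu) {
      // lowest zero bit in mask = index of the first differing byte
      unsigned idx = static_cast<unsigned>(__builtin_ctz(~mask));
      return static_cast<int>(a[idx]) - static_cast<int>(b[idx]);
    }
    return arrcmp_sketch<N - 16>(a + 16, b + 16);
  } else if constexpr (N >= 8) {
    // 8-byte chunk: XOR two 64-bit words; memcpy avoids aliasing issues
    std::uint64_t x, y;
    std::memcpy(&x, a, 8);
    std::memcpy(&y, b, 8);
    if (std::uint64_t diff = x ^ y) {
      // little-endian: the lowest set bit lies in the first differing byte
      unsigned idx = static_cast<unsigned>(__builtin_ctzll(diff)) / 8;
      return static_cast<int>(a[idx]) - static_cast<int>(b[idx]);
    }
    return arrcmp_sketch<N - 8>(a + 8, b + 8);
  } else if constexpr (N >= 1) {
    // remaining tail bytes, one at a time
    if (a[0] != b[0]) return static_cast<int>(a[0]) - static_cast<int>(b[0]);
    return arrcmp_sketch<N - 1>(a + 1, b + 1);
  } else {
    return 0; // N == 0: all bytes were equal
  }
}

} // namespace sketch
```

For example, sketch::arrcmp_sketch<40>(a, b) expands at compile time into two 16-byte SSE2 compares followed by one 8-byte word compare. As with memcmp, callers should rely only on the sign of the result, not on its magnitude.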

Installing dependencies (for Ubuntu)

```bash
# Google Test and Google Benchmark
sudo apt install libgtest-dev libbenchmark-dev

# Google Benchmark plot
git submodule update --init --recursive

# The benchmark executable links against asmlib for comparison.
# The object file for the GCC/Clang/Intel 64-bit ELF version of asmlib is included in /ext
# and will be linked against automatically.
```

Building

Best results with clang-14 or newer:

```bash
cmake -B build -S . -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_CXX_COMPILER=clang++-14 -DCMAKE_C_COMPILER=clang-14
cmake --build build
```

Tests and benchmarks will then be run automatically.

Run bench/run_and_graph_results.sh to produce CSV results and plot them with google_benchmark_plot/plot.py.

For the impatient, the results are included as CSV and PNG files in the bench folder and are shown below.

Sandy Bridge i7 CPU (SSE2) - glibc 2.35

(benchmark plot image)

Skylake i7 CPU (AVX2) - glibc 2.35

(benchmark plot image)

Changes with glibc 2.39

Improvements in glibc mean that asmlib is no longer faster than glibc's memcmp, but arrcmp still is:

Ivy Bridge i5 CPU (SSE2) - glibc 2.39 - clang

(benchmark plot image)

With GCC, there are some odd artifacts at around 60 bytes:

Ivy Bridge i5 CPU (SSE2) - glibc 2.39 - gcc

(benchmark plot image)
