From 72b9d7a3d6c6af053a5543411630f517ed22cd3b Mon Sep 17 00:00:00 2001
From: Geolm
Date: Fri, 19 Jan 2024 12:30:17 -0500
Subject: [PATCH 1/4] Update README.md

---
 README.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7c934bd..9c64744 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ One header file library that implement of missing transcendental math functions
 [![Build Status](https://github.com/geolm/math_intrinsics/actions/workflows/cmake-multi-platform.yml/badge.svg)](https://github.com/geolm/math_intrinsics/actions)
 
 # why?
-AVX and Neon intrinsics don't provide transcendental math functions. Of course there are already some libraries with those functions but there are usually not free, restricted to one tpe of hardware or with low precision. This library is super easy to integrate, with a precision close to the C math library (see below) and with MIT license.
+AVX and Neon intrinsics don't provide transcendental math functions. Of course there are already some libraries with those functions but there are usually not free, restricted to one specific hardware or with low precision. This library is super easy to integrate, with a precision close to the C math library (see below) and with MIT license.
 
 # how to
@@ -96,3 +96,7 @@ float32x4_t vcbrtq_f32(float32x4_t a);
 
 [speeding up atan2f by 50x](https://mazzo.li/posts/vectorized-atan2.html)
 
+# FAQ
+
+
+

From 8a150d29a506871408c963627e224e96f7ced85f Mon Sep 17 00:00:00 2001
From: Geolm
Date: Fri, 19 Jan 2024 13:49:06 -0500
Subject: [PATCH 2/4] Update README.md

---
 README.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/README.md b/README.md
index 9c64744..df7ee08 100644
--- a/README.md
+++ b/README.md
@@ -98,5 +98,19 @@ float32x4_t vcbrtq_f32(float32x4_t a);
 
 # FAQ
 
+## is it fast?
+The goal of this library is to provide math function with a good precision with every computation done in AVX/NEON. Performance is not the focus.
+Here's the benchmark results on my old Intel Core i7 from 2018 (time for 32 billions of computed value)
+* mm256_sin_ps : 29887ms
+* mm256_acos_ps : 24650ms
+* mm256_exp_ps : 24387ms
+
+# I'd like to trade some precision for performances
+
+You can look at some approximation in my [simd](https://github.com/Geolm/simd/blob/main/simd_approx_math.h).
+
+# Why AVX2 ?
+
+On multiple functions this library use a float as an int to have access to the mantissa and the exponent part. While it's doable with AVX1 using SSE4.2, I don't see the point of not using AVX2 which have been on intel CPU since 2013.

From 63c6e45ce5e195c940edbb07744f94f64bfacd62 Mon Sep 17 00:00:00 2001
From: Geolm
Date: Fri, 19 Jan 2024 13:50:55 -0500
Subject: [PATCH 3/4] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index df7ee08..efe7d41 100644
--- a/README.md
+++ b/README.md
@@ -101,14 +101,14 @@ float32x4_t vcbrtq_f32(float32x4_t a);
 ## is it fast?
 The goal of this library is to provide math function with a good precision with every computation done in AVX/NEON. Performance is not the focus.
-Here's the benchmark results on my old Intel Core i7 from 2018 (time for 32 billions of computed value)
+Here's the benchmark results on my old Intel Core i7 from 2018 (time for 32 billions of computed values)
 * mm256_sin_ps : 29887ms
 * mm256_acos_ps : 24650ms
 * mm256_exp_ps : 24387ms
 
 # I'd like to trade some precision for performances
 
-You can look at some approximation in my [simd](https://github.com/Geolm/simd/blob/main/simd_approx_math.h).
+You can look at some approximations in my [simd](https://github.com/Geolm/simd/blob/main/simd_approx_math.h) repo. It's not copy/paste friendly but you get the idea, also you can get the whole repo which contains only few files.
 
 # Why AVX2 ?
 
From 2b6a707c3d6738b702d0dc816e3d8bad9cf94ee1 Mon Sep 17 00:00:00 2001
From: Geolm
Date: Fri, 19 Jan 2024 13:52:47 -0500
Subject: [PATCH 4/4] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index efe7d41..0a31d61 100644
--- a/README.md
+++ b/README.md
@@ -106,11 +106,11 @@ Here's the benchmark results on my old Intel Core i7 from 2018 (time for 32 billions of computed values)
 * mm256_acos_ps : 24650ms
 * mm256_exp_ps : 24387ms
 
-# I'd like to trade some precision for performances
+## is there a faster version with less precision?
 
 You can look at some approximations in my [simd](https://github.com/Geolm/simd/blob/main/simd_approx_math.h) repo. It's not copy/paste friendly but you get the idea, also you can get the whole repo which contains only few files.
 
-# Why AVX2 ?
+## why AVX2 ?
 
 On multiple functions this library use a float as an int to have access to the mantissa and the exponent part. While it's doable with AVX1 using SSE4.2, I don't see the point of not using AVX2 which have been on intel CPU since 2013.