Andrzej Wójtowicz
Document generation date: 2016-12-01 12:24:11
This document presents timing results for BLAS (Basic Linear Algebra Subprograms) libraries in R on diverse CPUs and GPUs.
- 2016-12-01: results: updated timing for Intel Xeon E3-1275 v5; code: added possible compilation fix for invalid operands error in GotoBLAS2.
- 2016-11-30: results: added Intel Xeon E5-1620 v4.
- 2016-11-29: results: added Intel Xeon E3-1275 v5.
- 2016-11-25: results: added Intel Atom C2758.
- 2016-07-14: results: added Intel Core i5-6500; changed results view of gcbd benchmark to relative performance gain; changed reference CPU (Intel Pentium Dual-Core E5300) and GPU (NVIDIA GeForce GT 630M); code: fixed target architecture detection for Intel Core i5-6500-like CPUs in the multi-threaded ATLAS library; added info on how to force the target architecture in the GotoBLAS2 and BLIS libraries.
- Configuration
- Results per host
  - Intel Xeon E3-1275 v5
  - Intel Xeon E5-1620 v4
  - Intel Core i7-4790K + MSI GeForce GTX 980 Ti Lightning
  - Intel Core i5-4590 + NVIDIA GeForce GT 430
  - Intel Core i5-4590 + NVIDIA GeForce GTX 750 Ti
  - Intel Core i5-6500
  - Intel Core i5-3570
  - Intel Core i3-2120
  - Intel Core i3-3120M
  - Intel Core i5-3317U + NVIDIA GeForce GT 630M
  - Intel Atom C2758
  - Intel Pentium Dual-Core E5300
- Results per library
OS: Debian Jessie, kernel 4.4
R software: Microsoft R Open (3.2.4)
Libraries:
CPU (single-threaded) | CPU (multi-threaded) | GPU |
---|---|---|
Netlib (Debian package, blas 1.2.20110419, lapack 3.5.0) | OpenBLAS (Debian package, 0.2.12) | NVIDIA cuBLAS (NVBLAS 6.5 + Intel MKL) |
ATLAS (Debian package, 3.10.2) | ATLAS (dev branch, 3.11.38) | |
GotoBLAS2 (Survive fork, 3.141) | | |
Intel MKL (part of RevoMath package, 3.2.4) | | |
BLIS (dev branch, 0.2.0+/17.05.2016) | | |
Hosts:
No. | CPU | GPU |
---|---|---|
1. | Intel Xeon E3-1275 v5 | - |
2. | Intel Xeon E5-1620 v4 | - |
3. | Intel Core i7-4790K (OC 4.5 GHz) | MSI GeForce GTX 980 Ti Lightning |
4. | Intel Core i5-4590 | NVIDIA GeForce GT 430 |
5. | Intel Core i5-4590 | NVIDIA GeForce GTX 750 Ti |
6. | Intel Core i5-6500 | - |
7. | Intel Core i5-3570 | - |
8. | Intel Core i3-2120 | - |
9. | Intel Core i3-3120M | - |
10. | Intel Core i5-3317U | NVIDIA GeForce GT 630M |
11. | Intel Atom C2758 | - |
12. | Intel Pentium Dual-Core E5300 | - |
Benchmarks: R-benchmark-25, Revolution, gcbd.
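The timing results in this report are given as seconds over 10 runs. As a rough illustration of that kind of measurement, here is a minimal sketch in Python (the benchmarks themselves are R scripts, so this is not the benchmark code). The `time_runs` helper, the naive `matmul` workload, and the choice to summarise with a mean are all assumptions made for the example.

```python
import statistics
import time


def matmul(a, b):
    """Naive matrix product of two lists of rows (hypothetical stand-in workload)."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def time_runs(func, runs=10):
    """Call func() `runs` times and return the elapsed seconds of each call."""
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        elapsed.append(time.perf_counter() - start)
    return elapsed


# Small deterministic 60 x 60 matrix; the size is arbitrary.
n = 60
a = [[(i * n + j) % 7 for j in range(n)] for i in range(n)]

timings = time_runs(lambda: matmul(a, a), runs=10)
# Summarising with the mean is an assumption; the report only states "10 runs".
print(f"mean {statistics.mean(timings):.4f} s over {len(timings)} runs")
```

Repeating the call and keeping every sample, rather than timing a single run, makes it possible to see how much the measurements vary between runs.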
Results per host:
Each of the 12 hosts listed above has 16 plots (images not preserved), in this order:
- 12 timing plots: Time in seconds - 10 runs - lower is better
- 4 gcbd plots: Performance gain regarding matrix size - reference: Netlib - from 50 to 5 runs - higher is better
Notes attached to individual plots:
- Intel Atom C2758: "ATLAS (mt) crashes in this test" (2 timing plots, 1 gcbd plot)
- Intel Pentium Dual-Core E5300: "BLIS hangs in this test" (1 timing plot)
Results per library:
Each of the 8 libraries has the same 16 plots (images not preserved). Here the gcbd performance gain is relative to the Intel Pentium Dual-Core E5300 for the CPU libraries and to the NVIDIA GeForce GT 630M for the GPU library.
Notes attached to individual plots:
- "Library crashes on Intel Atom C2758 in this test" (2 timing plots and 1 gcbd plot of one library)
- "Library hangs on Intel Pentium Dual-Core E5300 in this test" (1 timing plot of one library)
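The gcbd results are reported as a performance gain relative to a reference (Netlib, the Intel Pentium Dual-Core E5300, or the NVIDIA GeForce GT 630M), where higher is better. The report does not spell out the formula. One common reading, assumed here, is the ratio of the reference time to the measured time at each matrix size. The sketch below is in Python for illustration only; `performance_gain` and the timings are hypothetical, not measured data.

```python
def performance_gain(reference_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the reference.

    Assumed definition: reference time divided by candidate time,
    so a value above 1 means the candidate is faster.
    """
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return reference_seconds / candidate_seconds


# Hypothetical timings in seconds, keyed by matrix size (not measured data).
reference = {100: 0.5, 1000: 8.0}
candidate = {100: 0.25, 1000: 0.5}

gains = {size: performance_gain(reference[size], candidate[size]) for size in reference}
print(gains)  # {100: 2.0, 1000: 16.0}
```

Under this reading, a gain of 1 means no difference from the reference, and a gain that grows with matrix size suggests the library's advantage is larger on bigger problems.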