A simple, repeatable benchmark for validating GPU performance, based on cuBLAS matrix multiplication.
Make sure your CUDA toolkit is set up (nvcc is on $PATH, shared libraries are on $LD_LIBRARY_PATH, headers are on $CPATH). Then run the following command to start the test:
$ ./run.sh
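The three prerequisites above can be checked before launching the benchmark. The following is a minimal sketch, not part of this repository: the function name check_cuda_env is hypothetical, and it only verifies that nvcc resolves on $PATH and that the two variables are non-empty, not that they point at a working CUDA installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): report which of the
# prerequisites look unset. Returns 0 only if all three checks pass.
check_cuda_env() {
    status=0
    if ! command -v nvcc >/dev/null 2>&1; then
        echo "nvcc not found on PATH"
        status=1
    fi
    if [ -z "${LD_LIBRARY_PATH:-}" ]; then
        echo "LD_LIBRARY_PATH is empty"
        status=1
    fi
    if [ -z "${CPATH:-}" ]; then
        echo "CPATH is empty"
        status=1
    fi
    return "$status"
}
```

With the function defined in the current shell, `check_cuda_env && ./run.sh` starts the test only when all three checks pass.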
- The code computes C=alpha*A*B+beta*C with square matrices A, B and C, and repeats it 2 times (adjustable; run more repetitions for a more stable result).
- The sizes of A, B and C go up to (16384,16384) in the default test (also adjustable to fit your GPU memory size).
- By default, the code is built for GeForce GTX TITAN BLACK (sm_35, adjustable) and tests cublasSgemm (cublasHgemm can be used instead on Pascal GPUs).
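When adjusting the matrix size to fit a GPU, it helps to estimate the memory footprint and the work per multiplication first. The sketch below is an illustration, not code from this repository: gemm_bytes and gemm_flops are hypothetical helper names, the footprint counts only the three matrices (no workspace or driver overhead), and the operation count uses the conventional 2*N^3 estimate for an N x N GEMM.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this repository).

# Bytes needed to hold the three N x N matrices A, B and C.
# $1 = N, $2 = bytes per element (4 for float32, 2 for float16).
gemm_bytes() {
    echo $(( 3 * $1 * $1 * $2 ))
}

# Floating-point operations in one N x N GEMM, using the usual 2*N^3 count
# (one multiply and one add per inner-product term; alpha/beta scaling ignored).
gemm_flops() {
    echo $(( 2 * $1 * $1 * $1 ))
}

gemm_bytes 16384 4   # prints 3221225472 (3 GiB for float32)
gemm_flops 16384     # prints 8796093022208
```

Dividing the gemm_flops value by the measured time of one multiplication in seconds gives the achieved FLOP/s for that size.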
Uncomment line 11 in gemm.cu and line 4 in run.sh to test float16 matrix multiplication (cublasHgemm) on a Tesla P100 GPU. This requires CUDA 8.0.
An example test result can be found here.
The "pstate" ranges from P0 to P12, where P0 is the maximum performance state and P12 is the minimum.
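To see whether a GPU is actually in P0 while the benchmark runs, the pstate can be queried with nvidia-smi and filtered. This is a sketch under assumptions: not_in_p0 is a hypothetical helper, not part of this repository, and it expects comma-separated "index, name, pstate" lines such as those produced by the nvidia-smi command shown in the comment.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): read lines of the form
# "index, name, pstate" on stdin and print the GPUs that are not in P0.
# Returns 1 if any such GPU is found, 0 otherwise.
not_in_p0() {
    awk -F',' '
        { p = $3; gsub(/^[ \t]+|[ \t]+$/, "", p) }   # trim the pstate field
        p != "P0" { print; found = 1 }
        END { exit found }
    '
}

# Intended use in a second terminal while the benchmark is running
# (requires nvidia-smi):
#   nvidia-smi --query-gpu=index,name,pstate --format=csv,noheader | not_in_p0
```

A GPU reported by this filter during the run is not at its maximum performance state, so its benchmark numbers are likely to be lower than the card's potential.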