Sampled Dense-Dense Matrix Multiplication (SDDMM) is a core operation in many machine-learning factor-analysis algorithms, including Alternating Least Squares (ALS), Latent Dirichlet Allocation (LDA), Sparse Factor Analysis (SFA), and Gamma Poisson. This repository contains our code and the full results presented in our final report. Our focus is the development of GPU-Dynamic, an efficient GPU-based implementation of the SDDMM kernel. It outperforms the current implementation in Torch with speedups of up to 100x, and delivers competitive results compared to DGL.
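For readers unfamiliar with the operation, the following sketch illustrates SDDMM semantics in plain NumPy (an illustration only, not the GPU kernel in this repository): given dense matrices A and B and a sparse sampling pattern S, the output keeps `dot(A[i], B[j])` only at the nonzero positions of S. All names and sizes here are made up for the example.

```python
import numpy as np

# SDDMM: C[i, j] = S[i, j] * dot(A[i], B[j]), evaluated only where S is nonzero.
rng = np.random.default_rng(0)
n, m, k = 4, 5, 3
A = rng.standard_normal((n, k))          # dense n x k
B = rng.standard_normal((m, k))          # dense m x k
S = (rng.random((n, m)) < 0.3).astype(float)  # sparse sampling pattern

# Dense reference: full product, then mask by S's sparsity pattern.
C_dense = S * (A @ B.T)

# SDDMM-style computation: evaluate dot products only at nonzeros of S.
rows, cols = np.nonzero(S)
vals = np.einsum('ij,ij->i', A[rows], B[cols]) * S[rows, cols]
C_sddmm = np.zeros_like(S)
C_sddmm[rows, cols] = vals

assert np.allclose(C_dense, C_sddmm)
```

The point of a dedicated kernel is exactly this gap: when S is very sparse, computing only the sampled dot products avoids the full dense n x m product.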
At the bottom of this README is an overview of all matrices we used for evaluation. They span a range of dimensions and densities. All matrices originate from the SuiteSparse Matrix Collection and can be downloaded by executing the `install_matrices.sh` script.
To run the code, you need to install LibTorch for C++, which you can download from here. We recommend using PyTorch >= 2.1.0 and CUDA >= 12.1. In addition, you should have gcc >= 10.2.0 and cmake >= 3.21 installed.
Make sure to update the path to your `libtorch` library in the `run_cmake.sh` file. You can then compile and run the code by executing:

```shell
./run_cmake.sh
./build/src/dphpc --K 32 --data_folder data/
```