Accelerated General (FP32) Matrix Multiplication

tgautam03/xGeMM


xGeMM

Accelerated General (FP32) Matrix Multiplication. Tested on NVIDIA RTX 3090 using Ubuntu 24.04.1 LTS with nvidia-driver-550 and CUDA 12.4.

Watch the YouTube video (linked from the video thumbnail in the repository).

Dependencies

Running Benchmarks

1. Eigen (CPU) matrix multiplication:

Compile: make 00a_benchmark_cpu.out

Execute: ./00a_benchmark_cpu.out

2. cuBLAS (GPU) matrix multiplication:

Compile: make 00b_benchmark_cuBLAS.out

Execute: ./00b_benchmark_cuBLAS.out

3. Naive (GPU) matrix multiplication:

Compile: make 01_benchmark_naive.out

Execute: ./01_benchmark_naive.out

4. Coalesced (GPU) matrix multiplication:

Compile: make 02_benchmark_coalesced.out

Execute: ./02_benchmark_coalesced.out
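
For readers unfamiliar with the term, the difference between a naive and a coalesced kernel usually comes down to how thread indices map onto matrix elements. The sketch below is an illustration only and is not taken from the repository's source: the kernel names, the `run_sgemm` helper, the 32x8 block shape, and the assumption of square row-major matrices are all hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Illustrative sketch, not the repository's code. Row-major N x N, C = A * B.

// Uncoalesced mapping: threadIdx.x walks down the rows of C, so neighbouring
// threads in a warp read A and write C at addresses N floats apart.
__global__ void sgemm_rows_on_x(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= N || col >= N) return;
    float acc = 0.0f;
    for (int k = 0; k < N; ++k) acc += A[row * N + k] * B[k * N + col];
    C[row * N + col] = acc;
}

// Coalesced mapping: threadIdx.x walks along the columns of C, so neighbouring
// threads read consecutive floats of B and write consecutive floats of C.
__global__ void sgemm_cols_on_x(const float* A, const float* B, float* C, int N) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= N || col >= N) return;
    float acc = 0.0f;
    for (int k = 0; k < N; ++k) acc += A[row * N + k] * B[k * N + col];
    C[row * N + col] = acc;
}

using Kernel = void (*)(const float*, const float*, float*, int);

// Hypothetical host helper: copies inputs to the GPU, launches the chosen
// kernel, and copies the result back. Error checking is omitted for brevity.
std::vector<float> run_sgemm(Kernel kernel, const std::vector<float>& A,
                             const std::vector<float>& B, int N) {
    size_t bytes = size_t(N) * N * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(32, 8);  // assumed shape: one warp spans 32 consecutive x indices
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    kernel<<<grid, block>>>(dA, dB, dC, N);
    std::vector<float> C(size_t(N) * N);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Both kernels compute the same result; only the memory access pattern differs, which is what a benchmark comparing the two would measure.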

5. Tiled (GPU) matrix multiplication:

Compile: make 03_benchmark_tiled.out

Execute: ./03_benchmark_tiled.out
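
As a rough picture of what tiling means here, the following is a minimal shared-memory tiled kernel in the textbook style. It is a sketch under stated assumptions rather than the repository's implementation: the tile width `TILE = 16`, the names `sgemm_tiled` and `run_sgemm_tiled`, and the square row-major layout are all hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Illustrative sketch, not the repository's code. Row-major N x N, C = A * B.
constexpr int TILE = 16;  // assumed tile width

__global__ void sgemm_tiled(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
        // Each thread stages one element of the A tile and one of the B tile,
        // padding with zeros where the tile hangs over the matrix edge.
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < N && a_col < N) ? A[row * N + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < N && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();  // tiles fully loaded before anyone reads them
        for (int k = 0; k < TILE; ++k) acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // everyone done reading before the next overwrite
    }
    if (row < N && col < N) C[row * N + col] = acc;
}

// Hypothetical host helper; error checking is omitted for brevity.
std::vector<float> run_sgemm_tiled(const std::vector<float>& A,
                                   const std::vector<float>& B, int N) {
    size_t bytes = size_t(N) * N * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (N + TILE - 1) / TILE);
    sgemm_tiled<<<grid, block>>>(dA, dB, dC, N);
    std::vector<float> C(size_t(N) * N);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Each element of A and B is read from global memory once per tile instead of once per multiply, so global traffic drops by roughly a factor of `TILE` compared with the untiled kernel.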

6. 1D thread coarsening (GPU) matrix multiplication:

Compile: make 04_benchmark_coarse_1d.out

Execute: ./04_benchmark_coarse_1d.out
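
Thread coarsening means each thread produces several output elements instead of one. One common form, sketched below, has each thread compute `COARSE` elements of C spaced one tile apart along a row, reusing a single staged tile of A for all of them. This may well differ from how the repository's kernel is organized: `TILE`, `COARSE`, `sgemm_coarse_1d`, and `run_sgemm_coarse_1d` are hypothetical names and values chosen for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Illustrative sketch, not the repository's code. Row-major N x N, C = A * B.
constexpr int TILE = 16;   // assumed tile width
constexpr int COARSE = 4;  // assumed number of outputs per thread

__global__ void sgemm_coarse_1d(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col_start = blockIdx.x * TILE * COARSE + threadIdx.x;
    float acc[COARSE] = {0.0f};  // remaining entries are zero-initialized
    for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
        // Stage the A tile once; it is reused for all COARSE output columns.
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < N && a_col < N) ? A[row * N + a_col] : 0.0f;
        for (int c = 0; c < COARSE; ++c) {
            int col = col_start + c * TILE;
            Bs[threadIdx.y][threadIdx.x] = (b_row < N && col < N) ? B[b_row * N + col] : 0.0f;
            __syncthreads();  // As and Bs visible to the whole block
            for (int k = 0; k < TILE; ++k) acc[c] += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();  // reads finished before Bs (or As) is overwritten
        }
    }
    for (int c = 0; c < COARSE; ++c) {
        int col = col_start + c * TILE;
        if (row < N && col < N) C[row * N + col] = acc[c];
    }
}

// Hypothetical host helper; error checking is omitted for brevity.
std::vector<float> run_sgemm_coarse_1d(const std::vector<float>& A,
                                       const std::vector<float>& B, int N) {
    size_t bytes = size_t(N) * N * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(TILE, TILE);
    // Each block now covers TILE rows and TILE * COARSE columns of C.
    dim3 grid((N + TILE * COARSE - 1) / (TILE * COARSE), (N + TILE - 1) / TILE);
    sgemm_coarse_1d<<<grid, block>>>(dA, dB, dC, N);
    std::vector<float> C(size_t(N) * N);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

The trade-off is fewer, heavier threads: the A tile is loaded once per `COARSE` outputs, at the cost of more registers per thread and fewer blocks in flight.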

7. 2D thread coarsening (GPU) matrix multiplication:

Compile: make 05_benchmark_coarse_2d.out

Execute: ./05_benchmark_coarse_2d.out

8. Vectorized memory accesses (GPU) matrix multiplication:

Compile: make 06_benchmark_coarse_2d_vec.out

Execute: ./06_benchmark_coarse_2d_vec.out
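
Vectorized memory access in CUDA typically means loading and storing `float4` values (16 bytes) instead of single floats. The sketch below shows the idea in isolation on a simple kernel where each thread computes four adjacent elements of C. It is not the repository's kernel, which by its name combines vectorization with 2D coarsening. The names `sgemm_vec4` and `run_sgemm_vec4` are hypothetical, and the code assumes square row-major matrices with N divisible by 4 so that every row starts on a 16-byte boundary.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Illustrative sketch, not the repository's code. Row-major N x N, C = A * B.
// Assumes N % 4 == 0 and cudaMalloc-aligned buffers.
__global__ void sgemm_vec4(const float* A, const float* B, float* C, int N) {
    int n4 = N / 4;                                    // float4 groups per row
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col4 = blockIdx.x * blockDim.x + threadIdx.x;  // which float4 group
    if (row >= N || col4 >= n4) return;
    const float4* B4 = reinterpret_cast<const float4*>(B);
    float4 acc = make_float4(0.0f, 0.0f, 0.0f, 0.0f);
    for (int k = 0; k < N; ++k) {
        float a = A[row * N + k];
        float4 b = B4[k * n4 + col4];  // one 16-byte load instead of four 4-byte loads
        acc.x += a * b.x;
        acc.y += a * b.y;
        acc.z += a * b.z;
        acc.w += a * b.w;
    }
    reinterpret_cast<float4*>(C)[row * n4 + col4] = acc;  // one 16-byte store
}

// Hypothetical host helper; error checking is omitted for brevity.
std::vector<float> run_sgemm_vec4(const std::vector<float>& A,
                                  const std::vector<float>& B, int N) {
    size_t bytes = size_t(N) * N * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(16, 16);
    int n4 = N / 4;
    dim3 grid((n4 + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    sgemm_vec4<<<grid, block>>>(dA, dB, dC, N);
    std::vector<float> C(size_t(N) * N);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Wider loads reduce the number of memory instructions issued per byte moved. The alignment requirement is the main catch: matrices whose width is not a multiple of 4 would need padding or a scalar fallback for the remainder.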
