Matrix-Vector Multiplication Using Shared and Coalesced Memory Access
The goal of this project is to write a fast matrix-vector multiplication kernel for GPU computing in CUDA C. See vmp.pdf for a paper that describes the algorithms and the testing suite in detail.
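The following is a minimal sketch of how the two techniques in the title can be combined. It is not the kernel from this repository or from vmp.pdf. It makes several assumptions: A is an M x N single-precision matrix stored column-major, each thread computes one entry of y, and the block size is 256. The names `matvec_shared`, `verify_matvec`, `TILE` and `CUDA_CHECK` are hypothetical. Tiles of x are cached in shared memory, so each block reads x from global memory only once. With the column-major layout, neighbouring threads read neighbouring addresses of A, which lets a warp's loads coalesce.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Threads per block; also the length of each x tile cached in shared memory (assumed value).
#define TILE 256

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                    \
                    cudaGetErrorString(err_), __FILE__, __LINE__);         \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// y = A * x with A an M x N matrix stored column-major (A[col * M + row]).
// One thread computes one entry of y; must be launched with blockDim.x == TILE.
__global__ void matvec_shared(const float* A, const float* x, float* y, int M, int N) {
    __shared__ float xs[TILE];
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    float sum = 0.0f;

    for (int tile = 0; tile < N; tile += TILE) {
        // Every thread in the block loads one element of x into shared memory,
        // so x is read from global memory once per block rather than once per row.
        int j = tile + threadIdx.x;
        xs[threadIdx.x] = (j < N) ? x[j] : 0.0f;
        __syncthreads();

        if (row < M) {
            int limit = min(TILE, N - tile);
            for (int k = 0; k < limit; ++k) {
                // Neighbouring threads read neighbouring rows of the same column,
                // i.e. contiguous addresses, so the warp's loads coalesce.
                sum += A[(size_t)(tile + k) * M + row] * xs[k];
            }
        }
        __syncthreads();  // keep xs intact until every thread is done with this tile
    }
    if (row < M) y[row] = sum;
}

// Runs the kernel on a deterministic M x N problem and returns the number of
// entries of y that differ from a CPU reference.
int verify_matvec(int M, int N) {
    size_t bytesA = (size_t)M * N * sizeof(float);
    float* hA = (float*)malloc(bytesA);
    float* hx = (float*)malloc(N * sizeof(float));
    float* hy = (float*)malloc(M * sizeof(float));

    // Small integer entries keep every partial sum exact in float,
    // so GPU and CPU results can be compared for equality.
    for (int col = 0; col < N; ++col)
        for (int row = 0; row < M; ++row)
            hA[(size_t)col * M + row] = (float)((row + 2 * col) % 7) - 3.0f;
    for (int j = 0; j < N; ++j) hx[j] = (float)(j % 5) - 2.0f;

    float *dA, *dx, *dy;
    CUDA_CHECK(cudaMalloc(&dA, bytesA));
    CUDA_CHECK(cudaMalloc(&dx, N * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&dy, M * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(dA, hA, bytesA, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dx, hx, N * sizeof(float), cudaMemcpyHostToDevice));

    matvec_shared<<<(M + TILE - 1) / TILE, TILE>>>(dA, dx, dy, M, N);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpy(hy, dy, M * sizeof(float), cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int row = 0; row < M; ++row) {
        float ref = 0.0f;
        for (int col = 0; col < N; ++col) ref += hA[(size_t)col * M + row] * hx[col];
        if (hy[row] != ref) ++mismatches;
    }

    CUDA_CHECK(cudaFree(dA));
    CUDA_CHECK(cudaFree(dx));
    CUDA_CHECK(cudaFree(dy));
    free(hA); free(hx); free(hy);
    return mismatches;
}

int main(void) {
    int bad = verify_matvec(1000, 777);
    printf("%s (%d mismatches)\n", bad == 0 ? "PASS" : "FAIL", bad);
    return bad == 0 ? 0 : 1;
}
```

To try it, compile with `nvcc` (for example `nvcc matvec_sketch.cu -o matvec_sketch`, where the file name is hypothetical). The column-major layout is what makes a one-thread-per-row kernel coalesce. If A were stored row-major, a common alternative would be to assign a warp to each row and reduce the partial sums across the warp.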