Skip to content

cianciosa/kronmult

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Continuous Integration Test

Code to perform 6-dimensional batched kronecker product on GPU and CPU (using OpenMP).

Y(:,k) += kron(A1(:,:,k), ..., A6(:,:,k) ) * X(:,k), where k=1:batchCount

The code assumes each matrix A1, ..., A6 are all square and same shape of n by n.

Note there can be overlap in the output vectors Y(:,k). Atomic add updates will be performed.

Each vector X(:,k) or Y(:,k) is conceptually of shape n^6 by 1 but during the computation can be reshaped as n^5 by n or n^4 by n^2 or n^3 by n^3 or n^2 by n^4 or n by n^5.

Note that
Y = kron(A1,...,A6) * X can be evaluated as step 1: W = reshape( X, [n^5,n]) * transpose(A1) step 2: Y = kron( A2, ..., A6) * W, which can be viewed as "tail recursion"

At the lowest level of recursion Y = kron(A1) * X is simply implemented as matrix multiply Y = A1 * X

This implementation evaluates each kronecker product in a separate thread block on GPU. Instead of building a long batch list to call batched GEMM, the matrix-matrix multiplication is evaluated as calls to device functions

kgemm_nn() to evaluate C = alpha * A * B + beta * C or kgemm_nt() to evaluate C = alpha * A * transpose(B) + beta * C

For the special case of (beta == 1), atomicAdd update is used.

Note that the GEMM operations will be performed on very slender rectangular matrices. Therefore, the computations will not be dominated by floating point operations but by data movement, especially when n is small.


To compile the code for CPU (1) mkdir build && cd build (2) cmake ../ (3) make

To compile the code for Nvidia GPU (1) mkdir build && cd build (2) cmake ../ -DUSE_GPU=1 (3) make

To run the tester for kgemm_nn_batched, perform ./test_kgemm_nn_batched

To run the tester for kgemm_nt_batched, perform ./test_kgemm_nt_batched

To run the tester for kronmult6_batched, perform ./test_kronmult6_batched

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C++ 95.4%
  • CMake 4.6%