Porting loop of dlarfx calls to dlarfb #983

JanLuca · 2024-01-24T13:27:18Z

JanLuca
Jan 24, 2024

At the moment I try to refactor a code which consists of a loop of dlarfx calls to a single dlarft/dlarfb call to improve performance. The code transform a matrix A which is part of a product P * A * Q^T and the dlarft/dlarfb calls intend to modify the P and Q matrix so that the product is still valid.

My problem is that the refactored code results into completely different matrices. My questions is if there are known subtle differences or caveats of the dlarft/dlarfb code in comparison to a loop of dlarfx applications? Or do I have a misunderstand of the dlarft/dlarfb interface from the documentation and call the functions the wrong way.

Below I attach the relevant except of both versions of the code. It is assumed that the input matrix A is square with dimension (N, N). The code is written in C and uses the LAPACKE interface. Since the matrix is processed in backwards order, the Householder vector is of from (v, v, 1, 0).

Loop version

for (int i = N - 1; i >= 0; i--) {

  // Calculation of Householder vector and tau, omitted

  LAPACKE_dlarfx_work(LAPACK_COL_MAJOR,
                      'R',
                      nrowsq,   // Number of rows of Q
                      N,
                      vec_array,// Householder vector
                      tau,
                      q,        // Stored as Q => application from right
                      ldq,
                      work);
}

Blocked version

a is constructed as matrix with elements

(  1,   1, vp1, vp2)
(vq1,   1,   1, vp2)
(vq2, vq2,   1,   1)
(vq3, vq3, vq3,   1)

where vqX are the vectors for application to Q.

LAPACKE_dlarft_work(LAPACK_COL_MAJOR,
                    'B',
                    'R',
                    N,
                    N,
                    a,
                    lda,
                    tauq_array,
                    t_matrix,
                    N);

LAPACKE_dlarfb_work(LAPACK_COL_MAJOR,
                    'R',
                    'N',
                    'B',
                    'R',
                    nrowsq,
                    N,
                    N,
                    a,
                    lda,
                    t_matrix,
                    N,
                    q,
                    ldq,
                    work,
                    nrowsq);

For any hint why the blocked version results into different results I would be very thankful. If you need another information to triangulate the error, I am happy to provide more info.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Porting loop of dlarfx calls to dlarfb #983

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Porting loop of dlarfx calls to dlarfb #983

Uh oh!

JanLuca Jan 24, 2024

Replies: 0 comments

JanLuca
Jan 24, 2024