-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement batched serial getrf #2331
implement batched serial getrf #2331
Conversation
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
689fba1
to
2ebf151
Compare
f74e10d
to
d7e1eee
Compare
d7e1eee
to
bb53361
Compare
#include <KokkosBatched_Gemm_Decl.hpp> | ||
#include "Test_Batched_DenseUtils.hpp" | ||
|
||
using namespace KokkosBatched; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these headers get included into a big test file, I'd rather not pollute the global namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I have fixed accordingly.
bb53361
to
a4424cd
Compare
@cwpearson Thank you for the review. I have updated based on your review. Further reviews are highly appreciated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few points to look at but good overall, will run testing on this
// Use recursive code | ||
auto n1 = Kokkos::min(m, n) / 2; | ||
|
||
// [ A00 ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity it should be A0
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion. Here, A0 is equivalent to
[ A00 ]
[ --- ]
[ A10 ]
I have added the concrete explanation in the docstrings of KokkosBatched_Getrf.hpp
. For consistency, I changed the matrix notation in a pythonic manner, i.e.,
A0 = [[A00],
[A10]]
The original is coming from the lapack notation
https://www.netlib.org/lapack/explore-html-3.6.1/dd/d9a/group__double_g_ecomputational_gabdd3af29e9f6bbaf4b352341a1e8b464.html#gabdd3af29e9f6bbaf4b352341a1e8b464
|
||
if (info == 0 && iinfo > 0) info = iinfo; | ||
|
||
// [ A01 ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too, the comment seems wrong since you have not extracted the subviews for A01
and A11
yet
TEST_F(TestCategory, test_batched_getrf_float) { | ||
using algo_tag_type = typename KokkosBatched::Algo::Getrf::Unblocked; | ||
|
||
test_batched_getrf<TestDevice, float, algo_tag_type>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it would be nice to have a couple analytical tests for this. A simple way to create an analytical test is to fill both L and U, then multiply them and eventually figure out the pivots. That would give a good level of extra confidence in the work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
I added a little more complicated analytical test where we know all of the matrices that satisfy
PA = LU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once @lucbv is happy this is fine with me. Thanks!
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
…l.hpp Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
a4424cd
to
65d6fb4
Compare
This PR implements getrf function.
Following files are added:
KokkosBatched_Getrf_Serial_Impl.hpp
: Internal interfaces with implementation detailsKokkosBatched_Getrf.hpp
: APIsTest_Batched_SerialGetrf.hpp
: Unit tests for thatDetailed description
It computes an LU factorization of a real general M-by-N matrix
A
using partial pivoting with row interchanges.Here, the matrix has the following shape.
A
:(batch_count, m, n)
On entry, the M-by-N matrix to be factored. On exit, the factors
L
andU
from the factorizationA = P * L * U
; the unit diagonal elements ofL
are not stored.IPIV
:(batch_count, min(m, n))
The pivot indices; for
0 <= i < min(M,N)
, rowi
of the matrix was interchanged with rowIPIV(i)
.Parallelization would be made in the following manner. This is efficient only when
A is given in
LayoutLeft
for GPUs andLayoutRight
for CPUs (parallelized over batch direction).Tests
A
and factorize it intoLU
withipiv
. ReconstructL
andU
fromLU
. Then permuteA
byipiv
and confirmLU
==A
.A
as follows to confirmLU
==A
.Important remarks (edited 30/Oct)
Laswp
andIamax
. These are also used forgetrs
andgbtrf
.4096 x 4096 (=2^12)
due to the limit of the stack size. In order to avoid this limit, we need to ask user to prepare a temporal buffer to achieve a stack.Currently, we call CPU version if View is accessible from host. This does not work appropriately if we execute the function on GPUs with USM.We call recursive version on Host and stuck version on Device by usingKOKKOS_IF_ON_HOST
andKOKKOS_IF_ON_DEVICE
.trsm
andgemm
first.gesv
withdynamic_pivoting
(getrf
+getrs
) shows comparable or better performance for most matrix sizes compared togesv
withstatic_pivoting
on NVIDIA and AMD GPUs