Activate the FFI implementation of SVD on GPU. #24211

copybara-service · 2024-10-09T14:29:59Z


Alongside activating this new implementation, this change adds a new `algorithm` parameter to `jax.lax.svd`. Previously, the algorithm was chosen by heuristics in the lowering rule. Because those heuristics are not very carefully optimized, it probably also makes sense to let users specify the algorithm explicitly.

This change updates the implementation of SVD in `lax` to use the FFI version that was added to jaxlib in #23794. This comes with a few benefits:

1. When running on a CUDA platform, the 64-bit API will be used for the algorithm based on QR decomposition. (Note that it looks like the 64-bit API isn't available on ROCm.) This addresses part of the feature request in #23413, although there's still work to do to port the rest of the GPU calls to the 64-bit API.
2. This implementation supports shape polymorphism in all dimensions, with some caveats. By default, we use heuristics based on the matrix sizes to select the algorithm, and the three algorithms (QR, Jacobi, and batched Jacobi) behave sufficiently differently (QR returns V^H, whereas Jacobi returns V; batched Jacobi doesn't support `full_matrices=False`) that I couldn't work out a simple way to push this logic into the kernel. If the symbolic constraints are not sufficient to concretely evaluate the heuristics, we always use the QR algorithm. However, I've also exposed the algorithm selection in the user API, so it's possible to bypass the heuristics and get consistent behavior alongside shape polymorphism if needed.
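The V^H versus V difference mentioned in the list above can be made concrete with a small sketch. This is not the lowering code from this change. It only shows the convention fix-up that a caller of the raw solvers has to perform, using `jnp.linalg.svd` (which returns V^H) to simulate a solver that hands back V. The helper name `to_vh` is hypothetical.

```python
import jax.numpy as jnp


def to_vh(v):
    """Conjugate-transpose the last two axes, turning V into V^H."""
    return jnp.conj(jnp.swapaxes(v, -1, -2))


# A small complex matrix, so that the conjugation actually matters.
a = jnp.array([[1.0 + 2.0j, 0.5],
               [0.0, 3.0 - 1.0j],
               [2.0, 1.0j]])

# jnp.linalg.svd follows the V^H convention (the QR-style output).
u, s, vh = jnp.linalg.svd(a, full_matrices=False)

# Simulate a Jacobi-style result, which would provide V rather than V^H.
# The conjugate transpose is its own inverse, so the same helper works.
v = to_vh(vh)

# Normalizing back to V^H recovers the usual identity A = U diag(s) V^H.
reconstructed = (u * s) @ to_vh(v)
```

Because the two conventions differ by a conjugate transpose, the fix-up has to be chosen once the algorithm is known, which is one reason the selection is awkward to hide inside a single kernel.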

Besides these core changes, I removed the forward compatibility checks from the CPU lowering, since we're well outside of the forward compatibility window now.

PiperOrigin-RevId: 687106965

copybara-service bot assigned dfm Oct 9, 2024

copybara-service bot force-pushed the test_679137575 branch from 4114c74 to 33d5675 October 9, 2024 14:38

copybara-service bot force-pushed the test_679137575 branch 3 times, most recently from e61cfa0 to 32991ce October 18, 2024 00:45

copybara-service bot force-pushed the test_679137575 branch from 32991ce to 8361eb5 October 18, 2024 00:57

copybara-service bot merged commit 8361eb5 into main Oct 18, 2024

copybara-service bot deleted the test_679137575 branch October 18, 2024 00:57
