Errors and crashes with GPU NFFT for larger number of nodes #143

Open
IvoMaatman opened this issue Oct 14, 2024 · 2 comments

@IvoMaatman
Hi all,

I’m encountering issues while performing the NFFT on CuArrays for large sets of nodes. Below is a 3D NFFT example:

using NFFT
using CUDA
using LinearAlgebra  # for mul!

J = 2200000
N = 256 

k = rand(Float32, 3, J) .- Float32(0.5)
f = randn(ComplexF32, J)

x_d = CuArray{ComplexF32}(undef, (N, N, N))
f_d = CuArray(f)

p = plan_nfft(CuArray, k, (N, N, N))
mul!(x_d, adjoint(p), f_d)

This example runs successfully for smaller values of J, but at the size of J given above I get an InexactError, and for significantly larger node counts the REPL crashes outright.

I am attempting to perform GPU NFFTs with many more nodes than shown in this example. Does anyone know how to resolve this? Thanks!

@tknopp
Copy link
Member

tknopp commented Oct 14, 2024

I think I know the cause, but not necessarily a way to resolve it.

On the GPU we currently only have the full precomputation method. If you look at this paper (https://www.mathematik.uni-osnabrueck.de/fileadmin/mathematik/documents/AG_Analysis/kunis/KuPo08.pdf), you can see in Table 3.1 that this method requires m^d * J * sizeof(Float32) bytes of storage. Thus, what you can try is to decrease m to 2-3 and check whether that still gives you sufficient accuracy.
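Concretely, reducing the kernel half-width could look like the sketch below. The `m` keyword of `plan_nfft` exists in NFFT.jl; the exact default (m = 4) and the resulting accuracy should be verified against a CPU reference for your data.

```julia
using NFFT
using CUDA
using LinearAlgebra

J = 2_200_000
N = 256

k = rand(Float32, 3, J) .- 0.5f0
f_d = CuArray(randn(ComplexF32, J))
x_d = CuArray{ComplexF32}(undef, (N, N, N))

# With d = 3, going from m = 4 to m = 2 shrinks the precomputation
# table by a factor of (4/2)^3 = 8, at the cost of some accuracy.
p = plan_nfft(CuArray, k, (N, N, N); m = 2)
mul!(x_d, adjoint(p), f_d)
```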

In the long run, we also want to implement the other precomputation strategies on the GPU, but there is no schedule for that so far. You might give https://github.com/jipolanco/NonuniformFFTs.jl a try, which now also implements AbstractNFFTs. Thus you don't need to change your code, just the `using` statement and the `plan_nfft` call.
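Switching backends might look roughly as follows. This is a sketch only: the `NFFTPlan` constructor name and the GPU-backend keyword are assumptions based on the NonuniformFFTs.jl documentation and should be checked there.

```julia
using NonuniformFFTs  # instead of `using NFFT`
using CUDA
using LinearAlgebra

J = 2_200_000
N = 256

k = rand(Float32, 3, J) .- 0.5f0
f_d = CuArray(randn(ComplexF32, J))
x_d = CuArray{ComplexF32}(undef, (N, N, N))

# NonuniformFFTs.jl implements the AbstractNFFTs plan interface, so the
# adjoint NFFT call itself stays unchanged. A GPU backend keyword
# (e.g. backend = CUDABackend()) may be required; see the package docs.
p = NonuniformFFTs.NFFTPlan(k, (N, N, N))
mul!(x_d, adjoint(p), f_d)
```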

@IvoMaatman
Author

Thanks for your quick reply and suggestions.

I don't think the issue is strictly related to memory. For example, converting all arrays to double precision, with a slightly smaller J than above, works fine on our system:

using NFFT
using CUDA
using LinearAlgebra

J = 2100000
N = 256

k = rand(Float64, 3, J) .- Float64(0.5)
f = randn(ComplexF64, J)

x_d = CuArray{ComplexF64}(undef, (N, N, N))
f_d = CuArray(f)

p = plan_nfft(CuArray, k, (N, N, N))
mul!(x_d, adjoint(p), f_d)

I am using a device with 80 GB of VRAM and plenty of RAM.
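A back-of-envelope check with the storage formula quoted above points the same way: even at the default m = 4, the full-precomputation table is far below 80 GB. A minimal sketch, assuming that m^d * J * sizeof(Float32) estimate:

```julia
m = 4            # default kernel half-width in NFFT.jl
d = 3            # dimensions
J = 2_200_000    # number of nodes

# Storage estimate from the comment above: m^d * J * sizeof(Float32) bytes.
bytes = m^d * J * sizeof(Float32)
println(bytes / 2^30, " GiB")   # ≈ 0.52 GiB, nowhere near 80 GB of VRAM
```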
