Errors and crashes with GPU NFFT for larger number of nodes #143

Open
IvoMaatman opened this issue Oct 14, 2024 · 2 comments

@IvoMaatman
Hi all,

I’m encountering issues while performing the NFFT on CuArrays for large sets of nodes. Below is a 3D NFFT example:

using NFFT
using CUDA
using LinearAlgebra  # for mul!

J = 2200000
N = 256 

k = rand(Float32, 3, J) .- Float32(0.5)
f = randn(ComplexF32, J)

x_d = CuArray{ComplexF32}(undef, (N, N, N))
f_d = CuArray(f)

p = plan_nfft(CuArray, k, (N, N, N))
mul!(x_d, adjoint(p), f_d)

This example runs successfully for smaller values of J, but at the size of J given above I get an InexactError, and for significantly larger node counts the REPL crashes outright.

I am attempting to perform GPU NFFTs with many more nodes than shown in this example. Does anyone know how to resolve this? Thanks!

@tknopp
Copy link
Member

tknopp commented Oct 14, 2024

I think I know the cause, but not necessarily a way to resolve it.

On the GPU we currently only have the full precomputation method. If you look at this paper (https://www.mathematik.uni-osnabrueck.de/fileadmin/mathematik/documents/AG_Analysis/kunis/KuPo08.pdf), you can see in Table 3.1 that this method requires m^d * J * sizeof(Float32) bytes of storage. Thus, what you can try is to decrease m to 2-3 and check whether that still gives you sufficient accuracy.
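Concretely, reducing the kernel half-width could look like the sketch below. The `m` keyword of `plan_nfft` exists in NFFT.jl; the exact default (m = 4) and the resulting accuracy should be verified against a CPU reference for your data.

```julia
using NFFT
using CUDA
using LinearAlgebra

J = 2_200_000
N = 256

k = rand(Float32, 3, J) .- 0.5f0
f_d = CuArray(randn(ComplexF32, J))
x_d = CuArray{ComplexF32}(undef, (N, N, N))

# With d = 3, going from m = 4 to m = 2 shrinks the precomputation
# table by a factor of (4/2)^3 = 8, at the cost of some accuracy.
p = plan_nfft(CuArray, k, (N, N, N); m = 2)
mul!(x_d, adjoint(p), f_d)
```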

In the long run, we also want to implement the other precomputation strategies on the GPU, but there is no schedule for that so far. You might give https://github.com/jipolanco/NonuniformFFTs.jl a try, which now also implements AbstractNFFTs. Thus you don't need to change your code, just the `using` statement and the `plan_nfft` call.
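Switching backends might look roughly as follows. This is a sketch only: the `NFFTPlan` constructor name and the GPU-backend keyword are assumptions based on the NonuniformFFTs.jl documentation and should be checked there.

```julia
using NonuniformFFTs  # instead of `using NFFT`
using CUDA
using LinearAlgebra

J = 2_200_000
N = 256

k = rand(Float32, 3, J) .- 0.5f0
f_d = CuArray(randn(ComplexF32, J))
x_d = CuArray{ComplexF32}(undef, (N, N, N))

# NonuniformFFTs.jl implements the AbstractNFFTs plan interface, so the
# adjoint NFFT call itself stays unchanged. A GPU backend keyword
# (e.g. backend = CUDABackend()) may be required; see the package docs.
p = NonuniformFFTs.NFFTPlan(k, (N, N, N))
mul!(x_d, adjoint(p), f_d)
```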

@IvoMaatman
Author

Thanks for your quick reply and suggestions.

I don't think the issue is strictly related to memory. For example, converting all arrays to double precision, with a slightly smaller J than above, works fine on our system:

using NFFT
using CUDA
using LinearAlgebra

J = 2100000
N = 256

k = rand(Float64, 3, J) .- Float64(0.5)
f = randn(ComplexF64, J)

x_d = CuArray{ComplexF64}(undef, (N, N, N))
f_d = CuArray(f)

p = plan_nfft(CuArray, k, (N, N, N))
mul!(x_d, adjoint(p), f_d)

I am using a device with 80 GB of VRAM and plenty of RAM.
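A back-of-envelope check with the storage formula quoted above points the same way: even at the default m = 4, the full-precomputation table is far below 80 GB. A minimal sketch, assuming that m^d * J * sizeof(Float32) estimate:

```julia
m = 4            # default kernel half-width in NFFT.jl
d = 3            # dimensions
J = 2_200_000    # number of nodes

# Storage estimate from the comment above: m^d * J * sizeof(Float32) bytes.
bytes = m^d * J * sizeof(Float32)
println(bytes / 2^30, " GiB")   # ≈ 0.52 GiB, nowhere near 80 GB of VRAM
```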
