I’m encountering issues while performing the NFFT on CuArrays for large sets of nodes. Below is a 3D NFFT example:
using NFFT
using CUDA
J = 2200000
N = 256
k = rand(Float32, 3, J) .- Float32(0.5)
f = randn(ComplexF32, J)
x_d = CuArray{ComplexF32}(undef, (N, N, N))
f_d = CuArray(f)
p = plan_nfft(CuArray, k, (N, N, N))
mul!(x_d, adjoint(p), f_d)
This example runs successfully for smaller values of J, but I encounter inexact errors for J at the given size and a REPL crash for significantly larger node counts.
I am attempting to perform GPU NFFTs with many more nodes than shown in this example. Does anyone know how to resolve this? Thanks!
In the long run, we also want to implement the other pre-computation strategies on the GPU, but there is no schedule for that so far. You might give https://github.com/jipolanco/NonuniformFFTs.jl a try, which now also implements the AbstractNFFTs interface. You would therefore not need to change your code apart from the using statement and the plan_nfft call.
I don't think the issue is strictly related to memory. For example, converting all arrays to double precision, with a slightly smaller value of J than the one above, works fine on our system:
J = 2100000
N = 256
k = rand(Float64, 3, J) .- Float64(0.5)
f = randn(ComplexF64, J)
x_d = CuArray{ComplexF64}(undef, (N, N, N))
f_d = CuArray(f)
p = plan_nfft(CuArray, k, (N, N, N))
mul!(x_d, adjoint(p), f_d)
I am using a device with 80 GB of VRAM and plenty of RAM.
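As a rough sanity check on the memory point, the arrays allocated explicitly in the two examples can be sized with a few lines of plain Julia. This is only a sketch: `explicit_array_bytes` and `to_mib` are hypothetical helpers, not part of NFFT.jl or CUDA.jl, and the estimate ignores whatever the plan allocates internally (precomputed window data, oversampled grid), so it is a lower bound rather than the actual GPU usage.

```julia
# Lower-bound estimate: only the arrays the example allocates explicitly.
# Plan-internal buffers are NOT counted here.

"Bytes for the node matrix k, the sample vector f, and the output grid x_d."
function explicit_array_bytes(::Type{T}, J::Integer, N::Integer; D::Integer = 3) where {T<:AbstractFloat}
    nodes   = D * J * sizeof(T)           # k: D × J real matrix
    samples = J * sizeof(Complex{T})      # f: J complex values
    grid    = N^D * sizeof(Complex{T})    # x_d: N^D complex grid
    return (nodes = nodes, samples = samples, grid = grid,
            total = nodes + samples + grid)
end

# Convert bytes to mebibytes for readability.
to_mib(bytes) = bytes / 2^20

# The two configurations from this thread.
for (T, J) in ((Float32, 2_200_000), (Float64, 2_100_000))
    b = explicit_array_bytes(T, J, 256)
    println(T, ", J = ", J, ": explicit arrays ≈ ",
            round(to_mib(b.total); digits = 1), " MiB")
end
```

Both configurations come out at a few hundred MiB for these arrays, which is small next to 80 GB. Whether the plan's internal storage stays equally small is a separate question that this estimate does not answer.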