
Add GPU support to MRIReco.jl #182

Merged: 66 commits, Aug 13, 2024

Conversation

@nHackel (Member) commented Apr 17, 2024

This PR adds GPU support to MRIReco.jl and the relevant subpackages. It is based on the following PRs: LinearOperators.jl#321, RegularizedLeastSquares.jl#81, and LinearOperatorCollection.jl#9.

As with the other PRs, I will try to keep these changes GPU-agnostic and document my high-level idea and the relevant changes in this PR description.

To work on the GPU, we need all LinearOperators to have a GPU storage type. I've started adding the relevant S kwarg to the operators in the collection package and in MRIOperators.jl. Additionally, we need our solver input and dense arrays to live on the GPU where possible. To allow for this, I have added a new :arrayType (name might change) parameter to the parameter Dict, from which one can then construct the correct arrays and vectors, as sketched below.
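
A rough sketch of how this parameter could be used (the exact keyword and parameter names are illustrative and may differ from the final implementation):

using CUDA

# Requested backing array type (could also be a compatible callable)
params = Dict{Symbol, Any}(:arrayType => CuArray)

# Derive a fully parameterized storage type for the operators (the S kwarg)
S = typeof(params[:arrayType](zeros(ComplexF32, 0)))   # e.g. CuArray{ComplexF32, 1, ...}

# Dense solver inputs are moved to the requested backend with the same callable
kdata = rand(ComplexF32, 64)
kdata_gpu = params[:arrayType](kdata)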

I'll first try to keep all changes close to the Reco and operator packages. That in turn means that we need to convert the acquisition data every time during reconstruction. An alternative could be to specify the backing array type during data loading and derive it from there.

@nHackel (Member, Author) commented Apr 17, 2024

[image: first GPU reconstruction, showing phantom, CPU result, and GPU result]

And here is the first GPU reconstruction done with the changes in commit 8ecdc66. The left column shows the phantom, the middle column the CPU result and the right column the GPU result.

@aTrotier (Contributor)

Do you know a way to test the GPU reconstruction during CI?

I think @cncastillo is also trying to test KomaMRI with a different CI setup for the GPU part

@nHackel (Member, Author) commented Apr 18, 2024

There are JLArrays, which are a reference implementation of GPU arrays that run on the CPU. If the code remains GPU agnostic, we can have one test-suite running on "normal" arrays and then again on JLArrays. This is the approach we aim for in RegularizedLeastSquares at the moment.
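
A minimal sketch of that two-pass test setup (assuming JLArrays.jl as a test dependency; the actual test bodies would call the reconstruction):

using Test, JLArrays

# Run the same GPU-agnostic tests once with plain Arrays and once with
# JLArrays, a CPU-based reference implementation of the GPU array interface.
for arrayType in (Array, JLArray)
    @testset "Reconstruction on $arrayType" begin
        x = arrayType(rand(ComplexF32, 16))
        @test length(x) == 16   # placeholder for the real reconstruction tests
    end
end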

One issue with that approach is that we need to make all the operators work with JLArrays. In particular, we just call plan_fft and plan_nfft with a given array type, and I don't think either works for JLArrays. But I can implement a workaround for that in LinearOperatorCollection. I had to do a similar workaround for the WaveletOp, since Wavelets.jl doesn't work on the GPU.

I also saw this in the Julia #gpu Slack: https://github.com/JuliaGPU/buildkite. As far as I understood it, registered Julia packages that belong to an organization can apply to run on their CI.

@cncastillo (Contributor) commented Apr 18, 2024

I also saw this in the Julia #gpu Slack: JuliaGPU/buildkite. As far as I understood it, registered Julia packages that belong to an organization can apply to run on their CI.

Hi! Indeed, to use JuliaGPU BuildKite CI, your package should be a part of a Julia organization. We recently moved KomaMRI to JuliaHealth for this, and besides some broken links and a few minor issues (JuliaHealth/KomaMRI.jl#335), it was straightforward.

I think @cncastillo is also trying to test KomaMRI with a different CI setup for the GPU part

We are planning on testing Koma on multiple GPUs anyway, so we will do some tests like the ones you require (here: https://github.com/JuliaHealth/KomaMRI.jl/blob/master/test/runtests.jl). As we use MRIReco, we are happy to do the tests for you, but we haven't set up the Buildkite CI config yet 😄. In the meantime, you can create a PR in Koma pointing to the correct version of MRIReco (this PR), add CUDA, AMD, Metal, etc. to the test environment, and modify Buildkite's config (pipeline.yml) to run the tests you want. I can make you a collaborator if you need it.

Having said that, Koma's current integration with MRIReco uses RawAcquisitionData as an intermediate. The MRD format inspired this custom type, so the data is forced to a specific type (which makes sense for writing to MRD). It would be nice if GPUArrays could flow freely from the simulations to the recon; I think the main culprits are the traj and data properties of the Profile type; maybe those could be relaxed.

mutable struct Profile
  head::AcquisitionHeader
  traj::Array{Float32,2}            #<- allow AbstractArray?, but then force to F32 when writing
  data::Array{Complex{Float32},2}   #<- allow AbstractArray?, but then force to CF32 when writing
end

Something similar for AcquisitionData.
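
For example, a relaxed Profile could look roughly like this (a hedged sketch, not the actual MRIBase definition; AcquisitionHeader is assumed to come from MRIBase, and the conversion to dense Float32/ComplexF32 would then happen only when writing MRD):

using MRIBase: AcquisitionHeader

mutable struct RelaxedProfile{T<:AbstractArray{Float32,2}, C<:AbstractArray{Complex{Float32},2}}
  head::AcquisitionHeader
  traj::T   # any 2D Float32 array, e.g. a CuArray
  data::C   # any 2D ComplexF32 array
end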

Cheers!

@nHackel (Member, Author) commented Apr 19, 2024

Getting access to BuildKite via KomaMRI.jl sounds like a great idea! I'll look into that once everything is set up here. I'll still probably set up the JLArray test case for the local tests.

I saw that LinearOperators.jl had a CI configuration where a maintainer could trigger breakage tests with upstream packages. Something like that would be a good fit for your idea @cncastillo. Then we can test it for the commits where it really matters.

I've also thought about extending my changes to the base, data and file handling packages, but for this PR I'll stick close to a refactoring of the Reco and operator packages with small changes in the other ones if necessary. For now I just want to get all the MRIReco tests and the MRIReco benchmark running on a GPU.

Once we have that running, we will have a good idea of what structures and information we need to run things on the GPU, and then we can look into further PRs that add GPU support or more generic array compatibility in the simulation and "base" packages.

@nHackel (Member, Author) commented Apr 22, 2024

Closes #168

@nHackel (Member, Author) commented Jul 31, 2024

This PR is now ready to be merged. It adds GPU support to MRIReco.jl and MRIOperators.jl. It also required smaller fixes/dependency updates to MRIBase.jl and MRISimulations.jl.

To run a reconstruction on the GPU, a user needs to supply the arrayType parameter. This does not have to be a concrete GPU array type such as CuArray or ROCArray; it can also be any callable foo that implements the following calls:

arr = zeros(ComplexF32, 32, 32)

# 1. Convert existing array
arr_gpu = foo(arr) 

# 2. Create vector with given eltype and undef init.
v = foo{ComplexF32}(undef, 0)

"Normal" arrays also fulfil this interface. We need the first call to move data to the GPU and the second one to make sure that our operators have the correct storage type.

When we execute a reconstruction on the GPU, we need to slightly change the reconstruction and its parameters. While, at least for CUDA, each task seems to get its own CUDA stream/context, we generally want to disable the multi-threaded reco loops.

This is possible for the @floop loops by setting an executor, which we derive from the given arrayType:

executor(foo)

For AbstractGPUArrays, this function returns a sequential executor; for "custom" callable objects it would need to be implemented. Since we execute the loop sequentially, we don't need to copy our operators but can instead reuse them. This is again handled by a new function called copyOpsFn, which returns the identity for GPU arrays.
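
A hedged sketch of what such dispatch could look like (the actual definitions in MRIReco.jl/MRIOperators.jl may differ):

using FLoops, GPUArrays

# Sequential execution on the GPU, multi-threaded loops on the CPU
executor(::Type{<:AbstractGPUArray}) = SequentialEx()
executor(::Type{<:Array})            = ThreadedEx()

# Operators are reused sequentially on the GPU, copied per thread on the CPU
copyOpsFn(::Type{<:AbstractGPUArray}) = identity
copyOpsFn(::Type{<:Array})            = copy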

One last change related to GPU arrays concerns the parameters for the FFT. Generally, GPU plan_fft calls don't accept the same parameters (or any at all) as the CPU version. This is again handled by a function MRIOperators.fftParams.
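
Roughly, assuming the FFT parameters are passed as keyword arguments (the real fftParams may look different):

using FFTW, GPUArrays

fftParams(::Type{<:Array})            = (; flags = FFTW.MEASURE)   # CPU: FFTW planner flags
fftParams(::Type{<:AbstractGPUArray}) = (;)                        # GPU: no extra parameters

# The derived parameters are then splatted into the plan call
plan = plan_fft(rand(ComplexF32, 16); fftParams(Array)...)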

I've also added GPU support to NFFT, which is implemented as a weak package extension. That should mean that AMD cards can also use the NFFT operator. I currently don't have access to one to test this, and I can't use Buildkite to fully test it until this PR is merged. Once it is merged, we can enable the Buildkite steps I prepared.

As @cncastillo mentioned, the next steps could be to add GPU support to the acquisition data and, more generally, the other packages. One could then derive the arrayType from the data itself.

@nHackel marked this pull request as ready for review on July 31, 2024 09:03
@cncastillo (Contributor) commented Jul 31, 2024

This is great news :).

The current way of choosing the backend is a little weird; is this a limitation of LinearOperators.jl?

In case it is useful, @rkierulf and I worked on some functions, gpu/cpu, that automatically call the correct backend using package extensions and let you choose the device on multi-GPU systems. So this works:

using KomaMRICore
using CUDA                   # Loads KomaMRICoreCUDAExt and sets the default backend to CUDA
KomaMRICore.set_device!(1)   # Not needed if you want to run it on the first (maybe only) CUDA device
x = zeros(10) |> f32 |> gpu  # CuArray{Float32}, `f64` is also available but not all the backends support it
y = myfun(x)  |> cpu         # Array{Float32} 

All the logic for these is very lightweight, and they do not have anything necessarily "Koma"-related, so they could be moved to MRIBase or a new package. The only thing required to convert a new custom type would be to tag it with @functor MyCustomType (this can handle nested structs, as it was based on Flux's adapters). This is everything, I think.
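
A minimal illustration of that @functor tagging (a sketch using Functors.jl, which provides @functor; this is not the Koma code itself):

using Functors

struct MyCustomType
    data::Vector{Float32}
end
@functor MyCustomType

# fmap applies a function to every array leaf, which is how gpu/cpu adapters move nested structs
doubled = fmap(x -> 2 .* x, MyCustomType(rand(Float32, 4)))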

Also, instead of using copyOpsFn, you may want to @view the array/operator, which already does what you described for GPU arrays, and on the CPU it does not copy the array but points to it, resulting in fewer memory allocations and potentially faster code. But I may be missing something, as I am not a LinearOperators.jl expert.

@nHackel (Member, Author) commented Jul 31, 2024

Hmm, choosing the device and backend like that would be cool. And when we do that for the acq. data structs, we could properly use adapt. On the reco level we kinda only have the dictionary.

Internally, we would still need to derive this callable function/arraytype in some fashion for the operators to work.
We need a concrete, fully parameterized vector datatype for multiplication with our matrix-free operators. This is essentially the S keyword. It is what LinearOperators.jl constructs for us as a result when we do op * x, see here.

And we need the first variant to move our arrays to the GPU. However, I just remembered I haven't done a pass over my code with Adapt.jl, so I'll do that before it is merge-ready.

We can't use a concrete datatype with adapt; the function wants adapt(CuArray, ...) instead of adapt(CuArray{Float32, 1, CUDA.DeviceMemory}, ...). But we can strip a given array type (so the typeof(acqData.foo)) down to this base array type. See also here.
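
One way to do that stripping (a sketch using an internal-but-common Julia idiom; not necessarily what the PR ends up using):

using Adapt, CUDA

arr = CuArray{Float32}(undef, 4)                 # e.g. a dense field of the acquisition data
baseType = Base.typename(typeof(arr)).wrapper    # CuArray, without its type parameters

x = adapt(baseType, rand(Float32, 4))            # adapt(CuArray, ...) works as intended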

@nHackel (Member, Author) commented Jul 31, 2024

Ah, we can't use a view, because for CPU operations we actually want copies of the operators: they are stateful and not thread-safe.

@cncastillo (Contributor) commented Jul 31, 2024

Internally, we would still need to derive this callable function/arraytype in some fashion for the operators to work.

At least with KernelAbstractions.jl, there are functions like get_backend(array) that do this. I think GPUArrays.jl will be moved to use KA.jl (JuliaGPU/GPUArrays.jl#525), but until then, there seems to be an equivalent backend(array) in GPUArrays.jl.
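
For example (assuming CUDA.jl for the GPU case):

using KernelAbstractions, CUDA

KernelAbstractions.get_backend(zeros(Float32, 8))       # CPU()
KernelAbstractions.get_backend(CUDA.zeros(Float32, 8))  # CUDABackend()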

@nHackel (Member, Author) commented Jul 31, 2024

We can derive that from a dense GPU array of the acquisition data using Base, so that's covered once we have that.

MRIReco could also add KernelAbstractions as a dependency, or adopt your backend-selection solution.

In MPIReco.jl I moved to KernelAbstractions.jl because I needed shared memory, which I couldn't get to work with GPUArrays.jl. We could also port the kernels here; I just didn't have the time and it wasn't necessary so far.

@tknopp merged commit a37d0ac into master on Aug 13, 2024
3 of 5 checks passed