OpenACC + Cray CCE + AMD MI200+ #368
Cray's gray skies getting somewhere... the leaves are brown and the sky is cray cray is driving us cray cray not waitin' t'ill we're old and cray fixing some problems cray-ted by cray cray-ving simpler times cray-shing and burning we're too cray-tive cray-ing in pain cray-ving PTO hmm
There are multiple facets to these changes to fix CCE's handling of allocatable module arrays:

1) CCE only has a problem with allocatable module arrays, not scalars or statically sized arrays. Use declare create for everything that isn't allocatable.

2) Allocatable array handling is broken in CCE <= 16.0.0, but an effective (if ugly) workaround is to add a pointer in front of the allocatable and leverage the pointer attachment logic to fix up the link.

The CRAY_DECLARE_GLOBAL macro declares the shadow allocatable and the pointer. The ALLOCATE_GLOBAL macro allocates the shadow, attaches the pointer, and creates/attaches each on the device. The DEALLOCATE_GLOBAL macro detaches and releases the device entries, nullifies the pointer, and deallocates the shadow. The ALLOCATE and DEALLOCATE macros can be used for local variables or derived type components.

This commit still isn't functional on AMD GPUs due to some register allocation issues in a specific function. Those will be addressed in a future commit.
Loop bound variables don't need to be mapped.
G2 will do a debug build without disabling loop collapsing.
Too much state+too many forks can hit a compiler bug around SGPR spillage.
This may not be needed, and will probably be removed.
It's possible that CCE is not mangling the private global names on the device in a way that makes them unique, which could be causing problems. Most of these don't need to be declared to the device, though. They're loop bounds, which shouldn't need to be mapped. Need to check if these WARs are actually needed.
There's a possible CCE issue with subarrays on the device. Not clear if it's a real problem or just a dev compiler build problem.
By default, CCE Fortran will try to make kernels async and do some addressing tweaks. These can sometimes cause problems, so turning them off is almost always a good debugging step.
This is related to some screwy variable name mangling, and might not be necessary.
This was causing a lot of build failures.
This seems to be an endemic change; we may need to roll back more.
Change Update: Did this myself in 110a290 |
Specifically:
FileNotFoundError: [Errno 2] No such file or directory: '/lustre/orion/cfd154/scratch/sbryngelson/MFC/build/install/dependencies/bin/h5dump'
and
sbryngelson/scratch $ ls MFC/build/install/dependencies/bin/
hipfc
It's looking like Frontier CI may fail for the 2-rank case. Tests were run with
The test MFC.sh file in the 2-rank directory reads
which appears to be the problem, it should be passing Update: It passed on second try 🤷 |
@henryleberre, do you know why it doesn't build h5dump? (or at least it isn't found in the expected |
@sbryngelson We opted not to build HDF5 on CCE. I forget why; perhaps there were some incompatibilities. We use the cray-hdf5 module, so h5dump should already be available.
@henryleberre you are correct. Here: It does look like we have this option:
if ARG("no_hdf5"):
    if not does_command_exist("h5dump"):
        raise MFCException("--no-hdf5 was specified and h5dump couldn't be found.")
    h5dump = shutil.which("h5dump")
though it doesn't seem to be working like this
@sbryngelson I'm testing a fix. For your command, you would have to use this instead:
$ ./mfc.sh test -a --no-hdf5 -- -c frontier
@henryleberre this works!
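For reference, the lookup discussed in this thread can be sketched roughly as below. This is a minimal illustration, not the toolchain's actual code: `resolve_h5dump`, `H5dumpNotFound`, `install_bin`, and `search_path` are hypothetical names introduced here, and the real implementation uses `ARG`, `does_command_exist`, and `MFCException` as quoted above. The idea is that with `--no-hdf5` the system `h5dump` (e.g. from the cray-hdf5 module) is taken from `PATH`, while otherwise the locally built copy under the dependencies `bin/` directory is expected.

```python
import os
import shutil
from typing import Optional


class H5dumpNotFound(Exception):
    """Hypothetical error type for this sketch (MFC raises MFCException)."""


def resolve_h5dump(no_hdf5: bool, install_bin: str,
                   search_path: Optional[str] = None) -> str:
    """Return the path to h5dump, or raise if it cannot be located.

    no_hdf5     -- assumed to mirror the --no-hdf5 flag from the thread
    install_bin -- assumed location of locally built dependencies
    search_path -- optional PATH override (None means use os.environ["PATH"])
    """
    if no_hdf5:
        # HDF5 was not built locally: rely on a system-provided h5dump.
        found = shutil.which("h5dump", path=search_path)
        if found is None:
            raise H5dumpNotFound(
                "--no-hdf5 was specified and h5dump couldn't be found.")
        return found

    # HDF5 was built locally: expect h5dump next to the other dependencies.
    candidate = os.path.join(install_bin, "h5dump")
    if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
        raise H5dumpNotFound(f"h5dump not found at {candidate}")
    return candidate
```

Under these assumptions, the failure reported above corresponds to the second branch: the dependencies `bin/` directory contained only `hipfc`, so the lookup fails unless `--no-hdf5` routes it to the module-provided binary.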
Description
Adds support for MI200+ GPUs via CCE compilers and OpenACC.
Type of change
Scope
Closes #352 #383 #384
Test Configuration: