Replies: 2 comments
-
Alternatively, we could have multiple maps to look things up in based on the features.
-
Right now the selection is based on a map of single entries (with a key made of M and N as a pair). Maybe we can migrate to a multi-map and return all of them. Then, on the XDLOPS side, we pick the MFMA with the longest K that is compatible with the tile?
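A minimal sketch of that multi-map idea, assuming a stripped-down MFMA record (the `MfmaInsn` type and the divisibility-based compatibility check are hypothetical, not rocMLIR's actual structures):

```cpp
#include <map>
#include <optional>
#include <utility>

// Hypothetical MFMA record: only the fields needed for this sketch.
struct MfmaInsn {
  int m, n, k;
};

// Multi-map keyed by the (M, N) pair; each key can hold several K variants.
using MfmaTable = std::multimap<std::pair<int, int>, MfmaInsn>;

// Return the compatible MFMA with the longest K, if any. "Compatible" is
// stubbed here as "the instruction's K divides the tile's K dimension".
std::optional<MfmaInsn> pickLongestK(const MfmaTable &table, int m, int n,
                                     int tileK) {
  std::optional<MfmaInsn> best;
  auto [lo, hi] = table.equal_range({m, n});
  for (auto it = lo; it != hi; ++it) {
    const MfmaInsn &insn = it->second;
    if (tileK % insn.k != 0)
      continue; // not compatible with the tile
    if (!best || insn.k > best->k)
      best = insn;
  }
  return best;
}
```

The selection logic stays in one place; registering a new K variant is just one more `insert` under the same (M, N) key.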
-
While poking around the gfx940 MFMAs, I noticed a few things that will require extensions to the MFMA database.
The most obvious of these is that we'll need to feed in architecture features, mainly to handle the new int8 ops, which have been widened: 32x32x8 and 16x16x16 have been replaced by 32x32x16 and 16x16x32, and the old instructions aren't available anymore. See https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/VOP3PInstructions.td#L626 and look for the `let Predicates` groups.
On top of that, there's a transition in the BF16 ops. The ones we're currently using are deprecated and have been removed in gfx940. The new ones (the `_1k` variants) were introduced in gfx90a, and we currently don't use them. I don't currently know if the bf16 ops should be transitioned unconditionally.
The most obvious thing I can think to do is adding `requiredFeatures` and `forbiddenFeatures` to the MFMA structures, which'll handle the int8 transition. For bf16 on gfx90a, we can also require/forbid away the old BF16 instructions, or we can try to support both ... which would require adding two different MFMAs that differ only in the K argument...
Specifically, we have a pair of new int8 ops, 32x32x16 and 16x16x32 (see https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp#L395), which are just 32x32x8 and 16x16x16 but wider.
Now, the thing we definitely need to do is to extend the MFMA instruction database to include some notion of which features are required to use an instruction - this is where #1009 comes in.
However, what's less clear is how to handle the need to disambiguate along the K dimension.
(I'm putting this up for @jerryyin, @giuseros, and @manupak since y'all redesigned MFMA selection a while back.)
To my mind, we want the selection process to have both variants available, so that, for example, we can try the narrower instruction when the wider variant isn't usable.
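Combining the feature filter with the K disambiguation, the fallback could be sketched as: sort the (M, N) variants by K descending and take the first one that is both feature-legal and tile-compatible. All names here are hypothetical:

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// One (M, N) variant, reduced to what the fallback decision needs.
struct Candidate {
  int k;
  bool featureOk; // passed the required/forbidden feature checks
};

// Prefer the widest K the target supports and the tile can accommodate;
// otherwise fall back to progressively narrower K variants.
std::optional<int> selectK(std::vector<Candidate> cands, int tileK) {
  std::sort(cands.begin(), cands.end(),
            [](const Candidate &a, const Candidate &b) { return a.k > b.k; });
  for (const Candidate &c : cands)
    if (c.featureOk && tileK % c.k == 0)
      return c.k;
  return std::nullopt; // no usable variant for this (M, N)
}
```

For example, with int8 candidates K=8 and K=16 both available, a tile with K a multiple of 16 gets the wider instruction, while a tile with K=8 falls back to the narrower one.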