representation of GNU C fixed-size vectors #390

Open
sorear opened this issue Jun 20, 2023 · 4 comments

@sorear
Collaborator

sorear commented Jun 20, 2023

We have an existing issue (#45) for the alignment, which appears to differ between gcc and clang for size > 16 bytes, but I noticed gcc isn't even compatible with itself (compiler explorer link): for vector size = 2*XLEN, the vector is passed in memory if vectorization is enabled, in integer registers otherwise.

typedef int v4si __attribute__ ((vector_size (16)));
v4si sq(v4si in) { return in*in; }

# gcc -O -march=rv64gcv --param riscv-autovec-preference=fixed-vlmax
        vl1re32.v       v1,0(a1)
        ...

# gcc -O -march=rv64gcv
        srai    a4,a0,32
        srai    a5,a1,32
        ...

Do we want to pass fixed-size vectors in vector registers if an appropriate vector calling convention is in use? (This would have been a comment on #389 without the above issue.) This would substantially complicate the compatibility story, since the vector calling convention could no longer be treated as a strict superset of the non-vector calling convention, and we may be able to get most of the benefit using module-internal fastcc-type optimizations.
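To reproduce the comparison locally, a small translation unit along these lines can be compiled with each set of flags shown above. This is only a sketch: the `noinline` attribute and the `v4si_bits` helper are additions for illustration and are not part of the original report. On RV64 (XLEN = 64) the 16-byte type is exactly 2*XLEN bits wide, which is the case where the two gcc configurations disagree.

```c
#include <stddef.h>

/* GNU C fixed-size vector: four 32-bit ints, 16 bytes in total. */
typedef int v4si __attribute__ ((vector_size (16)));

/* noinline (an addition for this sketch) keeps callers in the same file
   going through the real calling convention instead of being inlined. */
__attribute__ ((noinline)) v4si sq(v4si in) { return in * in; }

/* Hypothetical helper: width of the vector type in bits. */
size_t v4si_bits(void) { return sizeof(v4si) * 8; }
```

Objects built with the two flag sets cannot safely call each other's `sq`, since the caller and callee would disagree on where `in` lives.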

@kito-cheng
Collaborator

> We have an existing issue (#45) for the alignment, which appears to differ between gcc and clang for size > 16 bytes,

Oops, thanks for digging up this issue, which was created years ago. I think we should spend some time standardizing that.

> but I noticed gcc isn't even compatible with itself (compiler explorer link): for vector size = 2*XLEN, the vector is passed in memory if vectorization is enabled, in integer registers otherwise.

Yeah, that's a known issue: --param riscv-autovec-preference=fixed-vlmax is an ABI-incompatible option, and that should at least be documented.

> Do we want to pass fixed-size vectors in vector registers if an appropriate vector calling convention is in use? (This would have been a comment on #389 without the above issue.) This would substantially complicate the compatibility story, since the vector calling convention could no longer be treated as a strict superset of the non-vector calling convention, and we may be able to get most of the benefit using module-internal fastcc-type optimizations.

I had some off-list discussion with @lhtin; let me summarize it here:

My short answer is: yes, we should consider passing fixed-size vectors in vector registers.

However, there is a really complicated compatibility issue among zvl32b, zvl64b and zvl128b.

NOTE: I don't refer to zve32* or zve64* here, since those zve* extensions can still be combined with zvl128b, in which case the issues described below go away; zvl32b and zvl64b are more precise.

Let me describe this as two different options: 1) better compatibility, 2) better performance/usability.

  1. better compatibility

If we care about compatibility among zvl32b, zvl64b and zvl128b, then we must design for the smallest possible vector register: pass a 32-bit fixed-size vector in a single vector register, a 64-bit fixed-size vector in two vector registers, and a 128-bit fixed-size vector in four vector registers.

That would be a bad design, because we can expect Linux-class RISC-V CPUs to have the V extension, which implies zvl128b, and this design would then waste most of the vector register space.

But this is the way to go if we don't want to define multiple ABI/calling convention variants for zvl32b, zvl64b and zvl128b.

  2. better performance/usability

The V extension requires zvl128b, which means a vector register is at least 128 bits, so the most intuitive design is to pass a fixed-size vector in a single vector register (m1/LMUL=1 in RVV terms) if its length is less than or equal to 128 bits, pass 129 to 256 bits in two vector registers, and so on up to 1024 bits at LMUL=8.

However, this design can't be applied to zvl32b and zvl64b, which would cause a compatibility issue.
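The trade-off between the two options can be sketched with a little arithmetic. The function names below are hypothetical, and the model is deliberately simple: it only computes ceil(bits / assumed register width) and ignores any upper limit on the register group size.

```c
/* Registers needed if the calling convention only assumes that each
   vector register holds assumed_vlen bits: ceil(bits / assumed_vlen). */
unsigned vregs_needed(unsigned bits, unsigned assumed_vlen)
{
    return (bits + assumed_vlen - 1) / assumed_vlen;
}

/* Register bits tied up on hardware whose real VLEN is hw_vlen, given
   that every register in the group is hw_vlen bits wide. */
unsigned reg_bits_occupied(unsigned bits, unsigned assumed_vlen,
                           unsigned hw_vlen)
{
    return vregs_needed(bits, assumed_vlen) * hw_vlen;
}
```

Under these assumptions, a 128-bit vector on VLEN=128 hardware takes four registers (512 register bits) when the convention assumes 32-bit registers as in the compatibility option, but one register (128 bits) when it assumes 128-bit registers as in the performance option.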


So here is an aggressive idea. We could design a calling convention that takes an argument:

e.g. void __attribute__ ((riscv_vector_cc(vls-vlen=128))) f (int32x4_t) declares a function that uses the vector ABI and passes a 128-bit vector in a vector register, like option 2 above.

The default would then be vls-vlen=128, so void __attribute__ ((riscv_vector_cc)) f (int32x4_t) passes int32x4_t in a vector register, and most users don't need to specify vls-vlen= in the attribute.

What about zvl32b and zvl64b? Users must specify vls-vlen in the attribute, or use an option such as -mdefault-vector-abi-vls-len=[32|64].

This design comes with one more advantage: users can pass 256-bit fixed-size vectors if they want to optimize their program.


Or, as a last alternative, we don't do anything in psABI land and just let compilers use their module-internal fastcc.

@sorear
Collaborator Author

sorear commented Jun 28, 2023

I think I agree that this needs to be parameterized and controlled by what are, from the ABI perspective, language-specific mechanisms and, from the riscv-c-api-doc perspective, some combination of GNU attributes, explicitly ABI-affecting compiler options, and implementation-dependent fastcc mechanisms.

We have three options to choose from (or for the compiler to choose from for fastcc) on a per-function basis:

  1. Pass in ceil(N/XLEN) integer registers, for N <= 2*XLEN, in memory otherwise. Efficient for naturally XLEN-aligned integer data, or if the P extension is present; otherwise, the argument registers need either unpack steps or a series of vector slides (possibly with a different SEW than the real computation) before use.
  2. Pass in ceil(N/MINVLEN) vector registers, for N <= 8*MINVLEN and MINVLEN a parameter of the function's calling convention, in memory for too large N. If the runtime VLEN is greater than MINVLEN the actual data will be present in the low-numbered vector registers per the normal rules for vector register groups. This is a calling convention parameter only; it is separate from the VLEN>=X or VLEN=X requirements that may be imposed by function code. Efficient if VLEN = MINVLEN or if the hardware implements fast operations for vl <= maxvl/2.
  3. Always pass in memory. Supports all vector lengths and element sizes with roughly equal efficiency.
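The three options can be written down as a small classifier. This is only a sketch of the rules as listed above, not an implementation from any compiler; the type and function names are hypothetical, and sizes are in bits.

```c
/* Where a fixed-size vector argument ends up. */
enum arg_loc { ARG_XREGS, ARG_VREGS, ARG_MEMORY };

struct arg_class {
    enum arg_loc loc;
    unsigned nregs;   /* number of registers used; 0 when in memory */
};

static unsigned ceil_div(unsigned a, unsigned b) { return (a + b - 1) / b; }

/* Option 1: ceil(N/XLEN) integer registers for N <= 2*XLEN, else memory. */
struct arg_class classify_xregs(unsigned n_bits, unsigned xlen)
{
    struct arg_class c = { ARG_MEMORY, 0 };
    if (n_bits <= 2 * xlen) {
        c.loc = ARG_XREGS;
        c.nregs = ceil_div(n_bits, xlen);
    }
    return c;
}

/* Option 2: ceil(N/MINVLEN) vector registers for N <= 8*MINVLEN, else
   memory. MINVLEN is a parameter of the calling convention, not the
   runtime VLEN. */
struct arg_class classify_vregs(unsigned n_bits, unsigned minvlen)
{
    struct arg_class c = { ARG_MEMORY, 0 };
    if (n_bits <= 8 * minvlen) {
        c.loc = ARG_VREGS;
        c.nregs = ceil_div(n_bits, minvlen);
    }
    return c;
}

/* Option 3: always in memory, regardless of size. */
struct arg_class classify_memory(unsigned n_bits)
{
    struct arg_class c = { ARG_MEMORY, 0 };
    (void)n_bits;
    return c;
}
```

For the 128-bit `v4si` case on RV64, this gives two integer registers under option 1 and one vector register under option 2 with MINVLEN = 128.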

Functions using option 2 should probably have call-saved registers under the same rules as eventually adopted for vector types.

Should the default behavior be 1 or 3? If we treat the behavior of gcc without --param riscv-autovec-preference=fixed-vlmax as the de facto ABI, it has to be 1.

The attribute name should express the fact that it is specific to fixed-size vectors. I am thinking something like riscv_fixed_vector_cc(xregs), riscv_fixed_vector_cc(memory), riscv_fixed_vector_cc(vregs(MINVLEN)), with MINVLEN defaulting to 128. riscv_fixed_vector_cc(vregs) is still a bit of a mouthful; can we shorten it without creating an ambiguity with the scalable vector calling convention?

(Besides the ratification of C23, what else needs to happen before we can start talking about [[riscv::fixed_vector_cc(vregs)]]?)

Maybe there is an argument for defining riscv_vector_cc as primarily enabling call-saved vector registers, and affecting the fixed vector calling convention as a side effect.

Do you have a sense of the amount of new code being written using fixed-size vectors for RISC-V? If the major use case is legacy code using portable fixed-size vectors or a RISC-V implementation of the SSE / NEON intrinsics, then it would make sense to focus more on fastcc support than defining the attributes. The default / externally visible calling convention needs to be defined in any case.

@Amanieu

Amanieu commented Oct 15, 2023

This was raised in the context of Rust support for the V extension. The specific concern is in the context of a program compiled without the V extension enabled, but where certain functions are marked #[target_feature(enable = "v")]. This could potentially lead to different functions disagreeing on how to pass fixed-length vectors as arguments.

If the default calling convention allows passing fixed-length vectors in vector registers, then this really should be a separate -mabi variant. After all, defining the calling convention is the entire point of -mabi. Alternatively, a separate opt-in calling convention (such as "vectorcall" on x86) could be used to opt in to passing fixed-length vectors in vector registers.

This is not a concern for scalable vectors since, unlike fixed-length vectors, no values of this type can be instantiated without the V extension.

@kito-cheng
Collaborator

For the base calling convention part:
#406

The vector calling convention will be a separate PR, to be created later.
