-
Notifications
You must be signed in to change notification settings - Fork 91
Remove LaneCount in favor of #[rustc_simd_monomorphize_lane_limit] attribute #485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
2d3ec3c
to
46c32f8
Compare
46c32f8
to
6ffafb9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add some tests for 65-lane simd types?
#[allow(unused)] | ||
#[inline(always)] | ||
fn zeroing_idxs<const N: usize>(idxs: Simd<u8, N>) -> Simd<u8, N> | ||
where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check for and remove stray where
How does the implementor know what lanes will be supported (docs)? Does this attribute depend on the target platform? |
Good point, it should be documented. It is not platform specific, and IMO shouldn't be, because that would allow code to no longer be portable. |
I'm not sure how we would do this. I did try 0 and 65 lanes to make sure it still fails. |
Thanks for the reply, that follows! I see also from your second comment that the minimum is still 1, which has had good properties for code I've written (0 would lead to nonsense cases for me, others can feel free to differ if they like). |
I missed the part where you set the limit to 64 and assumed you picked a higher limit (which I think we should do). 65 should work on most operations (wherever there isn't a |
I figured I'd limit this PR to just the API change and leave the limit the same. I do think that users are mostly responsible for picking reasonable vector lengths for their targets, but I do want to first make sure that codegen is at least reasonable for the longest permitted length. |
The limit should not be raised because raising it past 128 will result in undesired oddities in compiling types with more specialized layout rules unless further work is done to improve the compiler's ability to understand things of various sizes. |
We have been taking shots in the dark at speculative current and future compiler capabilities, whereas simply looking at them and making them real is much easier for the parts which we can manage. Like this PR, which I'm fairly happy to see. Unfortunate that this needs to be done for library work, but this has always been a very special library. The compiler will require a significant, compiler-spanning refactoring in order to handle a bitmask type (or generic integer, for that matter) larger than 128 bits in a way that makes sense and is consistent with other capabilities of the language and compilers for it. People would balk, for instance, if we told them they can have 129 bits... if they are willing to give up the niche. But it's much harder to reconcile the parts of the compiler that do niche computation with things that are large but also not aggregates. |
none of this library handles bitmasks with >64 bits, the bitmask impl of |
I'm not sure bumping up the limit is a good idea. I just tried a very simple 1024 element function on x86-64 and aarch64 and LLVM legalization took a very long time and produced 1000-2000 instruction functions that should probably be loops. Maybe we can make it an unstable feature in the future that bumps the limit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with leaving the max lane count at 64 for this PR, but I think we should substantially increase it in the future, with a target-specific warning when it's much larger than (>8-16x) the target's native size (taking into account the element type and target features -- after inlining if possible). This allows you to write Simd<u8, 9001>
on RVV with big enough vector registers even if that isn't a great idea on most other ISAs.
/// and/or Rust versions, and code should not assume that it is equivalent to | ||
/// `[T; N]`. | ||
/// | ||
/// `N` cannot be 0 and may be at most 64. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// `N` cannot be 0 and may be at most 64. | |
/// `N` cannot be 0 and may be at most 64 (though this limit may be increased in the future). |
/// Thus it is sound to [`transmute`] `Simd<T, N>` to `[T; N]` and should optimize to "zero cost", | ||
/// but the reverse transmutation may require a copy the compiler cannot simply elide. | ||
/// | ||
/// `N` cannot be 0 and may be at most 64. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// `N` cannot be 0 and may be at most 64. | |
/// `N` cannot be 0 and may be at most 64 (though this limit may be increased in the future). |
...because the N is bounded, yes, so we haven't needed to explore higher-valued cases. |
Use the attribute added in rust-lang/rust#146667 to produce a post-mono error instead of using the
LaneCount<N>: SupportedLaneCount
trait.This makes writing generics around
Simd
much easier, especially e.g. swizzles that change the lane count.