Skip to content

Conversation

calebzulawski
Copy link
Member

Use the attribute added in rust-lang/rust#146667 to produce a post-mono error instead of using the LaneCount<N>: SupportedLaneCount trait.

This makes writing generics around Simd much easier, especially e.g. swizzles that change the lane count.

@calebzulawski calebzulawski force-pushed the post-mono-lane-count-check branch from 46c32f8 to 6ffafb9 Compare September 30, 2025 03:44
Copy link
Member

@programmerjake programmerjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add some tests for 65-lane simd types?

#[allow(unused)]
#[inline(always)]
fn zeroing_idxs<const N: usize>(idxs: Simd<u8, N>) -> Simd<u8, N>
where
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check for and remove stray where

@sammysheep
Copy link
Contributor

sammysheep commented Sep 30, 2025

How does the implementor know what lanes will be supported (docs)?

Does this attribute depend on the target platform?

@calebzulawski
Copy link
Member Author

Good point, it should be documented.

It is not platform specific, and IMO shouldn't be, because that would allow code to no longer be portable.

@calebzulawski
Copy link
Member Author

maybe add some tests for 65-lane simd types?

I'm not sure how we would do this. I did try 0 and 65 lanes to make sure it still fails.

@sammysheep
Copy link
Contributor

Good point, it should be documented.

It is not platform specific, and IMO shouldn't be, because that would allow code to no longer be portable.

Thanks for the reply, that follows!

I see also from your second comment that the minimum is still 1, which has had good properties for code I've written (0 would lead to nonsense cases for me, others can feel free to differ if they like).

@programmerjake
Copy link
Member

maybe add some tests for 65-lane simd types?

I'm not sure how we would do this. I did try 0 and 65 lanes to make sure it still fails.

I missed the part where you set the limit to 64 and assumed you picked a higher limit (which I think we should do).

65 should work on most operations (wherever there isn't a const { assert!(N <= 64); })

@calebzulawski
Copy link
Member Author

I figured I'd limit this PR to just the API change and leave the limit the same.

I do think that users are mostly responsible for picking reasonable vector lengths for their targets, but I do want to first make sure that codegen is at least reasonable for the longest permitted length.

@workingjubilee
Copy link
Member

The limit should not be raised because raising it past 128 will result in undesired oddities in compiling types with more specialized layout rules unless further work is done to improve the compiler's ability to understand things of various sizes.

@workingjubilee
Copy link
Member

We have been taking shots in the dark at speculative current and future compiler capabilities, whereas simply looking at them and making them real is much easier for the parts which we can manage. Like this PR, which I'm fairly happy to see. Unfortunate that this needs to be done for library work, but this has always been a very special library.

The compiler will require a significant, compiler-spanning refactoring in order to handle a bitmask type (or generic integer, for that matter) larger than 128 bits in a way that makes sense and is consistent with other capabilities of the language and compilers for it. People would balk, for instance, if we told them they can have 129 bits... if they are willing to give up the niche. But it's much harder to reconcile the parts of the compiler that do niche computation with things that are large but also not aggregates.

@programmerjake
Copy link
Member

programmerjake commented Oct 1, 2025

none of this library handles bitmasks with >64 bits, the bitmask impl of Mask was removed for now, and the bitmask conversion methods take and return u64. so compiler support for >64-bit bitmasks is not needed right now

@calebzulawski
Copy link
Member Author

I'm not sure bumping up the limit is a good idea. I just tried a very simple 1024 element function on x86-64 and aarch64 and LLVM legalization took a very long time and produced 1000-2000 instruction functions that should probably be loops. Maybe we can make it an unstable feature in the future that bumps the limit.

https://rust.godbolt.org/z/eMqzvvcvz

Copy link
Member

@programmerjake programmerjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with leaving the max lane count at 64 for this PR, but I think we should substantially increase it in the future, with a target-specific warning when it's much larger than (>8-16x) the target's native size (taking into account the element type and target features -- after inlining if possible). This allows you to write Simd<u8, 9001> on RVV with big enough vector registers even if that isn't a great idea on most other ISAs.

/// and/or Rust versions, and code should not assume that it is equivalent to
/// `[T; N]`.
///
/// `N` cannot be 0 and may be at most 64.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// `N` cannot be 0 and may be at most 64.
/// `N` cannot be 0 and may be at most 64 (though this limit may be increased in the future).

/// Thus it is sound to [`transmute`] `Simd<T, N>` to `[T; N]` and should optimize to "zero cost",
/// but the reverse transmutation may require a copy the compiler cannot simply elide.
///
/// `N` cannot be 0 and may be at most 64.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// `N` cannot be 0 and may be at most 64.
/// `N` cannot be 0 and may be at most 64 (though this limit may be increased in the future).

@workingjubilee
Copy link
Member

none of this library handles bitmasks with >64 bits

...because the N is bounded, yes, so we haven't needed to explore higher-valued cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants