-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add gather/scatter methods for simd #27
base: master
Are you sure you want to change the base?
Conversation
Summary: gather and scatter methods of simd have not supported those need to be implemented Test Plan: make test Reviewers: chengbin.cb, liangbin.mj, yifeng.dongyifeng, longfei.alf, chuanqi.xcq Issue: https://aone.alibaba-inc.com/req/32929292 CR: https://code.aone.alibaba-inc.com/cpp_libs/std-simd/codereview/4783624
@mattkretz , I try to support the gather and scatter features. I gave my draft to have a discussion before officially commits, Please do not hesitate to correct and comment, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for you work here! For the gather & scatter interfaces to be complete we will also need masked gather & scatter overloads. This would make a lot of sense as member functions on where_expression
. Which implies it should be member functions in simd
as well. Hmm. Matches copy_to
and copy_from
so yeah... member functions. But we can't do a ctor without a tag type using a reserved name. Or do the static __gather
function instead of a ctor.
The story just starts here. For gather & scatter to be done nicely we'll want
void f(std::vector<float>& data, simd<int> i) {
data[i] = data[i] * 1.5f;
// or in short
data[i] *= 1.5f;
}
void g(std::vector<std::array<float, 3>>& data, simd<int> i) {
data[i][0] = data[i][0] * 1.5f;
// or in short
data[i][0] *= 1.5f;
}
to work. There's so much more like vector<struct>
and all possible nestings... We can do struct gathers via subscripting with member pointers. (e.g. data[i][&S::a]
) And then, of course, once C++ gets real reflection...
But, let's do the low-level interface as a first important step.
Next step: Let's write a paper to the C++ committee to make the low-level gather & scatter interface part of the standard.
// gather constructor | ||
template <typename _Up, typename _Flags> | ||
_GLIBCXX_SIMD_ALWAYS_INLINE | ||
simd(const _Up* __mem, const __int_for_sizeof_t<_Up>* __idx, _Flags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This is a non-standard (as not specified in an ISO standard) interface. I believe we therefore should implement gather & scatter interfaces only as free functions (possibly hidden friends) or static member functions. And the function name must be a reserved name.
- I think we want the index parameter to always be of type
simd
. Turning an array into simd isn't too hard. And the expectation for gather & scatter is that indexes are computed viasimd
objects. - Do you know whether alignment of the memory pointer matters for any gather / scatter hardware? Because if not then the
_Flags
parameter is unnecessary. I'd drop it. - A
constexpr
scale parameter has been useful in Vc 1. I'm not sure whether we want it as public API at some point, though.
template <size_t _Scale = 1, typename _From, typename _Idx, typename _IdxAbi>
_GLIBCXX_SIMD_ALWAYS_INLINE static _GLIBCXX_SIMD_CONSTEXPR
simd
__gather(const _From* __mem, const simd<_Idx, _IdxAbi>& __idx, _SizeConstant<_Scale> __scale = {})
{
return simd([&](auto __i) { return static_cast<value_type>(__mem[__idx[__i] * __scale]); });
}
I also just implemented a complete _S_gather
implementation for _SimdImplX86
(AVX512 and AVX2) and fallback in _SimdImplBuiltin
. I didn't implement anything else, because I only use it internally to implement a lookup table for my std::exp(simd)
implementation. It's not pushed anywhere yet. But I'm closing in on getting the patch ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- simd construct with memory loading has supported, dose the construct with gather need to be the extension?
- In my mind, _Flags for __mem of gather is necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you have implemented _S_gather, will I work based on your implementation be better?
template <size_t _Offset> | ||
_GLIBCXX_SIMD_INTRINSIC constexpr auto _M_tuple_at() const | ||
{ | ||
if constexpr (_Offset == 0) | ||
return first; | ||
else | ||
return second.template _M_tuple_at<_Offset - simd_size_v<_Tp, _Abi0>>(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the name I expected it to do the same as _M_at
. Maybe call it _M_tuple_with_element<N>
. However, I think you can't use this function anyway; see below.
Also, what if _Offset > 0
but _Offset < simd_size_v<_Tp, _Abi0>
?
template <typename _Tp, typename _Up> | ||
static inline _SimdMember<_Tp> | ||
_S_gather(const _Up* __mem, const _SimdMember<_Tp>& __idx, | ||
_TypeTag<_Tp>) noexcept | ||
{ | ||
return _SimdMember<_Tp>::_S_generate([&](auto __meta) { | ||
return __meta._S_gather( | ||
__mem, __idx.template _M_tuple_at<__meta._S_offset>(), | ||
_TypeTag<_Tp>()); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you intend the types like this? I.e. the value_type
of the index vector equals the value_type
of the returned / gathered vector? I.e. you can't gather _SimdMember<float>
unless __idx
is of type _SimdMember<float>
. I'd expect int
for the latter.
Once you allow different types for gathered vector and index vector you will get size mismatches: e.g. gather floats using 64-bit int index vector. Or double
/ int
, or float
/ int
on AVX w/o AVX2. Then the _M_tuple_at
access will fail. I hate this case, but I already handled it for some of the math functions. You should be able to find inspiration there. 😉
_S_scatter(const _SimdMember<_Tp>& __v, _Up* __mem, | ||
const _SimdMember<_Tp>& __idx, _TypeTag<_Tp>) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same type question as above. Did you want different types for __v
and __idx
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I try to support different types in the following manner, but it seems that is not best way after reading our comments.
static inline _SimdMember<_Tp>
_S_gather(const _Up* __mem, const _SimdMember<__int_for_sizeof_t<_Tp>>& __idx,
_TypeTag<_Tp>) noexcept
As I implemented in RFC, how to represent the index is one of the important and confusing things to me.
On the other hand, the index works as Tp* __idx can avoid splitting into simd instances.
As you pointed out, the first thing goes first, the gather & scatter interface is the best start point. |
The internal API doesn't have to be as precise as the public API. For template <typename _Tp, typename _IdxV>
_GLIBCXX_SIMD_INTRINSIC static
_SimdWrapper<_Tp, _S_size<_Tp>>
_S_gather(const _Tp* __mem, const _IdxV& __idxv) i.e. the function signature allows any type for the index vector. That's obviously not true. The first thing the implementation does is What we really need to care about is the public API. The index will probably have to be a constrained deduced type like |
Summary: gather and scatter methods of simd have not supported
those need to be implemented
Test Plan: make test