-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generic iteration function for n dimension mdspan #202
Comments
Hi @SimeonEhrig ! Bryce also has presentations that talk about the two C++23 ranges features You could also imagine specializing for the case of iterating over an mdspan's domain (which is a Cartesian product of the extents). It's an exercise (not an easy one!) in parameter pack algebra. |
I developed a solution: #include <experimental/mdspan>
#include <iostream>
namespace stdex = std::experimental;
/**
* @brief Construct submdspan of mdspan. The submdspan has one rank less than the mdspan. The left dimension is fixed
* to a specific index. The rest of the dimension contains the full range.
*
* @tparam TRank Dimension of the new submdspan (needs to be mdspan::rank()-1).
*/
template<int TRank>
struct Construct_Submdspan;
template<>
struct Construct_Submdspan<1>
{
template<typename TSpan, typename... Types>
constexpr auto construct(TSpan span, std::size_t const fixed_index_pos, Types... args)
{
return stdex::submdspan(span, fixed_index_pos, args...);
}
};
template<int TRank>
struct Construct_Submdspan
{
/**
* @brief Returns the submdspan of a mdspan, with one dimension less.
*
* @tparam TSpan Type of the span
* @tparam Types needs to std::experimental::full_extent_t
* @param span mdspan from which the submdspan is created
* @param fixed_index_pos Index postion of the fixed dimension
* @param args needs to std::experimental::full_extent
* @return constexpr auto returns a stdex::submdspan
*/
template<typename TSpan, typename... Types>
constexpr auto construct(TSpan span, std::size_t const fixed_index_pos, Types... args)
{
return Construct_Submdspan<TRank - 1>{}.construct(span, fixed_index_pos, stdex::full_extent, args...);
}
};
/**
* @brief Returns a submdspan of mdspan. The submdspan has one rank less than the mdspan. The left dimension is fixed
* to a specific index. The rest of the dimension contains the full range.
*
* @tparam TMDSpan
* @param span mdspan from which the submdspan is created
* @param fixed_index_pos Index postion of the fixed dimension
* @return constexpr auto returns a stdex::submdspan
*/
template<typename TMDSpan>
constexpr auto submdspan_remove_dim(TMDSpan span, std::size_t const fixed_index_pos)
{
constexpr auto rank = TMDSpan::rank();
return Construct_Submdspan<rank - 1>{}.construct(span, fixed_index_pos, stdex::full_extent);
}
/**
* @brief Iterates over all elements of an n dimension mdspan. The iteration order is from the right to the left
* dimension.
*
* @tparam TDim Rank of the mdspan
*/
template<int TDim>
struct Iterate_mdspan;
template<>
struct Iterate_mdspan<1>
{
template<typename TSpan, typename TFunc>
void operator()(TSpan span, TFunc& functor)
{
for(auto i = 0; i < span.extent(0); ++i)
{
span(i) = functor(span(i));
}
}
};
template<int TDim>
struct Iterate_mdspan
{
/**
* @brief Iterate over all elements of an mdspan and apply the functor on it.
*
* @tparam TSpan type of the mdspan
* @tparam TFunc type of the functor
* @param span The mdspan
* @param functor The functor
*/
template<typename TSpan, typename TFunc>
void operator()(TSpan span, TFunc& functor)
{
for(auto i = 0; i < span.extent(0); ++i)
{
auto submdspan = submdspan_remove_dim(span, i);
Iterate_mdspan<TSpan::rank() - 1>{}(submdspan, functor);
}
}
};
/**
* @brief Do the same like std::iota with a n-dimensional mdspan. The iteration order is from the right to the left
* dimension.
*
* @tparam TSpan type of the mdspan
* @tparam TData type of the functor
* @param span The mdspan
* @param index value of the first element
*/
template<typename TSpan, typename TData>
void iota_span(TSpan span, TData index)
{
static_assert(TSpan::rank() > 0);
auto functor = [&index](TData input) { return index++; };
Iterate_mdspan<TSpan::rank()>{}(span, functor);
}
int main()
{
std::array<int, 12> d;
// stdex::mdspan m{d.data(), stdex::extents{12}};
// stdex::mdspan m{d.data(), stdex::extents{2, 6}};
// stdex::mdspan m{d.data(), stdex::extents{2, 4, 2}};
stdex::mdspan m{d.data(), stdex::extents{2, 2, 1, 4}};
iota_span(m, 42);
for(auto const& v : d)
{
std::cout << v << " ";
}
std::cout << std::endl;
return 0;
} I take some times, until I understand all required rules of variadic templates and variadic parameter packs and it takes also some time to combine everything but now I have a working solution. cppinsights helped a lot to developed it. Now, I need to check, if the compiler can optimize it. |
Excellent work! C++23 ( #include <array>
#include <iostream>
#include <ranges>
#include <range/v3/view/cartesian_product.hpp>
#include <range/v3/view/indices.hpp>
#include <tuple>
#include <https://raw.githubusercontent.com/kokkos/mdspan/single-header/mdspan.hpp>
namespace stdex = std::experimental;
// You can't use std::apply on templated nonmember functions,
// but you can use it on generic lambdas. See the example here:
//
// https://en.cppreference.com/w/cpp/utility/apply
auto output_one_tuple_2 = []<class ... InputTypes>(InputTypes&& ... input) {
// lambdas with explicit template parameters require C++20.
auto print_all = [&]<std::size_t ... Indices>( std::index_sequence<Indices...> ) {
auto print_one = [&] (std::size_t index, auto&& in) {
std::cout << in;
if(index + 1 < sizeof...(Indices)) {
std::cout << ", ";
}
};
(print_one(Indices, input), ...);
};
std::cout << '(';
print_all(std::make_index_sequence<sizeof...(InputTypes)>());
std::cout << ")\n";
};
template<class Callable, class IndexType, std::size_t ... Extents>
void for_each_in_extents(Callable&& f, stdex::extents<IndexType, Extents...> e) {
[&]<std::size_t ... Indices>( std::index_sequence<Indices...> ) {
auto v = ranges::views::cartesian_product(ranges::views::indices(0, e.extent(Indices))...);
for(const auto& k : v) {
std::apply(std::forward<Callable>(f), k);
}
}(std::make_index_sequence<sizeof...(Extents)>());
}
int main() {
stdex::extents<int, 3, 4, 5> e;
for_each_in_extents(output_one_tuple_2, e);
return 0;
} |
Here's an improvement of that example, that shows how to iterate over the domain of an mdspan without needing ranges: https://godbolt.org/z/nabce8WW6 #include <https://raw.githubusercontent.com/kokkos/mdspan/single-header/mdspan.hpp>
#include <array>
#include <iostream>
#include <tuple>
namespace stdex = std::experimental;
// You can't use std::apply on templated nonmember functions,
// but you can use it on generic lambdas. See the example here:
//
// https://en.cppreference.com/w/cpp/utility/apply
auto print_pack = []<class ... InputTypes>(InputTypes&& ... input) {
auto print_all = [&]<std::size_t ... Indices>( std::index_sequence<Indices...> ) {
auto print_one = [&] (std::size_t index, auto&& in) {
std::cout << in;
if(index + 1 < sizeof...(Indices)) {
std::cout << ", ";
}
};
(print_one(Indices, input), ...);
};
std::cout << '(';
print_all(std::make_index_sequence<sizeof...(InputTypes)>());
std::cout << ")\n";
};
// C++20 lets you write lambdas templated on <std::size_t ... Indices>.
// Calling those lambdas with the result of std::make_index_sequence
// is a good way to "iterate" over a parameter pack.
// If you don't have C++20, you can replace the lambda
// with a separate, named helper function.
// Returns a new extents object representing all but the leftmost extent of e.
template<class IndexType, std::size_t ... Extents>
auto right_extents( stdex::extents<IndexType, Extents...> e )
{
static_assert(sizeof...(Extents) != 0);
return [&]<std::size_t ... Indices>( std::index_sequence<Indices...> ) {
return stdex::extents<IndexType, e.static_extent(Indices + 1)...>{
e.extent(Indices + 1)...
};
}( std::make_index_sequence<sizeof...(Extents) - 1>() );
}
// right_extents can be implemented by overloading for
// extents<IndexType, LeftExtent, RightExtents...>.
// That approach doesn't work for left_extents.
// Returns a new extents object representing all but the rightmost extent of e.
template<class IndexType, std::size_t ... Extents>
auto left_extents( stdex::extents<IndexType, Extents...> e )
{
static_assert(sizeof...(Extents) != 0);
return [&]<std::size_t ... Indices>( std::index_sequence<Indices...> ) {
return stdex::extents<IndexType, e.static_extent(Indices)...>{
e.extent(Indices)...
};
}( std::make_index_sequence<sizeof...(Extents) - 1>() );
}
template<class Callable, class IndexType, std::size_t ... Extents>
void for_each_in_extents_row_major(Callable&& callable, stdex::extents<IndexType, Extents...> ext)
{
if constexpr(ext.rank() == 0) {
return;
} else if constexpr(ext.rank() == 1) {
if constexpr(ext.static_extent(0) == stdex::dynamic_extent) {
for(IndexType index = 0; index < ext.extent(0); ++index) {
std::forward<Callable>(callable)(index);
}
} else {
// TODO unroll this loop at compile time
// (the compiler might do that for us anyway).
for(IndexType index = 0; index < ext.static_extent(0); ++index) {
std::forward<Callable>(callable)(index);
}
}
} else {
auto right_ext = right_extents(ext);
if constexpr(ext.static_extent(0) == stdex::dynamic_extent) {
for(IndexType left_index = 0; left_index < ext.extent(0); ++left_index) {
auto next = [&] (auto... right_indices) {
std::forward<Callable>(callable)(left_index, right_indices...);
};
for_each_in_extents_row_major( next, right_ext );
}
} else {
// TODO unroll this loop at compile time
// (the compiler might do that for us anyway).
for(IndexType left_index = 0; left_index < ext.static_extent(0); ++left_index) {
auto next = [&] (auto... right_indices) {
std::forward<Callable>(callable)(left_index, right_indices...);
};
for_each_in_extents_row_major( next, right_ext );
}
}
}
}
// Overloading on stdex::extents<IndexType, LeftExtents..., RightExtent>
// works fine for the row major case, but not for the column major case.
template<class Callable, class IndexType, std::size_t ... Extents>
void for_each_in_extents_col_major(Callable&& callable, stdex::extents<IndexType, Extents...> ext)
{
if constexpr(ext.rank() == 0) {
return;
} else if constexpr (ext.rank() == 1) {
if constexpr(ext.static_extent(ext.rank() - 1) == stdex::dynamic_extent) {
for(IndexType index = 0; index < ext.extent(ext.rank() - 1); ++index) {
std::forward<Callable>(callable)(index);
}
} else {
for(IndexType index = 0; index < ext.static_extent(ext.rank() - 1); ++index) {
std::forward<Callable>(callable)(index);
}
}
} else {
auto left_ext = left_extents(ext);
if constexpr(ext.static_extent(ext.rank() - 1) == stdex::dynamic_extent) {
for(IndexType right_index = 0; right_index < ext.extent(ext.rank() - 1); ++right_index) {
auto next = [&] (auto... left_indices) {
std::forward<Callable>(callable)(left_indices..., right_index);
};
for_each_in_extents_col_major( next, left_ext );
}
} else {
for(IndexType right_index = 0; right_index < ext.static_extent(ext.rank() - 1); ++right_index) {
auto next = [&] (auto... left_indices) {
std::forward<Callable>(callable)(left_indices..., right_index);
};
for_each_in_extents_col_major( next, left_ext );
}
}
}
}
int main() {
stdex::extents<int, 3, 4, 5> e;
std::cout << "Row major:\n";
for_each_in_extents_row_major(print_pack, e);
std::cout << "\nColumn major:\n";
for_each_in_extents_col_major(print_pack, e);
return 0;
} |
Since I had to implement the same functionality in LLAMA, here is my version: // ... includes etc.
template<typename IndexType, std::size_t ... Extents, typename Func,
std::size_t FirstExtInd, std::size_t... RestExtInds, typename... Is>
void forEachIndex(const stdex::extents<IndexType, Extents...>& ext, Func&& func,
std::index_sequence<FirstExtInd, RestExtInds...>, Is... is) {
for(IndexType i = 0; i < ext.extent(FirstExtInd); i++)
if constexpr(sizeof...(RestExtInds) > 0)
forEachIndex(ext, std::forward<Func>(func), std::index_sequence<RestExtInds...>{}, is..., i);
else
std::forward<Func>(func)(is..., i);
}
template<class IndexType, std::size_t ... Extents, typename Func>
void forEachIndex(stdex::extents<IndexType, Extents...> ext, Func&& func) {
// for column-major, you could reverse the index_sequence
forEachIndex(ext, std::forward<Func>(func), std::make_index_sequence<sizeof...(Extents)>{});
}
int main() {
stdex::extents<int, 3, stdex::dynamic_extent, 5> e{4};
forEachIndex(e, print_pack);
return 0;
} Godbolt: https://godbolt.org/z/6ae585YM8 The trick here is to use an |
@SimeonEhrig here is your example of template<typename TSpan, typename TData>
void iota_span(TSpan span, TData index)
{
forEachIndex(span.extents(), [&](auto... is){
span(is...) = index++;
});
} Godbolt: https://godbolt.org/z/5hncdK1fj |
@mhoemmen and @bernhardmgruber thanks for the solutions. Looks much better, than my solution. I think, I will take @bernhardmgruber solution and extend it with the ideas of @mhoemmen solution (I cannot directly overtake it, because we cannot use C++20 because of the nvcc compiler). |
@SimeonEhrig I think I can do better to make my code more concise! I've been working on it when I need a break from other things. The only strictly C++20 things in my example are lambdas with named template parameters, but you've figured out how to work around. You might also want to consider Kokkos' MDRange, an existing optimized parallel multidimensional index for-each that only requires C++14 and works well with nvcc and other compilers. |
@bernhardmgruber @SimeonEhrig Here's a refinement of my above example: https://godbolt.org/z/o9ecK7coj -- please see in particular |
Thanks for the solution. Looks like my idea in a heavy optimized version 😅 I'm also impressed, how you can use templated lambdas in C++20 for template meta programming. I read about templated lambdas but I never thought about to use it for meta programming. |
TMP is the use for lambdas with explicit template heads until we get P1061, possibly in C++26. Edit: given that stdex::extents<int, 3, stdex::dynamic_extent, 5> e{4};
const auto [a, b, c] = e; // compile error
const auto [b] = e; // ok, but b is now some kind of internal `storage_t`. @mhoemmen should this be fixed? That is, should |
This structured binding thing will not work no matter what, certainly wouldn't produce runtime values. This just makes internal members visible. |
Oh I see you could do it for custom types. I guess someone could propose a paper to do this ... |
@crtrott Right, one could overload the various @bernhardmgruber It's a neat idea to make this work for |
@crtrott and @mhoemmen Exactly, we would need specializations of I am interested in writing a small paper. It's a little personal wish of mine for a long time anyway :) I will try to float a draft on the mailing list in a couple of weeks. |
@bernhardmgruber That's excellent! We would be happy to review early drafts of the paper before you send it to the mailing list. Thanks! |
Hello, thanks for the nice template <class T, class IndexType, size_t E0, size_t... E, class L, class A, class Fun>
void forEach(std::mdspan<T, std::extents<IndexType, E0, E...>, L, A> x, Fun&& f)
{
auto full = [](size_t) { return std::full_extent; };
for (IndexType i = 0; i < x.extent(0); i++)
if constexpr (x.rank() > 1)
forEach(stdex::submdspan(x, i, full(E)...), f);
else
std::forward<Fun>(f)(x[i]);
} |
Hi @yurielnf , my personal experience is, that constructing recursive |
Thanks @SimeonEhrig. This implementation works for strided template <size_t pos, class T, class E, class L, class A, class Fun>
void forEach_impl(std::mdspan<T, E, L, A> const& x, Fun&& f, size_t offset) {
if constexpr (pos == x.rank())
std::forward<Fun>(f)(x.accessor().access(x.data_handle(), offset));
else
for (size_t i = 0u; i < x.extent(pos); i++)
forEach_impl<pos + 1>(x, std::forward<Fun>(f), offset + i * x.stride(pos));
}
template <class T, class E, class L, class A, class Fun>
void forEach(std::mdspan<T, E, L, A> const& x, Fun&& f) {
static_assert(x.is_always_strided(), "valid only for strided mdspan");
forEach_impl<0>(x, std::forward<Fun>(f), 0);
} |
Take this solution from this issue, pass a |
Excellent! It is a beautiful code! |
@yurielnf wrote:
Thanks so much! : - D
My implementation is generic; it works for any layout (if reading, or for any unique layout if writing). If you want iteration to be efficient, the best way is to specialize for the layout. For exhaustive layouts, just skip the layout mapping and iterate directly over [0, For strided layouts where either the leftmost or rightmost extent has stride 1 (see P2642), you might get faster code by treating the stride-1 extent as a special case. Another generic approach is to use the inverse of a canonical exhaustive layout. For example, take the Thanks for your interest in mdspan! : - ) |
Hi, I'm working on vikunja, a platform-independent primitives library (e.g.
vikunja::transform
,vikunja::reduce
) for different accelerators based on alpaka.I want to use
mdspan
to support N dimension memory. In the presentation of Bryce Aldelstein Lelbach (youtube link) there is the idea to build a recursive function to create n nested loops, which the compiler can optimize.I implemented a prototype with
submdspan
for an iota function for amdspan
:My problem is, that I need to specialize each dimension by hand. At the moment I stopped at dim 3. Does anybody have an idea to write a generic function to iterate over all elements of a
mdspan
?Maybe it is possible to set the n
stdex::full_extent
arguments in the functionstdex::submdspan(span, i, stdex::full_extent, ...)
depending on the template parameterTDim
. I'm not sure, if this is possible with some variadic or type trait.The text was updated successfully, but these errors were encountered: