Conversation
|
Hi, thanks for your contribution! This looks similar to an idea I had for a custom matrix data structure. I'll have a more detailed look at it over this weekend. |
| auto res = std::vector<T>(n_cols); | ||
| for(size_t j = 0; j < n_cols; j++) | ||
| { | ||
| res[j] = (*this)(i, j); |
There was a problem hiding this comment.
One of the reasons this library is so performant is that it avoids allocations and copies wherever possible. This is a full copy of each and every row in the input data, multiple times. It will have a significant effect on performance.
The way to do this performantly and safely would be to return a std::span which points to the internal data. Given this library currently only requires C++11, the solution is probably returning a pointer to the row or a custom struct which emulates the behavior of std::span.
There was a problem hiding this comment.
I had been implementing a new version that takes this into account. The idea is whenever the matrix is column major we first copy (because we cannot change the incoming data) and transpose it.
The copy would work like this:
auto [owner, data] = copy(matrix);From here we can safely return the span. See that the matrix data structure never owns the data, so we need to return an owner (arguably unique_ptr) from the copy function.
I plan to implement this when my project reaches the optimization stage. Here is an implementation I was already working on https://pastebin.com/DYQs2EBb
There was a problem hiding this comment.
Why can transposition not just produce a new as_matrix with the correct indexer if we need to switch to column instead of row major.
There's no reason to have any solution here that includes any form of copying.
There was a problem hiding this comment.
The transposition is performed only once right after the function is called. I didn't measure the performance yet but I think that the overhead is insignificant.
The idea with transposition is that with column major data the next element of a given row is at a distance of n_rows. It is more performant and easier to work with if we keep the elements at a distance of 1.
There was a problem hiding this comment.
My argument was mainly that in this case the representation of the data doesn't need to change at all, we just need need a different view over the data, which seems to be the whole point of as_matrix (an abstracted view over the data).
There was a problem hiding this comment.
I understand your point, my only concern is that in the case of row major data to get a row we can easily return std::span(ptr, ncols). But in case of column major it is not possible without adding more complexity. To avoid copying the alternative I can think of would have to be a sort of row_vec whose the indexing would work as following:
auto row_vec::operator()(size_t i)
{
return data[i * n_rows + this->row_number]
}From my experience the transposing the data + changing from column major to row major is simpler and faster, specially when the input data is large enough. But I am not completely sure if this is true in fact. What is your opinion on this?
| // This class is only an interface! Not designed to be used outside library internals. | ||
|
|
||
| template<typename T> | ||
| class as_matrix |
There was a problem hiding this comment.
I think it would be fine to just name this matrix.
There was a problem hiding this comment.
The name comes from the fact that we are dealing with the data "as a matrix". But as_matrix is in fact not even a matrix (in common sense), it is only an interface between the algorithm and the data. I think the name as_matrix expresses better the idea.
| public: | ||
| const size_t n_rows, n_cols; | ||
|
|
||
| as_matrix(const T *data, size_t n_rows, size_t n_cols, bool col_major = true) |
There was a problem hiding this comment.
It would be great to see a constructor for conversion from the existing vector<array<T, N>> type as well to ease migration for existing users.
There was a problem hiding this comment.
We can instead overload the main function and mark the old version as [[deprecated]]. What do you think?
There was a problem hiding this comment.
This approach is also acceptable, as long as [[deprecated]] will be ignored by older compilers.
| {} | ||
|
|
||
| auto row(size_t i) const -> std::vector<T>; | ||
| auto operator()(size_t i, size_t j) const -> const T& |
There was a problem hiding this comment.
I would prefer this was just a named function like get.
|
Just to close this, I think mdspan is a better alternative. |
|
Hi, did this simply stall? Looks like a very useful addition. |
Yes, this change stalled, and I haven't had time to implement/update it myself. |
6704cb8 to
8377868
Compare
Instead of passing
std::vector<std::array<T, N>>I added a new classas_matrixthat allow passing data with size defined at runtime. The class is only an interface and all operations are read only.For example, if the user has the data stored in
std::vector<std::pair<float, float>>, we can easily pass it without having to copy to a new container:Or if the data is an armadillo matrix (Eigen, etc) it is even easier:
I think this approach is much better than the current one. For now, I only added the class and changed
dkm.hpp(since I don't know if this will get merged and this is the only relevant part for me). One performance issue we have isas_matrix::rowreturning by value, but this can be easily fixed.