Add dynamic matrix by Eleobert · Pull Request #21 · genbattle/dkm

Eleobert · 2021-02-06T14:14:48Z

Instead of passing std::vector<std::array<T, N>>, I added a new class, as_matrix, that allows passing data whose size is defined at runtime. The class is only an interface, and all operations are read-only.

For example, if the user has the data stored in std::vector<std::pair<float, float>>, we can easily pass it without having to copy to a new container:

auto matrix = dkm::as_matrix(reinterpret_cast<float*>(data.data()), data.size(), 2, false);
auto res = dkm::kmeans_lloyd(matrix, k);

Or, if the data is an Armadillo matrix (or Eigen, etc.), it is even easier:

auto matrix = dkm::as_matrix(data.memptr(), data.n_rows, data.n_cols);
auto res = dkm::kmeans_lloyd(matrix, k);

I think this approach is much better than the current one. For now, I have only added the class and changed dkm.hpp (since I don't know whether this will get merged, and this is the only part relevant to me). One known performance issue is that as_matrix::row returns by value, but this can be easily fixed.
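To make the idea of a read-only interface over externally owned data concrete, here is a minimal sketch. It is not the code from this PR: the names matrix_view and at are hypothetical, and it assumes the data is one contiguous buffer in either row-major or column-major order.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of a read-only, non-owning matrix view.
// The names (matrix_view, at) are illustrative, not this PR's actual API.
template <typename T>
class matrix_view {
public:
    matrix_view(const T* data, std::size_t n_rows, std::size_t n_cols,
                bool col_major = true)
        : data_(data), n_rows_(n_rows), n_cols_(n_cols), col_major_(col_major) {}

    std::size_t rows() const { return n_rows_; }
    std::size_t cols() const { return n_cols_; }

    // Element (i, j): column-major storage keeps each column contiguous,
    // row-major storage keeps each row contiguous.
    const T& at(std::size_t i, std::size_t j) const {
        assert(i < n_rows_ && j < n_cols_);
        return col_major_ ? data_[j * n_rows_ + i] : data_[i * n_cols_ + j];
    }

private:
    const T* data_;            // not owned; the caller keeps the buffer alive
    std::size_t n_rows_, n_cols_;
    bool col_major_;
};
```

The view stores only a pointer and the dimensions, so constructing it costs nothing regardless of the input size; the caller's buffer has to outlive the view.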

genbattle · 2021-02-26T01:39:55Z

Hi, thanks for your contribution!

This looks similar to an idea I had for a custom matrix data structure. I'll have a more detailed look at it over this weekend.

genbattle · 2021-02-27T08:19:14Z

include/dkm_matrix.hpp

+    auto res = std::vector<T>(n_cols);
+    for(size_t j = 0; j < n_cols; j++)
+    {
+        res[j] = (*this)(i, j);


One of the reasons this library is so performant is that it avoids allocations and copies wherever possible. This is a full copy of each and every row in the input data, multiple times. It will have a significant effect on performance.

The way to do this performantly and safely would be to return a std::span which points to the internal data. Given this library currently only requires C++11, the solution is probably returning a pointer to the row or a custom struct which emulates the behavior of std::span.
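A custom struct that emulates std::span could look roughly like the sketch below. The names row_span and row_of are hypothetical, and the sketch assumes row-major storage so that each row is contiguous.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C++11 stand-in for std::span<const T>: a pointer plus a length.
template <typename T>
struct row_span {
    const T* ptr;
    std::size_t len;

    const T* begin() const { return ptr; }
    const T* end() const { return ptr + len; }
    std::size_t size() const { return len; }
    const T& operator[](std::size_t j) const {
        assert(j < len);
        return ptr[j];
    }
};

// Row i of a row-major buffer with n_cols columns; no allocation, no copy.
template <typename T>
row_span<T> row_of(const T* data, std::size_t n_cols, std::size_t i) {
    return row_span<T>{data + i * n_cols, n_cols};
}
```

Because begin() and end() return raw pointers, the struct works with range-based for loops and standard algorithms without touching the heap.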

I had been implementing a new version that takes this into account. The idea is that whenever the matrix is column-major, we first copy it (because we cannot modify the incoming data) and transpose the copy.

The copy would work like this:

auto [owner, data] = copy(matrix);

From here we can safely return the span. Note that the matrix data structure never owns the data, so the copy function needs to return an owner (arguably a unique_ptr) alongside the data.
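A rough sketch of such a copy function follows. It is an assumption about how the idea could be realized, not the pastebin implementation: the name copy_row_major and its signature are hypothetical, and it returns a std::pair so that it also compiles before C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical helper: transposes a column-major buffer into freshly
// allocated row-major storage. The returned pair holds the owner of the
// new storage and a raw pointer into it that a non-owning view can wrap.
template <typename T>
std::pair<std::unique_ptr<T[]>, const T*>
copy_row_major(const T* col_major, std::size_t n_rows, std::size_t n_cols) {
    assert(col_major != nullptr || n_rows * n_cols == 0);
    std::unique_ptr<T[]> owner(new T[n_rows * n_cols]);
    for (std::size_t i = 0; i < n_rows; ++i)
        for (std::size_t j = 0; j < n_cols; ++j)
            owner[i * n_cols + j] = col_major[j * n_rows + i];
    const T* view = owner.get();
    return std::make_pair(std::move(owner), view);
}
```

With C++17, the result can be unpacked as auto [owner, data] = copy_row_major(ptr, n_rows, n_cols);. The owner has to outlive every view into data.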

I plan to implement this when my project reaches the optimization stage. Here is an implementation I was already working on: https://pastebin.com/DYQs2EBb

Why can't transposition just produce a new as_matrix with the correct indexer if we need to switch to column-major instead of row-major?

There's no reason to have any solution here that includes any form of copying.

The transposition is performed only once, right after the function is called. I haven't measured the performance yet, but I think the overhead is insignificant.

The idea behind the transposition is that, with column-major data, the next element of a given row is at a distance of n_rows. It is more performant and easier to work with if we keep consecutive elements at a distance of 1.

My argument was mainly that in this case the representation of the data doesn't need to change at all; we just need a different view over the data, which seems to be the whole point of as_matrix (an abstracted view over the data).

I understand your point. My only concern is that with row-major data we can easily get a row by returning std::span(ptr, ncols), but with column-major data this is not possible without adding more complexity. To avoid copying, the only alternative I can think of is a sort of row_vec whose indexing would work as follows:

auto row_vec::operator()(size_t i) { return data[i * n_rows + this->row_number]; }

In my experience, transposing the data (changing it from column-major to row-major) is simpler and faster, especially when the input data is large. But I am not completely sure that this is actually true. What is your opinion on this?
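For comparison, the copy-free alternative can be sketched as a strided row view that handles both layouts. The names strided_row and make_row are hypothetical; the only assumption is that consecutive elements of a row are a fixed number of elements apart.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical strided row view: element j of the row lives `stride`
// elements after element j - 1. stride == 1 for row-major data and
// stride == n_rows for column-major data.
template <typename T>
struct strided_row {
    const T* first;
    std::size_t len;
    std::size_t stride;

    std::size_t size() const { return len; }
    const T& operator[](std::size_t j) const {
        assert(j < len);
        return first[j * stride];
    }
};

// Builds a view of row i without copying, for either storage order.
template <typename T>
strided_row<T> make_row(const T* data, std::size_t n_rows, std::size_t n_cols,
                        std::size_t i, bool col_major) {
    assert(i < n_rows);
    return col_major ? strided_row<T>{data + i, n_cols, n_rows}
                     : strided_row<T>{data + i * n_cols, n_cols, 1};
}
```

This avoids the transposed copy at the cost of one multiplication per element access and non-contiguous reads for column-major input, which is the trade-off under discussion here; which option is faster would need to be measured.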

genbattle · 2021-02-27T08:19:43Z

include/dkm_matrix.hpp

+// This class is only an interface! Not designed to be used outside library internals.
+
+template<typename T>
+class as_matrix


I think it would be fine to just name this matrix.

The name comes from the fact that we are dealing with the data "as a matrix". But as_matrix is in fact not even a matrix (in the usual sense); it is only an interface between the algorithm and the data. I think the name as_matrix expresses the idea better.

genbattle · 2021-02-27T08:20:43Z

include/dkm_matrix.hpp

+public:
+    const size_t n_rows, n_cols;
+
+    as_matrix(const T *data, size_t n_rows, size_t n_cols, bool col_major = true)


It would be great to see a constructor for conversion from the existing vector<array<T, N>> type as well to ease migration for existing users.

We can instead overload the main function and mark the old version as [[deprecated]]. What do you think?

This approach is also acceptable, as long as [[deprecated]] is ignored by older compilers.

genbattle · 2021-02-27T08:23:28Z

include/dkm_matrix.hpp

+    {}
+
+    auto row(size_t i) const -> std::vector<T>;
+    auto operator()(size_t i, size_t j) const -> const T&


I would prefer this was just a named function like get.

Eleobert · 2021-10-08T21:22:11Z

Just to close this, I think mdspan is a better alternative.

g40 · 2023-05-05T16:59:43Z

Hi, did this simply stall? Looks like a very useful addition.

genbattle · 2023-05-08T22:47:33Z

> Hi, did this simply stall? Looks like a very useful addition.

Yes, this change stalled, and I haven't had time to implement/update it myself.

Add dynamic matrix

5244139

genbattle self-assigned this Feb 26, 2021

genbattle requested changes Feb 27, 2021

genbattle force-pushed the master branch from 8f39f17 to beaca20 Compare April 14, 2025 04:29

genbattle force-pushed the master branch 5 times, most recently from 6704cb8 to 8377868 Compare April 22, 2025 10:49

3 participants
