Miscellaneous design decisions
The current API design involves the user `#include`ing a lot of headers while also having to link to the beachmat shared library. This was simply the most convenient state compared to the alternatives:
- A header-only library. In most cases, this would be easier to use (only need the `LinkingTo:`), but here we have to link to the Rhdf5lib library anyway, so we lose the advantage of convenience. We just end up with longer compile times because now everything has to be compiled fresh.
- A traditional library with only the minimum API in the `#include`s. This is painful to do because of the heavy use of templating. It would require explicit template instantiation for all matrix classes, which is very error-prone (especially because explicit template instantiation doesn't work with `typedef`s).
So, the current state is a compromise between the two, with a heavy `#include` load because of the templates and a shared library to handle the non-template methods as well as the HDF5 library.
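To make the cost of the second alternative concrete, here is a minimal sketch of explicit instantiation. The `toy_matrix` class is hypothetical and is not part of beachmat; in a real library the declarations would sit in a header and the definitions in a source file, whereas here they are collapsed into one file so the example stands alone.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a templated matrix class; not beachmat's API.
template <typename T>
class toy_matrix {
public:
    toy_matrix(std::size_t nr, std::size_t nc) : nrow(nr), data(nr * nc) {}
    T get(std::size_t r, std::size_t c) const;
    void set(std::size_t r, std::size_t c, T value);
private:
    std::size_t nrow;
    std::vector<T> data; // column-major storage
};

// In a traditional library, these definitions would be hidden in a .cpp file...
template <typename T>
T toy_matrix<T>::get(std::size_t r, std::size_t c) const {
    return data[r + c * nrow];
}

template <typename T>
void toy_matrix<T>::set(std::size_t r, std::size_t c, T value) {
    data[r + c * nrow] = value;
}

// ...so every supported instantiation has to be listed by hand.
template class toy_matrix<int>;
template class toy_matrix<double>;

// A typedef is fine for users, but cannot be used to request the instantiation.
typedef toy_matrix<int> toy_integer_matrix;
// template class toy_integer_matrix; // ill-formed: typedef name after 'class'
```

Forgetting one line in the instantiation list only shows up as a link-time error for whoever uses the missing type, which is why this approach scales poorly with the number of matrix classes.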
We have used inheritance to define the `*_matrix` interface, so that run-time polymorphism is possible for different matrix classes.
However, value extraction is executed by separate `*_reader` classes that are contained within the interface, i.e., are data members of the user-visible `*_matrix` object.
This allows us to re-use extraction methods for different interfaces.
In particular, the `character_matrix` interface differs from the others, so a single set of inheritance templates is not possible.
It also allows us to template the internal (non-virtual) methods so that the overloaded virtual interface methods need only be trivial wrappers.
The two major template parameters are `T` and `V`: `T` for the return type of `get`, and `V` for the `Rcpp::Vector` class used for internal storage and iterators.
Any combination of template arguments is permitted where an element of `V` can be successfully converted to type `T`.
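The arrangement described above (a reader held as a data member, templated on `T` and `V`, with thin virtual wrappers around it) can be sketched as follows. All class names here are hypothetical, and `std::vector<double>` stands in for the `Rcpp::Vector` class so that the example compiles without R; it is an illustration of the pattern, not beachmat's actual code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reader: owns the data and does the real work in templated,
// non-virtual methods. V stands in for the Rcpp::Vector class.
template <typename T, class V>
class toy_reader {
public:
    toy_reader(V values, std::size_t nr) : store(values), nrow(nr) {}

    // Templated on the output iterator; each element of V is converted to
    // whatever type the iterator points to.
    template <class Iter>
    void get_col(std::size_t c, Iter out) const {
        for (std::size_t r = 0; r < nrow; ++r, ++out) {
            *out = store[c * nrow + r];
        }
    }

    T get(std::size_t r, std::size_t c) const {
        return static_cast<T>(store[c * nrow + r]);
    }
private:
    V store; // column-major values
    std::size_t nrow;
};

// Hypothetical virtual interface, overloaded on the output type.
class toy_numeric_matrix {
public:
    virtual ~toy_numeric_matrix() {}
    virtual double get(std::size_t r, std::size_t c) = 0;
    virtual void get_col(std::size_t c, int* out) = 0;
    virtual void get_col(std::size_t c, double* out) = 0;
};

// Concrete class: the reader is a data member, and each virtual method is a
// one-line wrapper around the reader's templated method.
class toy_simple_numeric_matrix : public toy_numeric_matrix {
public:
    toy_simple_numeric_matrix(std::vector<double> v, std::size_t nr) : reader(v, nr) {}
    double get(std::size_t r, std::size_t c) { return reader.get(r, c); }
    void get_col(std::size_t c, int* out) { reader.get_col(c, out); }
    void get_col(std::size_t c, double* out) { reader.get_col(c, out); }
private:
    toy_reader<double, std::vector<double> > reader;
};
```

Because the reader is not tied to the inheritance hierarchy, the same `toy_reader` template could be reused as a member of a differently shaped interface.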
All integers related to array lengths or indices are `size_t`, or will be coerced to such if they are not (e.g., the `i` and `p` values in a sparse matrix).
This eliminates warnings about signed/unsigned integer comparisons while being explicit about the interpretation of each integer type in any given context.
It may result in a slight decrease in efficiency if `int`s need to be converted to `size_t`s, but any difference is probably quite mild.
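As a small illustration of coercing at the boundary, the sketch below walks one column of a compressed sparse column matrix whose `i` (row index) and `p` (column pointer) arrays are stored as `int`. The function name `count_up_to_row` is hypothetical and exists only for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: count the stored entries of column c whose row index
// is at or below `last`. `i` and `p` are int arrays, as in a typical
// compressed sparse column layout.
std::size_t count_up_to_row(const std::vector<int>& i, const std::vector<int>& p,
                            std::size_t c, std::size_t last) {
    // Coerce once when reading from the int arrays, so that everything
    // downstream is unsigned.
    const std::size_t start = static_cast<std::size_t>(p[c]);
    const std::size_t end = static_cast<std::size_t>(p[c + 1]);

    std::size_t count = 0;
    for (std::size_t k = start; k < end; ++k) {
        const std::size_t row = static_cast<std::size_t>(i[k]);
        if (row <= last) { // size_t against size_t: no signed/unsigned warning
            ++count;
        }
    }
    return count;
}
```

The casts make the assumption that the stored values are non-negative explicit at one place, rather than leaving it implicit in every comparison.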
In theory, greater efficiency could be obtained when calling `get_col` on column-major base matrices.
Specifically, a pointer to the column could be directly returned rather than copying the data to a new `Vector` object.
However, other matrix classes do not store data in the same manner; if a pointer is to be returned, it would have to be to some internal storage.
This would be dangerous as the values in the internal storage will change upon repeated calls to `get_col`.
Filling a user-supplied array is more intuitive as it is obvious that the array will change when `get_col` is called again.
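The hazard can be seen in a minimal sketch. The `toy_rowmajor` class below is hypothetical: it stores values row by row, so a column is never contiguous and a returned pointer would have to refer to an internal buffer. The `get_col_ptr` method is included only to show the problem; `get_col` follows the fill-the-caller's-array approach described above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix, so a column is not contiguous in memory.
class toy_rowmajor {
public:
    toy_rowmajor(std::vector<double> v, std::size_t nr, std::size_t nc)
        : data(v), nrow(nr), ncol(nc), buffer(nr) {}

    // Risky variant: returns a pointer to an internal buffer that is
    // overwritten by the next call. Assumes nrow > 0.
    const double* get_col_ptr(std::size_t c) {
        fill(c, buffer.begin());
        return &buffer[0];
    }

    // Safer variant: fill an array owned by the caller.
    void get_col(std::size_t c, double* out) const { fill(c, out); }

private:
    template <class Iter>
    void fill(std::size_t c, Iter out) const {
        for (std::size_t r = 0; r < nrow; ++r, ++out) {
            *out = data[r * ncol + c];
        }
    }

    std::vector<double> data;
    std::size_t nrow, ncol;
    std::vector<double> buffer;
};
```

With `get_col_ptr`, a caller who keeps the pointer from column 0 and then asks for column 1 ends up with two pointers to the same buffer, both now holding column 1. With `get_col`, each caller-owned array changes only when it is passed in again.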
If any class has pointers to `SEXP` data, the data that each pointer points to should be contained within an `RObject` that is also a member of the class.
This ensures that the data is `PROTECT`ed for the lifetime of the class instance.
Otherwise, one could imagine a situation where the class is instantiated from an `RObject`; the `RObject` is destroyed; and garbage collection occurs.
This would invalidate the pointers in the class instance.
It would be nice to allow the `get_*` methods to take any random access iterator; however, virtual methods cannot be templated.
We could add a template argument to the entire class, but this would only allow it to take either a random access iterator or a pointer (not both).
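A minimal sketch of the class-level alternative shows why it is limiting. The names `toy_matrix_base` and `toy_dense` are hypothetical; the point is only that the iterator type becomes part of the class type, so any one instance accepts exactly one kind of iterator.

```cpp
#include <cstddef>
#include <vector>

// Not allowed in C++: a member function template cannot be virtual.
//   template <class Iter> virtual void get_col(std::size_t c, Iter out);

// Hypothetical alternative: move the iterator type up to the class level.
template <class Iter>
class toy_matrix_base {
public:
    virtual ~toy_matrix_base() {}
    virtual void get_col(std::size_t c, Iter out) = 0;
};

template <class Iter>
class toy_dense : public toy_matrix_base<Iter> {
public:
    toy_dense(std::vector<double> v, std::size_t nr) : data(v), nrow(nr) {}
    void get_col(std::size_t c, Iter out) {
        for (std::size_t r = 0; r < nrow; ++r, ++out) {
            *out = data[c * nrow + r];
        }
    }
private:
    std::vector<double> data; // column-major values
    std::size_t nrow;
};

// Each choice of Iter gives a separate class hierarchy; a given instance
// takes pointers or vector iterators, depending on how it was declared.
typedef toy_dense<double*> ptr_dense;
typedef toy_dense<std::vector<double>::iterator> iter_dense;
```

A user who wants to fill both raw arrays and `std::vector`s from the same matrix would need two instances, which is the restriction noted above.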