DylanMuir/MappedTensor

Transparent lazy data access for matlab

By default, Matlab matrices must be fully loaded into memory. This can make allocating and working with huge matrices a pain, especially if you only really need access to a small portion of the matrix at a time. memmapfile allows the data for a matrix to be stored on disk, but you can't access the matrix transparently in functions that don't expect a memmapfile object without reading in the whole matrix. MappedTensor is a matlab class that looks like a simple matlab tensor, with all the data stored on disk.

A few extra niceties over memmapfile are included, such as built-in per-slice access; fast addition, subtraction, multiplication and division by scalars; fast negation; permutation; complex support.

Tensor data is automatically allocated on disk in a temporary file, which is removed when all referencing objects are cleared. Existing binary files can also be accessed. MappedTensor is a handle class, which means that assigning an existing mapped tensor to another variable will not make a copy, but both variables will point to the same data. Changing the data in one variable will change both variables.
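The handle semantics described above can be illustrated with a short sketch. It assumes the @MappedTensor class is on the Matlab path and that COPYOBJ produces an independent copy, as the method list states; the variable names are illustrative only.

```matlab
% Sketch of handle semantics (assumes @MappedTensor is on the Matlab path).
a = MappedTensor(zeros(4, 4), 'Writable', true);  % data lives in a temporary file

b = a;          % no copy is made: a and b refer to the same mapped data
b(1, 1) = 42;   % writing through b ...
v = a(1, 1);    % ... should be visible when reading through a

c = copyobj(a);      % independent deep copy of the data
c.Writable = true;   % set explicitly; assumed not to be inherited from a
c(1, 1) = 0;         % should leave a(1, 1) unchanged
```

If an independent tensor is needed, COPYOBJ is the documented route; plain assignment only duplicates the handle.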

MappedTensor internally uses mex functions, which need to be compiled the first time MappedTensor is used. If compilation fails then slower, non-mex versions will be used.

Download and install

Download MappedTensor
Unzip the @MappedTensor directory to somewhere on the Matlab path. The @ ("at") symbol is important, as it signals to Matlab that this is a class directory. Then type:

addpath /path/to/Mappedtensor

Note: MappedTensor provides an accelerated MEX function for performing file reads and writes. MappedTensor will attempt to compile this function when a MappedTensor variable is first created. This requires mex to be configured correctly for your system. If compilation fails, then a slower pure Matlab version will be used.

Creating a MappedTensor object

M = MappedTensor([d1 d2...], ...), M = MappedTensor(d1, d2, ...), and M = MappedTensor(d, ...) each allocate a MappedTensor object to store an array of the specified size [d1 d2...]. When only one dimension value d is given, the array is square, with size [d d].

M = MappedTensor(FILENAME, [d1 d2...], 'FORMAT', class, ...) constructs a MappedTensor object that re-uses an existing map file FILENAME for an array with dimensions [d1 d2...]. The full size, class, and offset of the file must be known and specified in advance. This file will not be removed when all handle references are destroyed.

M = MappedTensor(ARRAY) constructs a MappedTensor object that maps a numeric array ARRAY into a temporary file. The array must be 2D or more (not a scalar or a vector, either of which would be interpreted as a dimension setting). This syntax requires the initial array to be allocated in memory first, which limits the size of the MappedTensor. For large arrays, it is more efficient to pre-allocate the object with specified dimensions or the 'Size' property, and then set the content chunk by chunk.
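A minimal sketch of the pre-allocate-then-fill approach follows. The sizes and the random content are placeholders, and the example assumes slice assignment works as described under "Using the array".

```matlab
% Pre-allocate on disk, then fill one slice at a time so that only a single
% slice is ever held in memory. Sizes here are arbitrary placeholders.
sz = [1000 1000 20];
M = MappedTensor(sz, 'Format', 'single', 'Writable', true);

for k = 1:sz(3)
    % Each chunk is generated (or read from elsewhere) and written to disk.
    M(:, :, k) = rand(sz(1), sz(2), 'single');
end
```

The chunk size is a trade-off: larger chunks mean fewer file operations, smaller chunks mean a lower peak memory footprint.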

M = MappedTensor(..., PROP1, VALUE1, PROP2, VALUE2, ...) constructs a MappedTensor object, and sets the properties of that object that are named in the argument list (PROP1, PROP2, etc.) to the given values (VALUE1, VALUE2, etc.). All property name arguments must be quoted strings (e.g., 'writable'). Any properties that are not specified are given their default values.

Note: When a variable containing a MappedTensor object goes out of scope or is otherwise cleared, the memory map is automatically closed. You may also call the DELETE method to force clear the object.

Tensor properties

All properties can be accessed with syntax e.g. M.property. All these properties can also be set when building the tensor.

| Property | Description |
| --- | --- |
| Data | The actual data |
| Filename | Binary data file name on disk (real part of tensor) |
| FilenameCmplx | Binary data file name on disk (complex part of tensor) |
| Format | The class of this mapped tensor |
| MachineFormat | The desired machine format of the mapped file |
| Offset | The number of bytes to skip at the beginning of the file |
| Temporary | A flag which records whether a temporary file was created |
| Writable | Whether the data may be written to (false protects it from writing) |

We detail below the use of these properties, especially to set an initial tensor.

Format: Char string (defaults to 'double').

Format of the contents of the mapped region. Format specifies that the mapped data is to be accessed as a single vector of the type given by Format's value. Supported values are 'int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'single', and 'double'. Complex arrays are supported. Sparse arrays are not supported. You can later change the storage class of the object with the CAST method, but this is usually not recommended.

Offset: Non-negative integer (defaults to 0).

Number of bytes from the start of the file to the start of the mapped region. Offset 0 represents the start of the file. This makes it possible to skip the beginning of an (existing) binary file by ignoring the specified number of header bytes. You can use the methods FREAD and FWRITE to read and write this header region.
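As an illustration of Offset, the sketch below writes a small binary file with a 16-byte header using standard Matlab file I/O, then maps only the payload. The file name and header length are made up for the example.

```matlab
% Build a demo file: 16 header bytes followed by 12 doubles (hypothetical layout).
fname = fullfile(tempdir, 'mt_offset_demo.dat');   % hypothetical file name
fid = fopen(fname, 'w');
fwrite(fid, zeros(1, 16, 'uint8'), 'uint8');       % header to be skipped
fwrite(fid, 1:12, 'double');                       % payload, stored column-major
fclose(fid);

% Map the payload as a 3x4 array, skipping the header bytes.
M = MappedTensor(fname, [3 4], 'Format', 'double', 'Offset', 16);
x = M(2, 1);   % second stored value if the header is skipped correctly
```

Because the tensor is built from an existing file, the file is not removed when M is cleared.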

Writable: True or false (defaults to false).

Access level which determines whether or not Data property (see below) may be assigned to. This property can be changed after object creation.

Temporary: True or false (defaults to true when created from an array).

When false, the associated file is kept when the object is cleared. Such files can be reused later. When the object is created from an array, Temporary is true. When it is created from an existing map file, Temporary is false. When saving an object, the Temporary state is set to false. This property can be changed after object creation.

MachineFormat: big-endian ('ieee-be') or little-endian ('ieee-le')

If not specified, the machine-native format will be used.

Data: array

Array to assign to the mapped object. This property can be changed after object creation. You can also set the Data with syntax:

set(M, 'Data', array)
M(:)            = whole_array;
M([ 1 3 5... ]) = slice;

Filename: Char array.

Contains the name of the file being mapped. You can also get the mapped file with FILEPARTS.

FilenameCmplx: Char array.

Contains the name of the file being mapped (complex part). You can also get the mapped file with FILEPARTS.

Additional Name/Value pair options at build only

TempDir: Directory path

Directory where the mapped file(s) should be stored. The default path is e.g. TMPDIR or /tmp. You may also use /dev/shm on Linux systems to map the file into memory. This can be very efficient in terms of I/O, and can be coupled with tensor compression with the pack method.
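A hedged sketch of choosing the storage directory is shown below. It falls back to Matlab's default temporary directory when /dev/shm is not present; the tensor size is arbitrary.

```matlab
% Use a RAM-backed directory when available (Linux), else the default temp dir.
if exist('/dev/shm', 'dir') == 7
    tmp = '/dev/shm';
else
    tmp = tempdir;
end

M = MappedTensor([256 256 8], 'TempDir', tmp, 'Writable', true);
M(:, :, 1) = ones(256, 256);
disp(M.Filename);   % shows where the backing file was created
```

Files under /dev/shm consume RAM, so this trades disk I/O for memory; it suits tensors that fit in memory but benefit from the handle and slicing interface.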

Like: array

The dimensions and class of the specified array are used to preallocate a new object. Note that sparse arrays are not supported.

Size: [d1 d2 ...] array

Vector which specifies the size of the mapped array. This is the same as specifying dimensions as first arguments (see above).

After the MappedTensor object has been created, all the properties above may also be read with the GET method and changed with the SET method. For example,

set(M, 'Writable', true); % or M.Writable = true;

changes the Writable property of M to true.

Loading from data files

The LOAD method lazily imports binary data sets with the syntax

m = load(MappedTensor, 'filename');

with the following data formats.

| Extension | Description |
| --- | --- |
| EDF | ESRF Data Format (2D) |
| POS | Atom Probe Tomography (4 columns) |
| NPY | Python NumPy array (nD) |
| MRC MAP CCP4 RES | MRC/CCP4/MAP electronic density map (3D) |
| MAR | MAR CCD image (2D) |
| IMG MCCD | ADSC X-ray detector image SMV (2D) |

Using the array

The MappedTensor array can be used in most cases just as a normal Matlab array, as many class methods have been defined to match the usual behaviour.

You may access the array with indices as in M(I,J,..). The full tensor content is retrieved with M(:) as a column, or M(:,:) as pages, and finally as M.Data to get the raw shaped array.

Most standard Matlab operators just work transparently with MAPPEDTENSOR. You may use single objects, and even arrays of tensors for vectorized processing, such as in:

m=MappedTensor(rand(100)); n=copyobj(m); p=2*[m n];

These objects contain a reference to the actual data. Defining n=m does not copy anything: both variables access the same data. To make a copy, use the COPYOBJ method.

Transparent casting to other classes is supported in O(1) time. Note that due to transparent casting and transparent O(1) scaling, rounding may occur in a different class from that of the returned data, and therefore may not match Matlab rounding precisely. If this is an issue, index the tensor and then scale the returned values rather than relying on O(1) scaling of the entire tensor.

To work efficiently on very large arrays, it is recommended to employ the ARRAYFUN method, which applies a function FUN along a given dimension. This is done transparently for many unary and binary operators (with ARRAYFUN2).

The NUMEL method returns 1 on a single object, and the number of objects for a vector of objects. To get the number of elements in a single object, use NUMEL2(M) or PROD(SIZE(M)). This behaviour allows most methods to be vectorized over sequences of tensors.
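The difference between NUMEL and NUMEL2 can be sketched as follows, using only the methods listed in this document; the 100x100 size is arbitrary.

```matlab
m = MappedTensor(rand(100));   % one tensor holding 100 x 100 elements
n = copyobj(m);                % a second, independent tensor

nObjSingle = numel(m);         % counts objects: one tensor
nObjVector = numel([m n]);     % counts objects: two tensors in the vector
nElem      = numel2(m);        % counts elements inside the tensor
nElemAlt   = prod(size(m));    % equivalent element count
```

Generic code that calls numel(x) to size a loop over elements should therefore use numel2 or prod(size(x)) when x may be a MappedTensor.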

If you need to handle many such tensors, it may be a good idea to compress them with pack(m) while you are not using them. This can be done for instance right after loading content. Decompression is performed transparently while you access the tensor array. Think about re-compressing afterwards to save disk/memory. Compression is usually extremely efficient on data with low randomness.

An efficient processing pipeline could be:

  • load tensors
  • compress them with pack
  • do whatever you need (extraction is performed automatically)
  • recompress as soon as possible with pack
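The pipeline above could look like the sketch below. A tensor created from a constant array stands in for loaded data, and pack is assumed to act in place on the handle object, as the pack(m) syntax in this document suggests.

```matlab
% Stand-in for "load tensors": low-randomness data compresses well.
m = MappedTensor(zeros(200, 200, 50));

pack(m);                        % compress the mapped file while it is idle

slice = m(:, :, 1);             % access triggers transparent decompression
total = sum(slice(:));          % work on the extracted (ordinary) array

pack(m);                        % recompress as soon as possible
```

Frequent pack calls on a tensor that is accessed in a tight loop would repeat the compression cost each time, so compressing once per idle period is probably the better fit.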

A list of available methods is shown below.

| Method | Operator | Description |
| --- | --- | --- |
| abs | | Absolute value. (unary op) |
| acos | | Inverse cosine, result in radians. (unary op) |
| acosh | | Inverse hyperbolic cosine. (unary op) |
| addlistener | | Add listener for event. |
| all | | True if all elements of a tensor are nonzero. (unary op) |
| and | & | Logical AND. (binary op) |
| any | | True if any element of a tensor is a nonzero number or is logical 1 (true). (unary op) |
| arrayfun | | Apply a function on the entire array, in slices. |
| arrayfun2 | | Apply a function on two similar arrays, in slices. |
| asin | | Inverse sine, result in radians. (unary op) |
| asinh | | Inverse hyperbolic sine. (unary op) |
| atan | | Inverse tangent, result in radians. (unary op) |
| atanh | | Inverse hyperbolic tangent. (unary op) |
| cast | | Cast a variable to a different data type or class. |
| ceil | | Round towards plus infinity. (unary op) |
| char | | Convert tensor representation to character array (string). |
| conj | | Complex conjugate. (unary op) |
| copyobj | | Make deep copy of array. |
| cos | | Cosine of argument in radians. (unary op) |
| cosh | | Hyperbolic cosine. (unary op) |
| ctranspose | ' | Complex conjugate transpose. |
| cumprod | | Cumulative product of elements. (unary op) |
| cumsum | | Cumulative sum of elements. (unary op) |
| del2 | | Discrete Laplacian. (unary op) |
| delete | | Delete the file, if a temporary file was created for this variable. |
| disp | | Display array (long). |
| display | | Display array (short). |
| double | | Convert tensor representation to double precision (float64). |
| end | | Last index in an indexing expression. |
| eq | == | Equal. (binary op) |
| exp | | Exponential. (unary op) |
| fileparts | | Return the files associated with the data. |
| find | | Find indices of nonzero elements. (unary op) |
| findobj | | Find objects matching specified conditions. |
| findprop | | Find property of MATLAB handle object. |
| floor | | Round towards minus infinity. (unary op) |
| fread | | Read binary data from file. |
| fwrite | | Write binary data to file. |
| ge | >= | Greater than or equal. (binary op) |
| get | | Get MATLAB object properties. |
| getdisp | | Specialized MATLAB object property display. |
| gt | > | Greater than. (binary op) |
| imag | | Complex imaginary part. (unary op) |
| int16 | | Convert tensor representation to signed 16-bit integer. |
| int32 | | Convert tensor representation to signed 32-bit integer. |
| int64 | | Convert tensor representation to signed 64-bit integer. |
| int8 | | Convert tensor representation to signed 8-bit integer. |
| ipermute | | Inverse permute array dimensions. |
| ischar | | True for character array (string). |
| isempty | | True for empty array. |
| isequal | | True if arrays are numerically equal. (binary op) |
| isfinite | | True for finite elements. (unary op) |
| isfloat | | True for floating point arrays, both single and double. |
| isinf | | True for infinite elements. (unary op) |
| isinteger | | True for arrays of integer data type. |
| islogical | | True for logical array. |
| ismatrix | | True if array is a matrix (not a scalar). |
| isnan | | True for Not-a-Number. (unary op) |
| isnumeric | | True for numeric arrays. |
| isreal | | True for real array. |
| isscalar | | True if array is a scalar. |
| isvalid | | Test handle validity. |
| ldivide | .\ | Left array divide. (binary op) |
| le | <= | Less than or equal. (binary op) |
| length | | Length of vector. |
| load | | Lazy loading from data files. |
| loadobj | | Load filter for objects. |
| log | | Natural logarithm. (unary op) |
| log10 | | Common (base 10) logarithm. (unary op) |
| logical | | Convert tensor representation to logical (true/false). |
| lt | < | Less than. (binary op) |
| max | | Largest component. |
| mean | | Average or mean value. (unary op) |
| median | | Median value. (unary op) |
| min | | Smallest component. |
| minus | - | Minus. (binary op) |
| mldivide | \ | Backslash or left matrix divide. (binary op) |
| mpower | ^ | Matrix power. (binary op) |
| mrdivide | / | Slash or right matrix divide. (binary op) |
| mtimes | * | Matrix multiply. (binary op) |
| ndims | | Number of dimensions. |
| ne | ~= | Not equal. (binary op) |
| nonzeros | | Nonzero matrix elements. (unary op) |
| norm | | Matrix or tensor norm. (unary op) |
| not | ~ | Logical NOT. (unary op) |
| notify | | Notify listeners of event. |
| numel | | Number of objects in a vector. Use prod(size(M)) or numel2 for number of elements in an object. |
| numel2 | | Number of elements in an array, same as prod(size(M)). |
| or | \| | Logical OR. (binary op) |
| pack | | Compress mapped data files. |
| permute | | Permute array dimensions. |
| plot | | Plot an array. |
| plus | + | Plus. (binary op) |
| power | .^ | Array power. (binary op) |
| prod | | Product of elements. (unary op) |
| rdivide | ./ | Right array divide. (binary op) |
| real | | Real part. (unary op) |
| reducevolume | | Reduce an array size. |
| reshape | | Reshape array. |
| round | | Round towards nearest integer. (unary op) |
| runtest | | Run a set of tests on object methods. |
| saveobj | | Save filter for objects. |
| set | | Set MATLAB object property values. |
| setdisp | | Specialized MATLAB object property display. |
| sign | | Signum function. (unary op) |
| sin | | Sine of argument in radians. (unary op) |
| single | | Convert tensor representation to single precision (float32). |
| sinh | | Hyperbolic sine. (unary op) |
| size | | Get original tensor size, and extend dimensions if necessary. |
| sqrt | | Square root. (unary op) |
| subsasgn | | Subscripted assignment. |
| subsref | | Subscripted reference. |
| sum | | Sum of elements. |
| tan | | Tangent of argument in radians. (unary op) |
| tanh | | Hyperbolic tangent. (unary op) |
| times | .* | Array multiply. (binary op) |
| transpose | .' | Transpose. |
| uint16 | | Convert tensor representation to unsigned 16-bit integer. |
| uint32 | | Convert tensor representation to unsigned 32-bit integer. |
| uint64 | | Convert tensor representation to unsigned 64-bit integer. |
| uint8 | | Convert tensor representation to unsigned 8-bit integer. |
| uminus | - | Unary minus. (unary op) |
| unpack | | Decompress mapped data files. |
| uplus | + | Unary plus (copyobj). |
| var | | Variance. (unary op) |
| version | | Return class version. |
| xor | | Logical EXCLUSIVE OR. (binary op) |

Examples

   % To create a mapped file for a given input array:
   % A temporary file is created to hold the data.
   m = MappedTensor(rand(100,100,100));

   % To reuse a previously existing mapped file:
   m = MappedTensor('records.dat', [100 100 100], ...
         'format','double', 'writable', true);
   m(:) = rand(100, 100, 100);  % assign new data
   m(1:2:end) = 0;

Publications

This work was published in Frontiers in Neuroinformatics: DR Muir and BM Kampa. 2015. FocusStack and StimServer: A new open source MATLAB toolchain for visual stimulation and analysis of two-photon calcium neuronal imaging data, Frontiers in Neuroinformatics 8 85. DOI: 10.3389/fninf.2014.00085. Please cite our publication in lieu of thanks, if you use this code.

This version of the code has been heavily revamped by emmanuel.farhi@synchrotron-soleil.fr. Please cite the following publication:

  • E. Farhi et al., J. Neut. Res., 17 (2013) 5. DOI: 10.3233/JNR-130001