Data handling and record arrays #1053

alecandido · 2024-12-01T17:49:19Z

Most of the routines (especially the sweeper-based ones) handle acquired data by collecting it in arrays with structured data types, which are then dumped to disk.

This is often critical in two respects:

1. data dumps are not always just raw data: they often contain the result of some mild post-processing, or are partially supplemented with some parameter values
2. there is quite some overhead connected to the management of record arrays, especially their creation

While point 1 is also relevant, it can be explored in a separate issue, since it is less technical and more related to the structure of the individual protocols.

Instead, I suspect that point 2 is also related to poor usage of the NumPy API for record array creation (which is fully wrapped by the np.rec.array constructor).
In particular:

- when multiple component arrays have to be packed together into a single array, the components may be constructed first and then stacked with np.rec.fromarrays (which can also be invoked through the common np.rec.array interface); this may alleviate or eliminate the need for something like Data.register_qubit
- when loading data from disk, np.load is used, which needs to be wrapped in something like AbstractData.load_data; since the expected data type is always known by the routine (or by the related data structure), it should be sufficient to pass it as the dtype= argument to np.rec.fromfile (also accessible through the np.rec.array interface)
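
As a rough illustration of the two points above, here is a minimal sketch. The `SweepType` dtype, its field names, and the file name are hypothetical and not taken from Qibocal. It also assumes the dump is raw binary written with `tofile`: `np.rec.fromfile` does not parse the `.npy` header, so files written with `np.save` would still need `np.load`.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical structured dtype for a frequency sweep (illustrative field names).
SweepType = np.dtype(
    [("freq", np.float64), ("signal", np.float64), ("phase", np.float64)]
)

# Build the components first, as plain arrays.
freq = np.linspace(4.0e9, 5.0e9, 5)
signal = np.cos(freq / 1e9)
phase = np.sin(freq / 1e9)

# Pack them into a single record array in one call.
records = np.rec.fromarrays([freq, signal, phase], dtype=SweepType)
# The generic constructor dispatches a list of arrays to fromarrays.
same = np.rec.array([freq, signal, phase], dtype=SweepType)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sweep.bin"
    # Raw binary dump: no header, so the dtype must be known when reading.
    records.tofile(path)
    with open(path, "rb") as fd:
        # The dtype is supplied by the caller; the shape is inferred from the file size.
        loaded = np.rec.fromfile(fd, dtype=SweepType)
```

Fields of `loaded` are then available both as `loaded["freq"]` and as `loaded.freq`.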

In general, we may reduce the custom handling of data in Qibocal by replacing it with more idiomatic usage of the NumPy API. This could lead to a more vectorized treatment of data (fewer Python for loops) and, consequently, to less nesting, both of functions (such as the Data/AbstractData methods) and of blocks (the Python for loops just mentioned).
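
To make the "fewer Python for loops" point concrete, here is a small sketch comparing an element-by-element fill with a vectorized construction. The `ScanType` dtype and the `fake_signal` function are hypothetical stand-ins for a two-parameter sweep and its acquired results, not Qibocal code.

```python
import numpy as np

# Hypothetical dtype for a two-parameter sweep (illustrative field names).
ScanType = np.dtype(
    [("freq", np.float64), ("amp", np.float64), ("signal", np.float64)]
)

freqs = np.linspace(4.0e9, 5.0e9, 4)
amps = np.linspace(0.1, 0.5, 3)


def fake_signal(freq, amp):
    """Stand-in for acquired results; works on scalars and arrays alike."""
    return amp * np.cos(freq / 1e9)


# Loop-based: allocate the structured array, then fill it one record at a time.
looped = np.empty((len(freqs), len(amps)), dtype=ScanType)
for i, f in enumerate(freqs):
    for j, a in enumerate(amps):
        looped[i, j] = (f, a, fake_signal(f, a))

# Vectorized: build each component on the full grid, then pack them once.
f_grid, a_grid = np.meshgrid(freqs, amps, indexing="ij")
vectorized = np.rec.fromarrays(
    [f_grid, a_grid, fake_signal(f_grid, a_grid)], dtype=ScanType
)
```

Both arrays have shape `(4, 3)` and hold the same records; the second version has no explicit loops and no intermediate registration step.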

alecandido · 2024-12-01T17:51:50Z

@ElStabilini since you already hit the problem yourself, you may consider this as a technical contribution (only after your current commitments).
It is not physics-related, but it may help you become more familiar with the library (and with the NumPy API itself), while helping to simplify Qibocal, which would be invaluable (assuming it's possible...).

ElStabilini · 2024-12-03T06:04:07Z

I'll definitely have a look as soon as I can!

alecandido added the enhancement New feature or request label Dec 1, 2024

ElStabilini self-assigned this Dec 3, 2024
