Data handling and record arrays #1053

Open
alecandido opened this issue Dec 1, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@alecandido
Member

Most routines (especially sweeper-based ones) handle acquired data by collecting it in arrays with structured data types, which are then dumped to disk.

This approach is often problematic in two respects:

  1. data dumps are not always just raw data: they often contain the result of some mild post-processing, or are partially supplemented with some parameter values
  2. there is considerable overhead in managing record arrays, especially in creating them
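
To make point 2 concrete, here is a minimal sketch contrasting a structured array grown one record at a time with one allocated once and filled field by field. This is not Qibocal code: the dtype and both functions are hypothetical. Each `np.append` call allocates a new array and copies all previous records, so the incremental version does quadratic work in the sweep length.

```python
import numpy as np

# Hypothetical dtype for a 1D amplitude sweep (not taken from Qibocal).
PointType = np.dtype([("amp", np.float64), ("signal", np.float64)])


def build_incrementally(amps, signals):
    """Anti-pattern: grow the structured array one record at a time.

    Every np.append allocates a new array and copies all the previous
    records, so the total cost grows quadratically with the sweep size.
    """
    data = np.array([], dtype=PointType)
    for amp, sig in zip(amps, signals):
        record = np.array([(amp, sig)], dtype=PointType)
        data = np.append(data, record)
    return data


def build_at_once(amps, signals):
    """Allocate once, then fill each field with a single vectorized copy."""
    data = np.empty(len(amps), dtype=PointType)
    data["amp"] = amps
    data["signal"] = signals
    return data


amps = np.linspace(0.0, 1.0, 1000)
signals = np.cos(amps)
slow = build_incrementally(amps, signals)
fast = build_at_once(amps, signals)
# Both strategies produce the same records; only the cost differs.
assert np.array_equal(slow, fast)
```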

While point 1 is also relevant, it may be explored in a separate issue, since it is less technical and more related to the structure of each individual protocol.

Instead, I suspect that the second point is also related to poor usage of the NumPy API for record array creation (which is fully wrapped by the np.rec.array constructor).
In particular:

  • when multiple component arrays have to be packed together into a single array, the components may be constructed first and then stacked with a single np.rec.fromarrays call (which can also be invoked through the common np.rec.array interface); this may alleviate or eliminate the need for something like Data.register_qubit
  • when loading data from disk, np.load is used, which has to be wrapped in something like AbstractData.load_data; however, since the expected data type is always known by the routine (or at least by the related data structure), it should be sufficient to pass it as the dtype= argument of np.rec.fromfile (also accessible through the np.rec.array interface)
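
The two bullets above can be sketched as a round trip: pack component arrays with one np.rec.fromarrays call, dump them, and read them back with np.rec.fromfile using the known dtype. This is a minimal sketch under assumptions: the dtype, its field names and the pack/dump/load helpers are hypothetical, not Qibocal's API. Note that np.rec.fromfile reads raw bytes with no header, so it pairs with ndarray.tofile rather than with np.save, whose .npy files begin with a header describing the dtype.

```python
import os
import tempfile

import numpy as np

# Hypothetical dtype for a spectroscopy-like sweep; field names are illustrative.
SpectroscopyType = np.dtype(
    [("freq", np.float64), ("signal", np.float64), ("phase", np.float64)]
)


def pack(freq, signal, phase):
    """Stack already-built component arrays into one record array at once."""
    return np.rec.fromarrays([freq, signal, phase], dtype=SpectroscopyType)


def dump(records, path):
    """Write the raw bytes of the array (no .npy header)."""
    records.tofile(path)


def load(path):
    """Reinterpret the raw bytes using the dtype the routine already knows."""
    return np.rec.fromfile(path, dtype=SpectroscopyType)


freq = np.linspace(7.0e9, 7.1e9, 5)
signal = np.random.default_rng(0).random(5)
phase = np.zeros(5)
records = pack(freq, signal, phase)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    dump(records, path)
    loaded = load(path)  # the returned recarray owns its data

# Fields are reachable both as loaded["freq"] and as loaded.freq.
```

Since the dtype lives in the code rather than in the file, a mismatch between the two would silently misinterpret the bytes; that is the trade-off of dropping the .npy header.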

In general, we may reduce Qibocal's custom data handling, replacing it with more idiomatic usage of the NumPy API. This could lead to a more vectorized treatment of data (fewer Python for loops) and consequently to less nesting, both of functions (like the Data/AbstractData methods) and of blocks (i.e. the Python for loops mentioned above).
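
As a rough illustration of the vectorized style suggested above, here is a minimal sketch assuming a hypothetical flat layout with one record per (qubit, frequency) point. Post-processing runs on whole fields at once, per-qubit selection uses boolean masks, and a per-qubit reduction uses reshape plus argmin instead of nested loops. The field names and the layout are assumptions, not Qibocal's actual data types.

```python
import numpy as np

# Hypothetical flat layout: one record per (qubit, frequency) point.
RawType = np.dtype(
    [("qubit", np.int64), ("freq", np.float64), ("i", np.float64), ("q", np.float64)]
)

rng = np.random.default_rng(1)
n_qubits, n_points = 3, 4
raw = np.empty(n_qubits * n_points, dtype=RawType)
raw["qubit"] = np.repeat(np.arange(n_qubits), n_points)  # qubit-major order
raw["freq"] = np.tile(np.linspace(5.0e9, 5.1e9, n_points), n_qubits)
raw["i"] = rng.normal(size=raw.size)
raw["q"] = rng.normal(size=raw.size)

# Post-processing on whole fields at once: no Python loop per point.
magnitude = np.hypot(raw["i"], raw["q"])
phase = np.arctan2(raw["q"], raw["i"])

# Per-qubit selection with a boolean mask instead of nested containers.
qubit_1 = raw[raw["qubit"] == 1]

# Per-qubit reduction: frequency of minimum magnitude for each qubit.
min_idx = magnitude.reshape(n_qubits, n_points).argmin(axis=1)
min_freq = raw["freq"].reshape(n_qubits, n_points)[np.arange(n_qubits), min_idx]
```

The reshape step relies on the records being stored qubit-major, as they are here by construction; with an arbitrary ordering, the array would first need to be sorted by the qubit field.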

@alecandido
Member Author

@ElStabilini since you have already hit the problem yourself, you may consider this as a technical contribution (only after your current commitments).
It is not physics-related, but it may help you become more familiar with the library (and with the NumPy API itself), while helping to simplify Qibocal, which would be invaluable (assuming it's possible...).

@alecandido added the enhancement (New feature or request) label on Dec 1, 2024
@ElStabilini self-assigned this on Dec 3, 2024
@ElStabilini
Contributor

I'll for sure have a look as soon as I can!
