Consider revamping the structure of the HDF5 files #78

@lucabaldini

Description

I'd like to revisit the decision to use PyTables vs. h5py, and I think we got a few things wrong in the initial layout that I would like to think through properly.

  • we should probably store all the metadata as one or more tables, rather than fiddling with attributes (I think the attribute-heavy approach comes from our experience with FITS headers, but it is not necessarily what you want in other formats)
  • while looking around, I came across the idea of storing variable-sized arrays as a single long array holding all the values, plus a per-event scalar array of offsets into that big array, one per event.
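On the first point, a minimal sketch of metadata as a table rather than attributes: a NumPy structured array, which both h5py and PyTables can write directly as a table-like dataset. The field names and types here are hypothetical, just to illustrate the idea; the actual columns would come from the detector/DAQ configuration.

```python
import numpy as np

# Hypothetical run-info columns; the real schema would mirror the DAQ config.
run_info_dtype = np.dtype([
    ("run_id", np.int64),
    ("start_time", np.float64),      # e.g., POSIX timestamp
    ("software_version", "S16"),     # fixed-width byte string
])

# One row per run (or per configuration snapshot).
run_info = np.array([(42, 1.7e9, b"1.4.0")], dtype=run_info_dtype)

# Writing it is then one call in either library (not executed here):
#   h5py:     f.create_dataset("meta/run_info", data=run_info)
#   PyTables: h5file.create_table("/meta", "run_info", obj=run_info)
```

The nice property is that the same structured dtype serves as the schema on both the write and the read side, instead of a loose bag of attributes.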

Here is a suggested layout from ChatGPT:

/meta/
    detector_config      (dataset or group of datasets)
    daq_settings         (dataset or group of datasets)
    run_info             (attrs: run_id, start_time, software_version, ...)

/events/
    trigger_id           (N,) int64
    timestamp            (N,) int64 or float64
    roi                  (N,4) int32  # e.g. [x0, y0, w, h] or [x_min, x_max, y_min, y_max]
    pha_offsets          (N+1,) int64 # prefix-sum offsets into flat pha_values
    pha_values           (M,) uint16/uint32  # concatenation of all per-event pixel PHAs
    # optional:
    pha_shape            (N,2) int16  # if ROI implies pixel count, may be redundant
    event_flags          (N,) uint32

/sim/
    truth/
        trigger_id       (N_sim,) int64   # or event_index
        ... scalar truth columns ...
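The `pha_offsets`/`pha_values` pair above is the prefix-sum trick for ragged per-event data: event `i`'s pixel values live at `pha_values[pha_offsets[i]:pha_offsets[i+1]]`. A small sketch of building and reading back that layout (pure NumPy; the arrays would be written as ordinary fixed-type HDF5 datasets):

```python
import numpy as np

# Variable-length per-event PHA arrays (ragged data).
event_phas = [
    np.array([12, 7, 45], dtype=np.uint16),
    np.array([3, 99], dtype=np.uint16),
    np.array([8], dtype=np.uint16),
]

# Flatten into one long (M,) values array plus (N+1,) prefix-sum offsets.
pha_values = np.concatenate(event_phas)
lengths = np.array([len(a) for a in event_phas], dtype=np.int64)
pha_offsets = np.concatenate(([0], np.cumsum(lengths)))

def event_pha(i):
    """Recover the PHA array for event i from the flat layout."""
    return pha_values[pha_offsets[i]:pha_offsets[i + 1]]
```

Both arrays are contiguous and chunk/compress well, which is the main advantage over HDF5 variable-length types; the invariant to maintain is `pha_offsets[-1] == len(pha_values)`.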
