
Implementation in HDF5

A. Stoewer edited this page Jun 17, 2014 · 17 revisions

In this section we describe how the previously described model for data and metadata can be stored in an HDF5 file. HDF5 provides three elements that define the structure of the format: groups, datasets, and attributes. A group can contain zero or more groups, datasets, or attributes. An attribute in HDF5 is represented by a name and one or more values of the same data type. Probably the most important structuring element in HDF5 is the dataset: it can contain arbitrary data in a multidimensional array and stores its name, dimensionality, and data type. A dataset may further contain zero or more attributes. Since all elements defined in HDF5 have a name, they can be referenced by a path.

Notation

To visualize the hierarchical structure of the HDF5 tree, the HDF5 elements are shown as a nested list with multiple indentation levels. Each element is represented by its element type followed by ':' and the element name. Attributes can have a value assigned. Literals enclosed in '<>' are placeholders for the actual values:

  <type>:<name> = [<type>|<value>]

A group named 'foo' with an attribute named 'greeting' and the value 'hello world' is shown like this:

  group:foo
    attribute:greeting = 'hello world'

For datasets the following annotations are used:

  dataset:<name> = <type>[]            // dataset with one dimension
  dataset:<name> = <type>[][]          // dataset with two dimensions
  dataset:<name> = <type>[] ... n      // dataset with n dimensions
  
  dataset:foo = double[1.0, 2.0]       // dataset named 'foo' with one dimension of double
                                       // values and the content 1.0 and 2.0 
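To make the notation concrete, the following Python sketch renders a small tree in it. The tree is modeled as plain `(type, name, value, children)` tuples; this model and the name `render` are hypothetical and exist only to illustrate the notation, they are not part of the format.

```python
# Hypothetical in-memory model, used only to illustrate the notation:
# every element is a (type, name, value, children) tuple.
def render(element, indent=0):
    """Return the lines of `element` and its children in the listing notation."""
    kind, name, value, children = element
    line = " " * indent + f"{kind}:{name}"
    if value is not None:
        line += f" = {value}"  # only elements with an assigned value get '= ...'
    lines = [line]
    for child in children:
        lines.extend(render(child, indent + 2))  # each level indents by two spaces
    return lines


foo = ("group", "foo", None, [
    ("attribute", "greeting", "'hello world'", []),
])
print("\n".join(render(foo)))
```

Rendering `foo` reproduces the two-line 'foo' example, and a leaf such as `("dataset", "foo", "double[1.0, 2.0]", [])` yields a single `dataset:foo = double[1.0, 2.0]` line.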

General rules

  • All HDF5 elements representing an entity of the data or metadata model are named with the entity's id.
  • All optional attributes that are empty or set to NULL must be omitted.
  • One-to-many (1:n) relationships in the data model are represented as a nested, tree-like structure in HDF5.
  • Many-to-many (n:m) relationships are represented in a flat way.
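The rule about empty optional attributes can be sketched as a small filter applied before writing. This page does not enumerate which attributes are optional, so the hypothetical helper `prune_attributes` takes that set as a parameter; treating `None` as NULL and zero-length strings or lists as empty is an assumption.

```python
# Hypothetical helper: which attributes count as optional is not fixed here,
# so the caller passes that set explicitly.
def prune_attributes(attrs, optional):
    """Return a copy of `attrs` without optional attributes that are NULL or empty."""
    pruned = {}
    for key, value in attrs.items():
        empty = value is None or (hasattr(value, "__len__") and len(value) == 0)
        if key in optional and empty:
            continue  # rule: empty or NULL optional attributes must be omitted
        pruned[key] = value
    return pruned


attrs = {"type": "analog_signal", "name": "My first recording.",
         "definition": None, "unit": ""}
print(prune_attributes(attrs, optional={"definition", "unit"}))
```

Numeric values such as `0.0` have no length and are therefore kept, which matters for attributes like a dimension's offset.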

File Structure

The root of an HDF5 file implementing this standard contains a set of attributes describing the name and version of the schema, as well as time stamps with the creation date and the date of the last update. In addition, the root contains two groups named 'metadata' and 'data'.

  root:
    attribute:format = string
    attribute:version = string
    attribute:created_at = date
    attribute:updated_at = date
    group:metadata
      ...
    group:data
      ...

A third group named 'terminology' is reserved for future use.
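A minimal sketch of the root layout for a new file, using a nested dict in place of a real HDF5 file. The function name `new_root`, the dict layout, and the encoding of dates as ISO 8601 strings are assumptions; the values 'pandora' and '1.0' are taken from the example listing at the end of this page.

```python
from datetime import date


def new_root(format_name, version, today=None):
    """Root attributes plus the two mandatory top-level groups of a new file."""
    # Assumption: dates are stored as ISO 8601 strings (YYYY-MM-DD).
    stamp = (today or date.today()).isoformat()
    return {
        "attributes": {
            "format": format_name,
            "version": version,
            "created_at": stamp,
            "updated_at": stamp,  # a new file has not been updated yet
        },
        # 'terminology' is reserved for future use and therefore not created.
        "groups": {"metadata": {}, "data": {}},
    }


root = new_root("pandora", "1.0", today=date(2013, 2, 1))
```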

Data Model

All entities of the data model are stored under the path '/data' in the HDF5 file.

Block

Entities of the type Block can only be stored at the first hierarchy level below the '/data' path. Blocks always have the following structure:

  group:<block_id>
    attribute:type = string
    attribute:name = string
    attribute:definition = string
    attribute:date = date
    attribute:metadata = string
    group:sources
      // all sources in this block
    group:data_arrays
      // all data arrays in this block
    group:data_tags
      // all data tags in this block
    group:simple_tags
      // all simple tags in this block

DataArray

Entities of the type DataArray can only be defined in the group called 'data_arrays' of their respective Block. The path to a DataArray therefore always has the following form:

  /data/<block_id>/data_arrays/<data_array_id>
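Because the location is fixed, a DataArray can be found from just its Block id and its own id. The sketch below builds that path and follows it through a nested-dict stand-in for the file; both helper names and the dict model are hypothetical, and the ids come from the example listing at the end of this page.

```python
def data_array_path(block_id, data_array_id):
    """Absolute path of a DataArray inside the file, per the fixed layout."""
    return f"/data/{block_id}/data_arrays/{data_array_id}"


def resolve(tree, path):
    """Follow an absolute path through a nested-dict model of the file."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]  # raises KeyError if an element on the path is missing
    return node


tree = {"data": {"block_id01": {"data_arrays": {
    "data_array_id01": {"name": "My first recording."}}}}}
path = data_array_path("block_id01", "data_array_id01")
print(path, resolve(tree, path)["name"])
```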

In HDF5, DataArray objects always have the following structure:

  group:<data_array_id>
    attribute:type = string
    attribute:name = string
    attribute:definition = string
    attribute:label = string
    attribute:unit = string
    attribute:metadata = string
    dataset:data = <data_type>[]...n
    dataset:sources = string[]
    group:dimensions
      // definition of all dimensions of the data array

The field 'data_type' defined in the model becomes part of the datatype definition of the HDF5 'data' dataset and is therefore not represented as a separate field. Native data type names are mapped to HDF5 data types as defined in the following table:

  Type name        HDF5 data type
  byte             H5T_STD_I8LE
  uint16           H5T_STD_U16LE
  uint, uint32     H5T_STD_U32LE
  int16            H5T_STD_I16LE
  int, int32       H5T_STD_I32LE
  long, int64      H5T_STD_I64LE
  float            H5T_IEEE_F32LE
  double           H5T_IEEE_F64LE
  string           H5T_C_S1
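The type table translates directly into a lookup, with aliases such as 'int' and 'int32' mapping to the same HDF5 type. In this sketch the values are only HDF5 type names as strings; a real writer would turn them into the datatype objects of whatever HDF5 binding it uses. The names `NATIVE_TO_HDF5` and `hdf5_type` are hypothetical.

```python
# Mapping of native type names to HDF5 predefined type names, as in the table.
NATIVE_TO_HDF5 = {
    "byte": "H5T_STD_I8LE",
    "uint16": "H5T_STD_U16LE",
    "uint": "H5T_STD_U32LE",
    "uint32": "H5T_STD_U32LE",
    "int16": "H5T_STD_I16LE",
    "int": "H5T_STD_I32LE",
    "int32": "H5T_STD_I32LE",
    "long": "H5T_STD_I64LE",
    "int64": "H5T_STD_I64LE",
    "float": "H5T_IEEE_F32LE",
    "double": "H5T_IEEE_F64LE",
    "string": "H5T_C_S1",
}


def hdf5_type(name):
    """Return the HDF5 type name for a native type name, or raise ValueError."""
    try:
        return NATIVE_TO_HDF5[name]
    except KeyError:
        raise ValueError(f"unsupported data type: {name!r}") from None
```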

Dimensions

The dimension types Set, Range, and Sample can only be defined inside the group called 'dimensions' of their parent DataArray. The 'order' attribute of each dimension defines its position in the sequence of dimensions: the dimension with the lowest order value describes the first dimension of the data, the next one the second, and so on. The structure of all dimension entities is defined below:

  // Set
  group:<dimension_id>
    attribute:order = int
    attribute:dimension_type = 'set'
    dataset:labels = string[]

  // Range
  group:<dimension_id>
    attribute:order = int
    attribute:dimension_type = 'range'
    attribute:label = string
    attribute:unit = string
    dataset:tics = double[]

  // Sample
  group:<dimension_id>
    attribute:order = int
    attribute:dimension_type = 'sample'
    attribute:label = string
    attribute:unit = string
    attribute:sampling_interval = double
    attribute:offset = double
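The sketch below sorts dimension groups by their 'order' attribute and computes the positions of a Sample dimension. It assumes that sample i lies at `offset + i * sampling_interval`; that formula, the dict model, and the helper names are assumptions, not statements from the schema. The example dimension is `dimension_id02` from the example listing at the end of this page.

```python
def ordered(dimensions):
    """Sort dimension groups so that the lowest 'order' comes first."""
    return sorted(dimensions, key=lambda dim: dim["order"])


def sample_positions(dim, count):
    """Positions of the first `count` samples along a Sample dimension.

    Assumption: sample i lies at offset + i * sampling_interval.
    """
    if dim["dimension_type"] != "sample":
        raise ValueError("not a sample dimension")
    return [dim["offset"] + i * dim["sampling_interval"] for i in range(count)]


# The Sample dimension of the spike waveform in the example listing.
time = {"order": 0, "dimension_type": "sample", "label": "time", "unit": "s",
        "sampling_interval": 0.001, "offset": -0.002}
print(sample_positions(time, 3))
```

Under this assumption the negative offset places the first samples of the waveform 2 ms and 1 ms before the spike, matching that DataArray's definition.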

Tags

Two types of tags are defined, which tag data in two different forms: Simple Tag and Data Tag.

Simple Tag

Entities of the type Simple Tag can only be defined inside the first hierarchy level of the group 'simple_tags' of the parent Block.
TODO: describe the purpose of this entity.
The structure of a Simple Tag in an HDF5 file is defined below:

  group:<simple_tag_id>
    attribute:type = string
    attribute:name = string
    attribute:definition = string
    attribute:metadata = string
    dataset:sources = string[]
    dataset:units = string[]
    dataset:position = double[]
    dataset:extent = double[]
    dataset:references = string[]       
    group:features   

Data Tag

Entities of the type Data Tag can only be defined inside the first hierarchy level of the group 'data_tags' of the parent Block.
TODO: describe the purpose of this entity.
The structure of a Data Tag in an HDF5 file is defined below:

  group:<data_tag_id>
    attribute:type = string
    attribute:name = string
    attribute:definition = string
    attribute:metadata = string
    dataset:sources = string[]
    dataset:positions = string     // references a DataArray
    dataset:extents = string       // references a DataArray
    dataset:references = string[]  // references DataArrays
    group:features
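A Data Tag stores only the ids of the DataArrays it points to, so a reader has to look them up. This sketch assumes that the referenced DataArrays live in the same Block as the tag, which this page does not state explicitly. The dict model is simplified and all ids are hypothetical, since the example listing contains no Data Tag.

```python
def resolve_data_tag(block, tag_id):
    """Look up the DataArrays a Data Tag refers to by id.

    Assumption: the referenced DataArrays live in the same Block as the tag.
    """
    tag = block["data_tags"][tag_id]
    arrays = block["data_arrays"]
    return {
        "positions": arrays[tag["positions"]],
        "extents": arrays[tag["extents"]],
        "references": [arrays[ref] for ref in tag["references"]],
    }


# Hypothetical ids; the example listing at the end of this page has no Data Tag.
block = {
    "data_arrays": {
        "pos_id": {"name": "spike times"},
        "ext_id": {"name": "spike widths"},
        "signal_id": {"name": "recording"},
    },
    "data_tags": {
        "data_tag_id01": {"positions": "pos_id", "extents": "ext_id",
                          "references": ["signal_id"]},
    },
}
resolved = resolve_data_tag(block, "data_tag_id01")
```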

Feature

Feature entities can only be defined inside the group 'features' of their parent Tag entity. The schema below describes the structure:

  group:<feature_id>
    attribute:link_type = enum{tagged, untagged, indexed}
    attribute:data = string
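The enumeration of link types can be enforced when a Feature is written. In this sketch the link type is stored as its string name, as in the example listing, and 'data' holds a DataArray id, as `feature_id01` there does; the class and function names are hypothetical.

```python
from enum import Enum


class LinkType(Enum):
    """The three link types allowed for a Feature."""
    TAGGED = "tagged"
    UNTAGGED = "untagged"
    INDEXED = "indexed"


def feature_attributes(link_type, data):
    """Attributes of a Feature group; rejects unknown link types."""
    # LinkType(...) raises ValueError for anything outside the enum.
    return {"link_type": LinkType(link_type).value, "data": data}


# Mirrors feature_id01 from the example listing.
feature = feature_attributes("untagged", "data_array_id02")
```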

Source

Entities of the type Source can only be defined inside the group 'sources' of the parent Block entity.

  group:<source_id>
    attribute:type = string
    attribute:name = string
    attribute:definition = string
    attribute:metadata = string
    attribute:parent_source = string
    group:sources
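Sources nest through their own 'sources' group, and a nested Source names its parent in 'parent_source'. The sketch walks such a tree depth first and checks that both agree. Following the example listing, top-level Sources carry no 'parent_source' (the empty attribute is omitted); the dict model and the function name are hypothetical.

```python
def walk_sources(sources, parent=None):
    """Yield (source_id, parent_id) for each Source, depth first.

    Checks that a nested Source's 'parent_source' attribute names the Source
    it is nested in; top-level Sources carry no 'parent_source'.
    """
    for source_id, source in sources.items():
        declared = source.get("parent_source")
        if declared != parent:
            raise ValueError(
                f"{source_id}: parent_source is {declared!r}, expected {parent!r}")
        yield source_id, parent
        yield from walk_sources(source.get("sources", {}), source_id)


# The sources of block_id01 from the example listing, reduced to ids and names.
sources = {
    "source_id01": {"name": "Amplifier"},
    "source_id02": {"name": "Animal01", "sources": {
        "source_id03": {"name": "Cell01", "parent_source": "source_id02"},
    }},
}
print(list(walk_sources(sources)))
```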

Metadata model

This section describes how the metadata model (odML) is represented in HDF5. Metadata objects can only be located in the 'metadata' group, which is a direct child of the root node. Sections are stored in a flat way; there is no hierarchy among the sections. Properties and Values, on the other hand, are stored as children of their parent Sections and parent Properties, respectively.

Section

  group:<section_id>
    attribute:name = string
    attribute:type = string
    attribute:definition = string
    attribute:link = string
    attribute:include = string
    attribute:repository = string
    attribute:mapping = string
    attribute:parent_section = string
    group:properties
    group:sections

Property

  group:<property_id>
    attribute:name = string
    attribute:definition = string
    attribute:unit = string
    attribute:data_type = string
    attribute:mapping = string
    group:values

Value

The values dataset is a compound dataset that stores the actual values of a property.

  dataset:<value_id>
    member:value = property.data_type
    member:uncertainty = double
    member:filename = string
    member:checksum = string
    member:encoder = string
    member:reference = string
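One record of the compound values dataset can be modeled as a dataclass whose fields mirror the members above, with the type of 'value' checked against the parent Property's 'data_type'. The partial mapping from data_type names to Python types, and the names `Value` and `check_value`, are hypothetical; the example record is the 'Experimenter' property from the example listing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Value:
    """One record of the compound values dataset; fields mirror its members."""
    value: object  # actual type is given by the parent Property's data_type
    uncertainty: Optional[float] = None
    filename: Optional[str] = None
    checksum: Optional[str] = None
    encoder: Optional[str] = None
    reference: Optional[str] = None


# Hypothetical, partial mapping from data_type names to Python types.
PYTHON_TYPES = {"string": str, "double": float, "int": int}


def check_value(record, data_type):
    """Raise TypeError unless record.value matches the Property's data_type."""
    expected = PYTHON_TYPES[data_type]
    if not isinstance(record.value, expected):
        raise TypeError(f"expected {data_type}, got {type(record.value).__name__}")
    return record


# The 'Experimenter' property from the example listing.
experimenter = check_value(Value("John Doe"), "string")
```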

Schematic file structure overview

The following listing gives an example of how the file structure could look and indicates how one entity references another. Note that empty elements are omitted.

  root:
    attribute:format = 'pandora'
    attribute:version = '1.0'
    attribute:created_at = 2013-02-01
    attribute:updated_at = 2013-02-01

    group:metadata
      attribute:author = 'John Doe'
      attribute:date = '2013-02-01'
      attribute:version = '1.0'
      attribute:repository = 'http://www...'
      group:section_id01
        attribute:name = 'Session01'
        attribute:type = 'recording'
        group:properties
          group:property_id01
            attribute:name = 'Experimenter'
            attribute:data_type = 'string'
            dataset:values
              member:value = 'John Doe'
          group:property_id02
            ...
      group:section_id02
        attribute:name = 'Amplifier'
        attribute:type = 'hardware/amplifier'
        ...
      group:section_id03
        attribute:name = 'Animal01'
        attribute:type = 'subject'
        ...
        group:sections
          group:section_id04
            attribute:name = 'Cell01'
            attribute:type = 'cell'
            attribute:parent_section = 'section_id03'
            ...

    group:data
      group:block_id01

        group:data_arrays
          group:data_array_id01
            attribute:name = 'My first recording.'
            attribute:type = 'analog_signal'
            attribute:label = 'voltage'
            attribute:unit = 'mV'
            attribute:metadata = 'section_id01'
            dataset:data = <data_type>[]...n
            group:dimensions
              group:dimension_id01
                attribute:order = 0
                attribute:dimension_type = 'sample'
                attribute:label = 'time'
                attribute:unit = 's'
                attribute:sampling_interval = 0.001
                attribute:offset = 0.0
          group:data_array_id02
            attribute:name = '1st spike waveform'
            attribute:type = 'analog_signal'
            attribute:definition = 'Waveform of the spike, plus 2 ms before spike'
            attribute:label = 'voltage'
            attribute:unit = 'mV'
            dataset:data = <data_type>[]...n 
            group:dimensions
              group:dimension_id02
                attribute:order = 0
                attribute:dimension_type = 'sample'
                attribute:label = 'time'
                attribute:unit = 's'
                attribute:sampling_interval = 0.001
                attribute:offset = -0.002
        
        group:data_tags
           []
       
        group:simple_tags
          group:simple_tag_id01
            attribute:name = 'spike01'
            attribute:type = 'spiketime'
            dataset:position = [0.254]
            dataset:units = ['s']
            dataset:extent = [0.0]
            dataset:references = ['data_array_id01']       
            group:features
              group:feature_id01
                attribute:link_type = 'untagged'
                attribute:data = 'data_array_id02'

        group:sources
          group:source_id01
            attribute:type = 'hardware'
            attribute:name = 'Amplifier'
            attribute:metadata = 'section_id02'
          group:source_id02
            attribute:type = 'subject'
            attribute:name = 'Animal01'
            attribute:metadata = 'section_id03'
            group:sources
              group:source_id03
                attribute:type = 'neuron'
                attribute:name = 'Cell01'
                attribute:metadata = 'section_id04'
                attribute:parent_source = 'source_id02'

      group:block_id02
        ...

Example
