diff --git a/docs/specifications/hdf5-1.0.md b/docs/specifications/hdf5-1.0.md
index b8e261e..dc71201 100644
--- a/docs/specifications/hdf5-1.0.md
+++ b/docs/specifications/hdf5-1.0.md
@@ -96,16 +96,17 @@ A factor is represented as a HDF5 group (`**/`) with the following attributes:
 This should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
 
 The group should contain an 1-dimensional dataset at `**/data`, containing 0-based indices into the levels.
-This should be any HDF5 integer datatype that can be represented by a 32-bit signed integer.
+This should use a HDF5 integer datatype that can be represented by a 32-bit signed integer.
 (Admittedly, this should have been an unsigned integer, but we started with a signed integer and we'll just keep it so for back-compatibility.)
 Missing values are represented as described above for atomic vectors.
 
-The group should also contain `**/levels`, a 1-dimensional string dataset that contains the levels for the indices in `**/data`.
+The group should contain `**/levels`, a 1-dimensional string dataset that contains the levels for the indices in `**/data`.
+This should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
 Values in `**/levels` should be unique.
+
 Values in `**/data` should be non-negative (missing values excepted) and less than the length of `**/levels`.
-Note that the datatype constraints on `**/data` suggest that there should not be more than 2147483647 levels;
-beyond that count, the levels cannot be indexed by elements of `**/data`.
-`**/levels` should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
+Note that the datatype constraints on `**/data` suggest that there should not be more than 2147483647 levels,
+as beyond that, the levels cannot be indexed by elements of `**/data`.
 
 The group may also contain `**/names`, a 1-dimensional string dataset of length equal to `data`.
 This should use a HDF5 string datatype is compatible with the UTF-8 encoding.
diff --git a/docs/specifications/hdf5-1.1.md b/docs/specifications/hdf5-1.1.md
index 0d67156..ede11bb 100644
--- a/docs/specifications/hdf5-1.1.md
+++ b/docs/specifications/hdf5-1.1.md
@@ -97,16 +97,17 @@ A factor is represented as a HDF5 group (`**/`) with the following attributes:
 This should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
 
 The group should contain an 1-dimensional dataset at `**/data`, containing 0-based indices into the levels.
-This should be any HDF5 integer datatype that can be represented by a 32-bit signed integer.
+This should use a HDF5 integer datatype that can be represented by a 32-bit signed integer.
 (Admittedly, this should have been an unsigned integer, but we started with a signed integer and we'll just keep it so for back-compatibility.)
 Missing values are represented as described above for atomic vectors.
 
-The group should also contain `**/levels`, a 1-dimensional string dataset that contains the levels for the indices in `**/data`.
+The group should contain `**/levels`, a 1-dimensional string dataset that contains the levels for the indices in `**/data`.
+This should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
 Values in `**/levels` should be unique.
+
 Values in `**/data` should be non-negative (missing values excepted) and less than the length of `**/levels`.
-Note that the datatype constraints on `**/data` suggest that there should not be more than 2147483647 levels;
-beyond that count, the levels cannot be indexed by elements of `**/data`.
-`**/levels` should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
+Note that the datatype constraints on `**/data` suggest that there should not be more than 2147483647 levels,
+as beyond that, the levels cannot be indexed by elements of `**/data`.
 
 The group may also contain `**/names`, a 1-dimensional string dataset of length equal to `data`.
 This should use a HDF5 string datatype is compatible with the UTF-8 encoding.
diff --git a/docs/specifications/hdf5-1.2.md b/docs/specifications/hdf5-1.2.md
index d4fbbd0..1eeadd8 100644
--- a/docs/specifications/hdf5-1.2.md
+++ b/docs/specifications/hdf5-1.2.md
@@ -99,16 +99,17 @@ A factor is represented as a HDF5 group (`**/`) with the following attributes:
 This should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
 
 The group should contain an 1-dimensional dataset at `**/data`, containing 0-based indices into the levels.
-This should be any HDF5 integer datatype that can be represented by a 32-bit signed integer.
+This should use a HDF5 integer datatype that can be represented by a 32-bit signed integer.
 (Admittedly, this should have been an unsigned integer, but we started with a signed integer and we'll just keep it so for back-compatibility.)
 Missing values are represented as described above for atomic vectors.
 
-The group should also contain `**/levels`, a 1-dimensional string dataset that contains the levels for the indices in `**/data`.
+The group should contain `**/levels`, a 1-dimensional string dataset that contains the levels for the indices in `**/data`.
+This should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
 Values in `**/levels` should be unique.
+
 Values in `**/data` should be non-negative (missing values excepted) and less than the length of `**/levels`.
-Note that the datatype constraints on `**/data` suggest that there should not be more than 2147483647 levels;
-beyond that count, the levels cannot be indexed by elements of `**/data`.
-`**/levels` should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
+Note that the datatype constraints on `**/data` suggest that there should not be more than 2147483647 levels,
+as beyond that, the levels cannot be indexed by elements of `**/data`.
 
 The group may also contain `**/names`, a 1-dimensional string dataset of length equal to `data`.
 This should use a HDF5 string datatype is compatible with the UTF-8 encoding.
diff --git a/docs/specifications/hdf5-1.3.md b/docs/specifications/hdf5-1.3.md
index 2205e55..9fcb5fa 100644
--- a/docs/specifications/hdf5-1.3.md
+++ b/docs/specifications/hdf5-1.3.md
@@ -104,16 +104,17 @@ A factor is represented as a HDF5 group (`**/`) with the following attributes:
 This should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
 
 The group should contain an 1-dimensional dataset at `**/data`, containing 0-based indices into the levels.
-This should be any HDF5 integer datatype that can be represented by a 32-bit signed integer.
+This should use a HDF5 integer datatype that can be represented by a 32-bit signed integer.
 (Admittedly, this should have been an unsigned integer, but we started with a signed integer and we'll just keep it so for back-compatibility.)
 Missing values are represented as described above for atomic vectors.
 
-The group should also contain `**/levels`, a 1-dimensional string dataset that contains the levels for the indices in `**/data`.
+The group should contain `**/levels`, a 1-dimensional string dataset that contains the levels for the indices in `**/data`.
+This should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
 Values in `**/levels` should be unique.
+
 Values in `**/data` should be non-negative (missing values excepted) and less than the length of `**/levels`.
-Note that the datatype constraints on `**/data` suggest that there should not be more than 2147483647 levels;
-beyond that count, the levels cannot be indexed by elements of `**/data`.
-`**/levels` should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
+Note that the datatype constraints on `**/data` suggest that there should not be more than 2147483647 levels,
+as beyond that, the levels cannot be indexed by elements of `**/data`.
 
 The group may also contain `**/names`, a 1-dimensional string dataset of length equal to `data`.
 This should use a HDF5 string datatype is compatible with the UTF-8 encoding.
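
The constraints on `**/data` and `**/levels` restated in these hunks could be checked roughly as follows. This is a minimal sketch, not part of the change: it works on plain Python lists standing in for the dataset contents, and `check_factor` and `missing_placeholder` are hypothetical names introduced for illustration. The real mechanism for missing values is the one the specification describes for atomic vectors.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest 32-bit signed integer


def check_factor(codes, levels, missing_placeholder=None):
    """Return a list of constraint violations for a factor's contents.

    `codes` stands in for the values of `**/data`, `levels` for `**/levels`.
    `missing_placeholder` is a hypothetical stand-in for however missing
    values are marked in the file.
    """
    problems = []

    # Values in `**/levels` should be unique.
    if len(set(levels)) != len(levels):
        problems.append("levels are not unique")

    # More levels than this cannot be indexed by 32-bit signed codes.
    if len(levels) > INT32_MAX:
        problems.append("more levels than a 32-bit signed integer can index")

    for position, code in enumerate(codes):
        if missing_placeholder is not None and code == missing_placeholder:
            continue  # missing values are exempt from the range checks
        if code < 0:
            problems.append(f"negative code {code} at position {position}")
        elif code >= len(levels):
            problems.append(f"code {code} at position {position} is out of range")

    return problems
```

For example, `check_factor([0, 2, 1, 0], ["a", "b", "c"])` returns an empty list, while duplicated levels or a code equal to the number of levels each add one message to the result.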