Commit

Cleaned up the wording for HDF5 datatypes in the spec.
LTLA committed Jan 8, 2024
1 parent 3695bd4 commit dd2904f
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions docs/specifications/hdf5.Rmd
@@ -85,15 +85,15 @@ if (.version == package_version("1.0")) {
 The group should contain an 1-dimensional dataset at `**/data`.
 Vectors of length 1 may also be represented as a scalar dataset.
 (While R makes no distinction between scalars and length-1 vectors, this may be useful for other frameworks where this difference is relevant.)
-The allowed HDF5 datatype depends on `uzuki_type`:
+The allowed HDF5 datatype for `**/data` depends on `uzuki_type`:

-- `"integer"`, `"boolean"`: any type of `H5T_INTEGER` that can be represented by a 32-bit signed integer.
-Note that the converse is not required, i.e., the storage type does not need to be 32-bit if no such values are present in the dataset.
+- `"integer"`, `"boolean"`: a HDF5 integer datatype that can be exactly represented by a 32-bit signed integer.
+Note that the converse is not required, i.e., the datatype does not need to be 32-bit if no such values are present in the dataset.
 ```{r, echo=FALSE, results="asis"}
 if (.version == package_version("1.0")) {
-cat('- `"number"`: any type of `H5T_FLOAT` that can be represented by a double-precision float.')
+cat('- `"number"`: a HDF5 float datatype that can be exactly represented by a double-precision float.')
 } else {
-cat('- `"number"`: any type of `H5T_FLOAT` or `H5T_INTEGER` that can be represented exactly by a double-precision (64-bit) float.')
+cat('- `"number"`: a HDF5 integer or float datatype that can be represented exactly by a double-precision (64-bit) float.')
 }
 ```
 - `"string"`: a HDF5 string datatype that can be represented by a UTF-8 encoded string.
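
To illustrate the integer datatype constraints worded in this hunk, here is a minimal Python sketch. It does not call the HDF5 library: an integer datatype is modelled only by its bit width and signedness, and the function names are hypothetical. It also assumes that "can be represented by" means every value of the datatype fits in the target type, which is one reading of the spec text rather than a confirmed rule.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
# Every integer with magnitude <= 2**53 is exactly representable
# in an IEEE 754 double-precision (64-bit) float.
DOUBLE_EXACT_MAX = 2**53


def integer_range(bits, signed):
    """Return (min, max) for an integer type of the given width and signedness."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2**bits - 1


def fits_in_int32(bits, signed):
    """True if every value of the type fits in a 32-bit signed integer."""
    lo, hi = integer_range(bits, signed)
    return INT32_MIN <= lo and hi <= INT32_MAX


def fits_in_double(bits, signed):
    """True if every value of the type is exactly representable as a double."""
    lo, hi = integer_range(bits, signed)
    return -DOUBLE_EXACT_MAX <= lo and hi <= DOUBLE_EXACT_MAX
```

Under this reading, an unsigned 16-bit datatype is acceptable for `"integer"` while an unsigned 32-bit one is not, and both are acceptable as integer storage for `"number"` in versions after 1.0.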
@@ -181,14 +181,14 @@ if (.version == package_version("1.0")) {
 This should use a HDF5 string datatype that is compatible with the UTF-8 encoding.

 The group should contain an 1-dimensional dataset at `**/data`, containing 0-based indices into the levels.
-This should be type of `H5T_INTEGER` that can be represented by a 32-bit signed integer.
+This should be any HDF5 integer datatype that can be represented by a 32-bit signed integer.
 (Admittedly, this should have been an unsigned integer, but we started with a signed integer and we'll just keep it so for back-compatibility.)
 Missing values are represented as described above for atomic vectors.

 The group should also contain `**/levels`, a 1-dimensional string dataset that contains the levels for the indices in `**/data`.
 Values in `**/levels` should be unique.
 Values in `**/data` should be non-negative (missing values excepted) and less than the length of `**/levels`.
-Note that the type constraints on `**/data` suggest that there should not be more than 2147483647 levels;
+Note that the datatype constraints on `**/data` suggest that there should not be more than 2147483647 levels;
 beyond that count, the levels cannot be indexed by elements of `**/data`.
 `**/levels` should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
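
The factor rules in this hunk can be sketched as a small validation routine. This is an illustration in plain Python, not the reference implementation: `validate_factor` and its `missing_placeholder` parameter are hypothetical names, and the placeholder stands in for whatever mechanism the spec uses to mark missing values in atomic vectors, which this hunk does not show.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest 32-bit signed integer


def validate_factor(codes, levels, missing_placeholder=None):
    """Check 0-based factor codes against their levels.

    Raises ValueError on the first violated rule.
    """
    # Values in levels should be unique.
    if len(set(levels)) != len(levels):
        raise ValueError("levels must be unique")

    # Codes are stored as 32-bit signed integers, so levels beyond
    # INT32_MAX entries could never be indexed.
    if len(levels) > INT32_MAX:
        raise ValueError("too many levels to index with a 32-bit signed integer")

    for position, code in enumerate(codes):
        # Missing values are exempt from the range check (assumed encoding).
        if missing_placeholder is not None and code == missing_placeholder:
            continue
        if code < 0 or code >= len(levels):
            raise ValueError(f"code {code} at position {position} is out of range")
```

For example, `validate_factor([0, 1, 1], ["a", "b"])` returns without error, while `validate_factor([0, 2], ["a", "b"])` raises because index 2 does not refer to any level.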
