diff --git a/docs/specifications/hdf5.Rmd b/docs/specifications/hdf5.Rmd
index 7431388..7548fd3 100644
--- a/docs/specifications/hdf5.Rmd
+++ b/docs/specifications/hdf5.Rmd
@@ -85,15 +85,15 @@ if (.version == package_version("1.0")) {
 The group should contain an 1-dimensional dataset at `**/data`.
 Vectors of length 1 may also be represented as a scalar dataset.
 (While R makes no distinction between scalars and length-1 vectors, this may be useful for other frameworks where this difference is relevant.)
-The allowed HDF5 datatype depends on `uzuki_type`:
+The allowed HDF5 datatype for `**/data` depends on `uzuki_type`:
 
-- `"integer"`, `"boolean"`: any type of `H5T_INTEGER` that can be represented by a 32-bit signed integer.
-  Note that the converse is not required, i.e., the storage type does not need to be 32-bit if no such values are present in the dataset.
+- `"integer"`, `"boolean"`: any HDF5 integer datatype that can be exactly represented by a 32-bit signed integer.
+  Note that the converse is not required, i.e., the datatype does not need to be 32-bit if the dataset contains no values that require 32 bits.
 ```{r, echo=FALSE, results="asis"}
 if (.version == package_version("1.0")) {
-    cat('- `"number"`: any type of `H5T_FLOAT` that can be represented by a double-precision float.')
+    cat('- `"number"`: any HDF5 float datatype that can be exactly represented by a double-precision (64-bit) float.')
 } else {
-    cat('- `"number"`: any type of `H5T_FLOAT` or `H5T_INTEGER` that can be represented exactly by a double-precision (64-bit) float.')
+    cat('- `"number"`: any HDF5 integer or float datatype that can be exactly represented by a double-precision (64-bit) float.')
 }
 ```
 - `"string"`: a HDF5 string datatype that can be represented by a UTF-8 encoded string.
@@ -181,14 +181,14 @@ if (.version == package_version("1.0")) {
 This should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
 
 The group should contain an 1-dimensional dataset at `**/data`, containing 0-based indices into the levels.
-This should be type of `H5T_INTEGER` that can be represented by a 32-bit signed integer.
+This should be any HDF5 integer datatype that can be exactly represented by a 32-bit signed integer.
 (Admittedly, this should have been an unsigned integer, but we started with a signed integer and we'll just keep it so for back-compatibility.)
 Missing values are represented as described above for atomic vectors.
 
 The group should also contain `**/levels`, a 1-dimensional string dataset that contains the levels for the indices in `**/data`.
 Values in `**/levels` should be unique.
 Values in `**/data` should be non-negative (missing values excepted) and less than the length of `**/levels`.
-Note that the type constraints on `**/data` suggest that there should not be more than 2147483647 levels;
+Note that the datatype constraints on `**/data` suggest that there should not be more than 2147483647 levels;
 beyond that count, the levels cannot be indexed by elements of `**/data`.
 `**/levels` should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
 
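As a rough illustration of the `**/data` datatype rules in this change, here is a minimal R sketch that decides whether a datatype is acceptable for a given `uzuki_type`. It assumes the datatype has already been summarized as a hypothetical `dtype` list (class, signedness, bit width); `is_allowed_datatype()` is likewise hypothetical and is not part of uzuki2 or any HDF5 binding. The float branch additionally assumes IEEE-754-style layouts.

```r
# Hypothetical sketch, not an uzuki2 API: is `dtype` acceptable for `**/data`?
# `dtype` is a hypothetical list with fields:
#   class:  "integer", "float" or "string"
#   signed: TRUE/FALSE (only consulted for integers)
#   bits:   precision of the datatype in bits
is_allowed_datatype <- function(uzuki_type, dtype, version) {
    is_int <- dtype$class == "integer"

    if (uzuki_type %in% c("integer", "boolean")) {
        # The whole range of the type must fit into a 32-bit signed integer:
        # signed types of up to 32 bits, unsigned types of up to 31 bits.
        limit <- if (isTRUE(dtype$signed)) 32 else 31
        return(is_int && dtype$bits <= limit)
    }

    if (uzuki_type == "number") {
        if (dtype$class == "float") {
            # Assumes an IEEE-754-style layout, where any float of at most
            # 64 bits is exactly representable as a double.
            return(dtype$bits <= 64)
        }
        if (is_int && version != package_version("1.0")) {
            # Doubles hold every integer of magnitude <= 2^53 exactly:
            # signed types of up to 54 bits, unsigned types of up to 53 bits.
            limit <- if (isTRUE(dtype$signed)) 54 else 53
            return(dtype$bits <= limit)
        }
        return(FALSE)
    }

    if (uzuki_type == "string") {
        # UTF-8 compatibility checks are omitted from this sketch.
        return(dtype$class == "string")
    }

    stop("'uzuki_type' not handled by this sketch: ", uzuki_type)
}

# Example: an unsigned 32-bit datatype cannot be used for "integer".
is_allowed_datatype("integer", list(class = "integer", signed = FALSE, bits = 32),
                    version = package_version("1.0"))
```

Checking the range of the datatype rather than the stored values is the conservative reading of the wording above: a 64-bit signed dataset is rejected for `"integer"` even if every value it holds would fit.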
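For factors, the constraints on `**/data` and `**/levels` (unique levels; codes that are non-negative and less than the number of levels, missing values excepted) can be sketched in R as follows. `decode_factor()` and its `missing_placeholder` argument are hypothetical. The latter stands in for whatever missing-value encoding the atomic-vector rules prescribe and is not a documented uzuki2 parameter.

```r
# Hypothetical sketch: validate 0-based factor codes and convert them to an R factor.
# `codes` are the values of `**/data`, `levels` the values of `**/levels`.
# `missing_placeholder` (hypothetical) marks missing entries; NULL means none.
decode_factor <- function(codes, levels, missing_placeholder = NULL) {
    if (anyDuplicated(levels)) {
        stop("values in '**/levels' should be unique")
    }

    # Treat R-level NAs and placeholder-valued entries as missing.
    is_missing <- is.na(codes) | codes %in% missing_placeholder
    present <- codes[!is_missing]
    if (any(present < 0 | present >= length(levels))) {
        stop("non-missing codes should lie in [0, length of '**/levels')")
    }

    # Shift to 1-based indices; NA indices produce NA entries in the factor.
    idx <- codes + 1
    idx[is_missing] <- NA
    factor(levels[idx], levels = levels)
}

# Example with a hypothetical placeholder of -1:
decode_factor(c(0L, 2L, -1L), c("a", "b", "c"), missing_placeholder = -1L)
```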