Cleaned up the wording for HDF5 datatypes in the spec.

ArtifactDB · Jan 8, 2024 · dd2904f
1 parent 3695bd4
Showing 1 changed file with 7 additions and 7 deletions.
diff --git a/docs/specifications/hdf5.Rmd b/docs/specifications/hdf5.Rmd
@@ -85,15 +85,15 @@ if (.version == package_version("1.0")) {
 The group should contain an 1-dimensional dataset at `**/data`.
 Vectors of length 1 may also be represented as a scalar dataset.
 (While R makes no distinction between scalars and length-1 vectors, this may be useful for other frameworks where this difference is relevant.)
-The allowed HDF5 datatype depends on `uzuki_type`:
+The allowed HDF5 datatype for `**/data` depends on `uzuki_type`:
 
-- `"integer"`, `"boolean"`: any type of `H5T_INTEGER` that can be represented by a 32-bit signed integer.
-  Note that the converse is not required, i.e., the storage type does not need to be 32-bit if no such values are present in the dataset.
+- `"integer"`, `"boolean"`: a HDF5 integer datatype that can be exactly represented by a 32-bit signed integer.
+  Note that the converse is not required, i.e., the datatype does not need to be 32-bit if no such values are present in the dataset.
 ```{r, echo=FALSE, results="asis"}
 if (.version == package_version("1.0")) {
-    cat('- `"number"`: any type of `H5T_FLOAT` that can be represented by a double-precision float.')
+    cat('- `"number"`: a HDF5 float datatype that can be exactly represented by a double-precision float.')
 } else {
-    cat('- `"number"`: any type of `H5T_FLOAT` or `H5T_INTEGER` that can be represented exactly by a double-precision (64-bit) float.')
+    cat('- `"number"`: a HDF5 integer or float datatype that can be represented exactly by a double-precision (64-bit) float.')
 }
 ```
 - `"string"`: a HDF5 string datatype that can be represented by a UTF-8 encoded string.
@@ -181,14 +181,14 @@ if (.version == package_version("1.0")) {
   This should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
 
 The group should contain an 1-dimensional dataset at `**/data`, containing 0-based indices into the levels.
-This should be type of `H5T_INTEGER` that can be represented by a 32-bit signed integer.
+This should be any HDF5 integer datatype that can be represented by a 32-bit signed integer.
 (Admittedly, this should have been an unsigned integer, but we started with a signed integer and we'll just keep it so for back-compatibility.)
 Missing values are represented as described above for atomic vectors.
 
 The group should also contain `**/levels`, a 1-dimensional string dataset that contains the levels for the indices in `**/data`.
 Values in `**/levels` should be unique.
 Values in `**/data` should be non-negative (missing values excepted) and less than the length of `**/levels`.
-Note that the type constraints on `**/data` suggest that there should not be more than 2147483647 levels;
+Note that the datatype constraints on `**/data` suggest that there should not be more than 2147483647 levels;
 beyond that count, the levels cannot be indexed by elements of `**/data`.
 `**/levels` should use a HDF5 string datatype that is compatible with the UTF-8 encoding.
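
Reviewer note: the datatype constraints reworded above can be illustrated with a small sketch. This is not the validator used by the project. It assumes the check is made on the datatype itself (its bit width and signedness) rather than on the stored values, and every function name below is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def integer_range(bits: int, signed: bool) -> Tuple[int, int]:
    """Smallest and largest value of an integer type with the given width."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


INT32_MIN, INT32_MAX = integer_range(32, True)


def fits_int32(bits: int, signed: bool) -> bool:
    """True if every value of the type fits in a 32-bit signed integer."""
    lo, hi = integer_range(bits, signed)
    return INT32_MIN <= lo and hi <= INT32_MAX


def fits_double_exactly(bits: int, signed: bool) -> bool:
    """True if every value of the integer type is exactly representable as a
    64-bit float; all integers of magnitude up to 2**53 are."""
    lo, hi = integer_range(bits, signed)
    limit = 1 << 53
    return -limit <= lo and hi <= limit


def valid_factor_codes(codes: Iterable[int], n_levels: int,
                       missing: Optional[int] = None) -> bool:
    """True if each code is the missing placeholder or a 0-based index
    that is less than the number of levels."""
    return all(c == missing or 0 <= c < n_levels for c in codes)
```

Under these assumptions, a 16-bit unsigned type passes `fits_int32` while a 32-bit unsigned type does not, and `INT32_MAX` equals the 2147483647 figure quoted for the number of levels.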