Hi, I have a use case where I need to load some h5 files into memory as a cache. Each h5 file contains a lot of datasets, and each dataset contains thousands of rows in a 1-D array.
Sorry that I can't provide the actual h5 file, but I can simulate a file with a structure similar to my use case.
```rust
use hdf5::{File, H5Type};
use ndarray::{s, Array1};
use std::collections::HashMap;
use std::error::Error;
use std::result::Result as StdResult;

#[derive(H5Type, Clone, PartialEq, Debug)]
#[repr(C)]
pub struct TmpData {
    a1: u32,
    a2: f64,
    a3: f64,
    a4: f64,
    a5: f64,
    a6: u64,
}

pub fn read_to_mem(path: &str) -> StdResult<HashMap<String, Array1<TmpData>>, Box<dyn Error>> {
    let file = File::open(path)?; // open for reading
    let mut result = HashMap::new();
    for dataset in file.datasets()? {
        let data = dataset.read_slice_1d::<TmpData, _>(s![..])?;
        result.insert(dataset.name(), data);
    }
    Ok(result)
}

fn main() {
    let _ = read_to_mem("tmp.h5").unwrap();
}
```
I think this has come up before (I can't find the issue), and it had to do with hdf5 converting every compound value internally when it could have been a plain copy or a no-op. You could try creating a flamegraph to verify this.
h5py might be using a different way of reading the file compared to the naive way in this crate. We should look at this approach and copy their way of doing it.
NumPy structured arrays use packed layouts by default. You can check that .dtype.itemsize in your case is equal to 44, whereas your Rust struct is repr(C), so its size will be 48 (4 bytes of padding are inserted after the u32 field so that the f64 fields are 8-byte aligned). So it's no surprise: h5py does a direct read with zero work afterwards, whereas in Rust the layouts mismatch and every field has to be copied into its place. So, you'd want to do either:
Use align=True when creating recarrays, then you can use it with a repr(C) struct
Use repr(packed) on the struct, then you can use it with packed arrays
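The size difference described above can be checked with a small standalone sketch. It uses only the standard library (no hdf5 crate), and the struct names AlignedRow and PackedRow are hypothetical, chosen here only to mirror the fields of TmpData. The 48 vs 44 figures assume a typical 64-bit target where f64 and u64 are 8-byte aligned. The packed variant is written as repr(C, packed) so that the field order is also fixed; whether the H5Type derive accepts that exact attribute is an assumption not verified here.

```rust
use std::mem::{align_of, size_of};

// Same fields as TmpData, with the default C layout: 4 bytes of padding
// follow `a1` so that `a2` starts on an 8-byte boundary.
#[allow(dead_code)]
#[repr(C)]
struct AlignedRow {
    a1: u32,
    a2: f64,
    a3: f64,
    a4: f64,
    a5: f64,
    a6: u64,
}

// Same fields with no padding between them, matching a packed
// NumPy structured dtype (the default when align=False).
#[allow(dead_code)]
#[repr(C, packed)]
struct PackedRow {
    a1: u32,
    a2: f64,
    a3: f64,
    a4: f64,
    a5: f64,
    a6: u64,
}

// Returns (size of the aligned struct, size of the packed struct) in bytes.
fn layout_sizes() -> (usize, usize) {
    (size_of::<AlignedRow>(), size_of::<PackedRow>())
}

fn main() {
    let (aligned, packed) = layout_sizes();
    println!("repr(C):         size = {aligned}, align = {}", align_of::<AlignedRow>());
    println!("repr(C, packed): size = {packed}, align = {}", align_of::<PackedRow>());
}
```

If the in-memory size matches the on-disk item size (44 for the packed dtype, 48 for one created with align=True), the read can be a straight copy rather than a per-field conversion.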
Code to generate such h5 file
Here is my generated h5 file: tmp.tar.gz
Reader code
Here is rust code:
And here is python code:
By comparison, the hdf5-rust code takes 8m19s to read the whole file, but the h5py code takes about 30 seconds.
I've tried enabling the f16 feature, but had no luck. Am I doing something wrong? Or how can I improve the performance?