What would you like to see added to NeuroConv?
It would be nice to have a workflow that could take in an NWB file that has already been saved to disk and repack it with recommended chunking and compression.
The first step would be to fetch the current backend configuration from the existing datasets. Maybe this could be a function in _dataset_configuration.py:
`get_existing_backend_configuration(nwbfile) -> BackendConfiguration`
where nwbfile must be linked to an on-disk NWB file. The backend should be detected automatically, so there is no need to pass it as a separate argument.
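Something roughly like the sketch below could work. To keep it short it returns plain dicts keyed by HDF5 path rather than building the real BackendConfiguration / DatasetIOConfiguration models, and it assumes the file was read with NWBHDF5IO so that nwbfile.container_source points at the HDF5 file on disk; treat the names and structure as illustrative only.

```python
import h5py


def get_existing_dataset_settings(nwbfile):
    """Collect the chunking/compression currently on disk for every dataset in the file."""
    # Assumes an HDF5 backend and that the NWBFile was read from disk, so
    # container_source is the path of the backing file.
    settings = {}
    with h5py.File(nwbfile.container_source, mode="r") as file:

        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                settings[name] = {
                    "chunk_shape": obj.chunks,
                    "compression_method": obj.compression,
                    "compression_options": obj.compression_opts,
                }

        file.visititems(visit)
    return settings
```

The real function would presumably populate the same Pydantic models that get_default_backend_configuration produces, so the two could be diffed or merged.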
Then we should have a way to get the recommended configuration for that file. This already works in some cases with get_default_backend_configuration(nwbfile, backend), but not in all of them. If you have an ImageSeries with an external file and a (0, 0, 0) dataset, this triggers an error when the dataset is an h5py Dataset:
```
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[22], line 1
----> 1 get_default_backend_configuration(nwbfile, "hdf5")

File ~/dev/neuroconv/src/neuroconv/tools/nwb_helpers/_backend_configuration.py:19, in get_default_backend_configuration(nwbfile, backend)
     16 """Fill a default backend configuration to serve as a starting point for further customization."""
     18 BackendConfigurationClass = BACKEND_CONFIGURATIONS[backend]
---> 19 return BackendConfigurationClass.from_nwbfile(nwbfile=nwbfile)

File ~/dev/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_backend.py:61, in BackendConfiguration.from_nwbfile(cls, nwbfile)
     58 @classmethod
     59 def from_nwbfile(cls, nwbfile: NWBFile) -> Self:
     60     default_dataset_configurations = get_default_dataset_io_configurations(nwbfile=nwbfile, backend=cls.backend)
---> 61     dataset_configurations = {
     62         default_dataset_configuration.location_in_file: default_dataset_configuration
     63         for default_dataset_configuration in default_dataset_configurations
     64     }
     66     return cls(dataset_configurations=dataset_configurations)

File ~/dev/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_backend.py:61, in <dictcomp>(.0)
     58 @classmethod
     59 def from_nwbfile(cls, nwbfile: NWBFile) -> Self:
     60     default_dataset_configurations = get_default_dataset_io_configurations(nwbfile=nwbfile, backend=cls.backend)
---> 61     dataset_configurations = {
     62         default_dataset_configuration.location_in_file: default_dataset_configuration
     63         for default_dataset_configuration in default_dataset_configurations
     64     }
     66     return cls(dataset_configurations=dataset_configurations)

File ~/dev/neuroconv/src/neuroconv/tools/nwb_helpers/_dataset_configuration.py:154, in get_default_dataset_io_configurations(nwbfile, backend)
    151 if isinstance(candidate_dataset, np.ndarray) and candidate_dataset.size == 0:
    152     continue
--> 154 dataset_io_configuration = DatasetIOConfigurationClass.from_neurodata_object(
    155     neurodata_object=neurodata_object, dataset_name=known_dataset_field
    156 )
    158 yield dataset_io_configuration

File ~/dev/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py:272, in DatasetIOConfiguration.from_neurodata_object(cls, neurodata_object, dataset_name)
    270     compression_method = "gzip"
    271 elif dtype != np.dtype("object"):
--> 272     chunk_shape = SliceableDataChunkIterator.estimate_default_chunk_shape(
    273         chunk_mb=10.0, maxshape=full_shape, dtype=np.dtype(dtype)
    274     )
    275     buffer_shape = SliceableDataChunkIterator.estimate_default_buffer_shape(
    276         buffer_gb=0.5, chunk_shape=chunk_shape, maxshape=full_shape, dtype=np.dtype(dtype)
    277     )
    278     compression_method = "gzip"

File ~/dev/neuroconv/src/neuroconv/tools/hdmf.py:38, in GenericDataChunkIterator.estimate_default_chunk_shape(chunk_mb, maxshape, dtype)
     35 chunk_bytes = chunk_mb * 1e6
     37 min_maxshape = min(maxshape)
---> 38 v = tuple(math.floor(maxshape_axis / min_maxshape) for maxshape_axis in maxshape)
     39 prod_v = math.prod(v)
     40 while prod_v * itemsize > chunk_bytes and prod_v != 1:

File ~/dev/neuroconv/src/neuroconv/tools/hdmf.py:38, in <genexpr>(.0)
     35 chunk_bytes = chunk_mb * 1e6
     37 min_maxshape = min(maxshape)
---> 38 v = tuple(math.floor(maxshape_axis / min_maxshape) for maxshape_axis in maxshape)
     39 prod_v = math.prod(v)
     40 while prod_v * itemsize > chunk_bytes and prod_v != 1:

ZeroDivisionError: division by zero
```
We should either adjust this function so it handles that case, or create a separate function for this specific purpose.
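One possible adjustment (a sketch of the "skip empty datasets" option, not a tested fix) would be to broaden the zero-size check that already exists for np.ndarray at line 151 of _dataset_configuration.py in the traceback above so it also covers h5py datasets:

```python
import h5py
import numpy as np


def _is_empty_dataset(candidate_dataset) -> bool:
    """True for zero-size datasets, whether in memory or already written to HDF5."""
    return isinstance(candidate_dataset, (np.ndarray, h5py.Dataset)) and candidate_dataset.size == 0


# inside the loop over candidate datasets in get_default_dataset_io_configurations:
#     if _is_empty_dataset(candidate_dataset):
#         continue  # a (0, 0, 0) dataset has nothing to chunk or compress
```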
Then, finally, we would need a way to write this to a new file, probably using the export function in pynwb.
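The write step could look something like the sketch below. This is only the shape of the workflow, the paths are placeholders, and whether configure_backend applied to a file read from disk actually carries the new chunking/compression through export is exactly what would need to be verified.

```python
from neuroconv.tools.nwb_helpers import configure_backend, get_default_backend_configuration
from pynwb import NWBHDF5IO

# Placeholder paths for illustration.
with NWBHDF5IO("original.nwb", mode="r") as read_io:
    nwbfile = read_io.read()

    # Build the recommended configuration for this file and apply it to the
    # in-memory containers.
    backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend="hdf5")
    configure_backend(nwbfile=nwbfile, backend_configuration=backend_configuration)

    # Export to a new file so the datasets are rewritten rather than modified in place.
    with NWBHDF5IO("repacked.nwb", mode="w") as export_io:
        export_io.export(src_io=read_io, nwbfile=nwbfile)
```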
It would be nice to have two usage modes: one that completely automates everything, and one that lets users repack specific datasets with specific parameters.
This workflow should also allow the user to switch from one backend to another.
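A hypothetical top-level entry point covering both modes plus the backend switch might look like this (all names and parameters here are invented for illustration, not an existing API):

```python
from typing import Literal, Optional


def repack_nwbfile(
    input_path: str,
    output_path: str,
    backend: Literal["hdf5", "zarr"] = "hdf5",  # allows switching backends during the repack
    dataset_overrides: Optional[dict] = None,
) -> None:
    """Repack an on-disk NWB file with recommended chunking and compression.

    Fully automatic when ``dataset_overrides`` is None; otherwise the given
    per-dataset settings (keyed by location in the file, e.g.
    "acquisition/TwoPhotonSeries/data") override the recommended defaults.
    """
    ...
```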
Is your feature request related to a problem?
It's somewhat common for users to upload sub-optimal NWB files. This would also be a suitable workflow when users create NWB files in MatNWB and don't know how to configure the datasets properly there.
Do you have any interest in helping implement the feature?
No.
@bendichter, I can't replicate this error with get_default_backend_configuration, so maybe it has been fixed in the time since you raised this issue?