What would you like to see added to NeuroConv?
It would be nice to have a workflow that could take in an NWB file that has already been saved to disk and repack it with recommended chunking and compression.
The first step would be to fetch the current backend configuration from the existing datasets. Maybe this could be a function in _dataset_configuration.py:
`get_existing_backend_configuration(nwbfile) -> BackendConfiguration`
where nwbfile must be linked to an on-disk NWB file. The backend should be detected automatically, so there is no need to pass it as a separate argument.
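Something roughly like the sketch below could work. To keep it short it returns plain dicts keyed by HDF5 path rather than building the real BackendConfiguration / DatasetIOConfiguration models, and it assumes the file was read with NWBHDF5IO so that nwbfile.container_source points at the HDF5 file on disk; treat the names and structure as illustrative only.

```python
import h5py


def get_existing_dataset_settings(nwbfile):
    """Collect the chunking/compression currently on disk for every dataset in the file."""
    # Assumes an HDF5 backend and that the NWBFile was read from disk, so
    # container_source is the path of the backing file.
    settings = {}
    with h5py.File(nwbfile.container_source, mode="r") as file:

        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                settings[name] = {
                    "chunk_shape": obj.chunks,
                    "compression_method": obj.compression,
                    "compression_options": obj.compression_opts,
                }

        file.visititems(visit)
    return settings
```

The real function would presumably populate the same Pydantic models that get_default_backend_configuration produces, so the two could be diffed or merged.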
Then we should have a way to get the recommended configuration for that file. This already works in some cases with get_default_backend_configuration(nwbfile, backend), but not in all of them. If you have an ImageSeries with an external file and a (0, 0, 0) dataset, this triggers an error when the dataset is an h5py Dataset:
```
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[22], line 1
----> 1 get_default_backend_configuration(nwbfile, "hdf5")

File ~/dev/neuroconv/src/neuroconv/tools/nwb_helpers/_backend_configuration.py:19, in get_default_backend_configuration(nwbfile, backend)
     16 """Fill a default backend configuration to serve as a starting point for further customization."""
     18 BackendConfigurationClass = BACKEND_CONFIGURATIONS[backend]
---> 19 return BackendConfigurationClass.from_nwbfile(nwbfile=nwbfile)

File ~/dev/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_backend.py:61, in BackendConfiguration.from_nwbfile(cls, nwbfile)
     58 @classmethod
     59 def from_nwbfile(cls, nwbfile: NWBFile) -> Self:
     60     default_dataset_configurations = get_default_dataset_io_configurations(nwbfile=nwbfile, backend=cls.backend)
---> 61     dataset_configurations = {
     62         default_dataset_configuration.location_in_file: default_dataset_configuration
     63         for default_dataset_configuration in default_dataset_configurations
     64     }
     66     return cls(dataset_configurations=dataset_configurations)

File ~/dev/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_backend.py:61, in <dictcomp>(.0)
     58 @classmethod
     59 def from_nwbfile(cls, nwbfile: NWBFile) -> Self:
     60     default_dataset_configurations = get_default_dataset_io_configurations(nwbfile=nwbfile, backend=cls.backend)
---> 61     dataset_configurations = {
     62         default_dataset_configuration.location_in_file: default_dataset_configuration
     63         for default_dataset_configuration in default_dataset_configurations
     64     }
     66     return cls(dataset_configurations=dataset_configurations)

File ~/dev/neuroconv/src/neuroconv/tools/nwb_helpers/_dataset_configuration.py:154, in get_default_dataset_io_configurations(nwbfile, backend)
    151 if isinstance(candidate_dataset, np.ndarray) and candidate_dataset.size == 0:
    152     continue
--> 154 dataset_io_configuration = DatasetIOConfigurationClass.from_neurodata_object(
    155     neurodata_object=neurodata_object, dataset_name=known_dataset_field
    156 )
    158 yield dataset_io_configuration

File ~/dev/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py:272, in DatasetIOConfiguration.from_neurodata_object(cls, neurodata_object, dataset_name)
    270     compression_method = "gzip"
    271 elif dtype != np.dtype("object"):
--> 272     chunk_shape = SliceableDataChunkIterator.estimate_default_chunk_shape(
    273         chunk_mb=10.0, maxshape=full_shape, dtype=np.dtype(dtype)
    274     )
    275     buffer_shape = SliceableDataChunkIterator.estimate_default_buffer_shape(
    276         buffer_gb=0.5, chunk_shape=chunk_shape, maxshape=full_shape, dtype=np.dtype(dtype)
    277     )
    278     compression_method = "gzip"

File ~/dev/neuroconv/src/neuroconv/tools/hdmf.py:38, in GenericDataChunkIterator.estimate_default_chunk_shape(chunk_mb, maxshape, dtype)
     35 chunk_bytes = chunk_mb * 1e6
     37 min_maxshape = min(maxshape)
---> 38 v = tuple(math.floor(maxshape_axis / min_maxshape) for maxshape_axis in maxshape)
     39 prod_v = math.prod(v)
     40 while prod_v * itemsize > chunk_bytes and prod_v != 1:

File ~/dev/neuroconv/src/neuroconv/tools/hdmf.py:38, in <genexpr>(.0)
     35 chunk_bytes = chunk_mb * 1e6
     37 min_maxshape = min(maxshape)
---> 38 v = tuple(math.floor(maxshape_axis / min_maxshape) for maxshape_axis in maxshape)
     39 prod_v = math.prod(v)
     40 while prod_v * itemsize > chunk_bytes and prod_v != 1:

ZeroDivisionError: division by zero
```
We should either adjust this function so it handles that case, or create a separate function for this specific purpose.
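One possible adjustment (a sketch of the "skip empty datasets" option, not a tested fix) would be to broaden the zero-size check that already exists for np.ndarray at line 151 of _dataset_configuration.py in the traceback above so it also covers h5py datasets:

```python
import h5py
import numpy as np


def _is_empty_dataset(candidate_dataset) -> bool:
    """True for zero-size datasets, whether in memory or already written to HDF5."""
    return isinstance(candidate_dataset, (np.ndarray, h5py.Dataset)) and candidate_dataset.size == 0


# inside the loop over candidate datasets in get_default_dataset_io_configurations:
#     if _is_empty_dataset(candidate_dataset):
#         continue  # a (0, 0, 0) dataset has nothing to chunk or compress
```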
Then, finally, we would need a way to write this to a new file, probably using the export function in pynwb.
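The write step could look something like the sketch below. This is only the shape of the workflow, the paths are placeholders, and whether configure_backend applied to a file read from disk actually carries the new chunking/compression through export is exactly what would need to be verified.

```python
from neuroconv.tools.nwb_helpers import configure_backend, get_default_backend_configuration
from pynwb import NWBHDF5IO

# Placeholder paths for illustration.
with NWBHDF5IO("original.nwb", mode="r") as read_io:
    nwbfile = read_io.read()

    # Build the recommended configuration for this file and apply it to the
    # in-memory containers.
    backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend="hdf5")
    configure_backend(nwbfile=nwbfile, backend_configuration=backend_configuration)

    # Export to a new file so the datasets are rewritten rather than modified in place.
    with NWBHDF5IO("repacked.nwb", mode="w") as export_io:
        export_io.export(src_io=read_io, nwbfile=nwbfile)
```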
It would be nice to have two usage modes: one that completely automates everything, and one that lets users repack specific datasets with specific parameters.
This workflow should also allow the user to switch from one backend to another.
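A hypothetical top-level entry point covering both modes plus the backend switch might look like this (all names and parameters here are invented for illustration, not an existing API):

```python
from typing import Literal, Optional


def repack_nwbfile(
    input_path: str,
    output_path: str,
    backend: Literal["hdf5", "zarr"] = "hdf5",  # allows switching backends during the repack
    dataset_overrides: Optional[dict] = None,
) -> None:
    """Repack an on-disk NWB file with recommended chunking and compression.

    Fully automatic when ``dataset_overrides`` is None; otherwise the given
    per-dataset settings (keyed by location in the file, e.g.
    "acquisition/TwoPhotonSeries/data") override the recommended defaults.
    """
    ...
```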
Is your feature request related to a problem?
It's somewhat common for users to upload sub-optimal NWB files. This would also be a suitable workflow when users create NWB files in MatNWB and don't know how to configure the datasets properly there.
Do you have any interest in helping implement the feature?
No.
@bendichter, I can't replicate this error with get_default_backend_configuration, so maybe it has been fixed in the time since you raised this issue?