When reading data from a Dataset, pyfive currently loads all chunks into memory before slicing out the requested data. This is inefficient when only a small region of the data is required, since that region could often be extracted from a few chunks, or even a single one.
The code used for slicing dask arrays may be helpful for determining which chunks need to be read for a given slice.
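The core of the requested optimization is mapping a slice onto the set of chunks it overlaps, so only those chunks are read and decompressed. Below is a minimal sketch of that mapping; `chunks_for_slice` is a hypothetical helper, not part of pyfive or dask, and it assumes a regular chunk grid and contiguous (step-1) slices.

```python
import itertools


def chunks_for_slice(shape, chunk_shape, slices):
    """Return the N-dimensional chunk indices overlapping the given slices.

    Assumes a regular chunk grid (every chunk has chunk_shape, except
    possibly edge chunks) and does not skip chunks for strided slices.
    """
    ranges = []
    for size, csize, sl in zip(shape, chunk_shape, slices):
        start, stop, _ = sl.indices(size)
        first = start // csize            # first chunk touched in this dim
        last = (stop - 1) // csize        # last chunk touched in this dim
        ranges.append(range(first, last + 1))
    # Cartesian product of per-dimension chunk ranges gives the chunk grid
    return list(itertools.product(*ranges))


# A 100x100 dataset with 10x10 chunks: a 10x10 region straddling a chunk
# boundary in the first dimension touches only two chunks.
print(chunks_for_slice((100, 100), (10, 10), (slice(5, 15), slice(0, 10))))
# -> [(0, 0), (1, 0)]
```

With the overlapping chunk indices in hand, a reader would only fetch and decode those chunks and then copy the relevant sub-regions into the output array, rather than materializing the whole dataset.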
I was looking into pyfive to read cloud-hosted data (we have Python file objects for S3, GCS, Azure) and was sad to learn that slicing doesn't happen cleverly.