datatree backend for opening grib files #11

Open
TomNicholas opened this issue Mar 8, 2024 · 1 comment
Labels
enhancement New feature or request xarray Requires changes to xarray upstream

TomNicholas commented Mar 8, 2024

A way of kerchunking grib data as a DataTree object was recently added in fsspec/kerchunk#399. Since the ongoing xarray-datatree integration is adding an open_datatree method to xarray's BackendEntrypoint classes, it's likely that we could write an open_datatree method that understands how to read a grib file and returns a datatree containing ManifestArray objects.

TomNicholas added the xarray Requires changes to xarray upstream label Mar 10, 2024
TomNicholas added the enhancement New feature or request label Mar 26, 2024

TomNicholas commented

We actually don't need to wait for anything upstream in xarray before making something useful here. We could simply create a new virtualizarr.open_virtual_datatree function, which would detect the filetype, loop over the groups, and use open_virtual_dataset (or kerchunk directly if necessary) to first create the virtual xr.Dataset objects, then put them all into a datatree.DataTree to return. This function could be modelled after how datatree.open_datatree currently works.
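To make the "loop over the groups" step concrete, here is a minimal sketch of what the core of such a function might look like. It is not an existing virtualizarr API: `collect_virtual_groups` and the `open_group` callable are hypothetical names, with `open_group` standing in for whatever opens a single group as a virtual dataset.

```python
from typing import Callable, Dict, Iterable, TypeVar

DS = TypeVar("DS")


def collect_virtual_groups(
    filepath: str,
    group_paths: Iterable[str],
    open_group: Callable[[str, str], DS],
) -> Dict[str, DS]:
    """Open every listed group of one file and key the results by node path.

    `open_group(filepath, group)` is a hypothetical stand-in for the call
    that opens one group as a virtual dataset.
    """
    datasets: Dict[str, DS] = {}
    for group in group_paths:
        # Normalise to an absolute, slash-separated node path such as "/a/b";
        # the root group ("" or "/") becomes "/".
        node_path = "/" + group.strip("/")
        datasets[node_path] = open_group(filepath, group)
    return datasets
```

The resulting mapping of node paths to datasets is the shape of input that datatree.DataTree.from_dict accepts, so the last step would be to pass it there. How the list of group paths is discovered depends on the file type and is left out of this sketch.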

At that point you would have a datatree.DataTree object wrapping lots of ManifestArray objects (let's call it vdt1 for "virtual datatree 1"). You could concatenate two such trees using:

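import xarray as xr
from datatree import map_over_subtree

# map_over_subtree wraps a Dataset-level function so that it runs on each pair of matching nodes
concat_along_time = map_over_subtree(lambda ds1, ds2: xr.concat([ds1, ds2], dim="time"))
combined_virtual_tree = concat_along_time(vdt1, vdt2)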
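For intuition about what mapping over a subtree does here, the following is a rough pure-Python analogue in which each tree is just a dict from node path to dataset. It is only an illustration, not how datatree implements it, and `map_over_matching_nodes` is a hypothetical name.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def map_over_matching_nodes(
    func: Callable[[T, T], T],
    tree_a: Dict[str, T],
    tree_b: Dict[str, T],
) -> Dict[str, T]:
    """Apply `func` to the pair of values stored at each shared node path."""
    # Both trees must have the same set of node paths, otherwise there is
    # no sensible pairing of nodes to combine.
    if tree_a.keys() != tree_b.keys():
        raise ValueError("trees must contain the same node paths")
    return {path: func(tree_a[path], tree_b[path]) for path in tree_a}
```

With real virtual datasets, `func` would be the concatenation along 'time', so every node in the combined tree holds the concatenation of the two corresponding input nodes.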

(cc @maxrjones, who asked about doing something similar but for nested HDF5 files)
