Add function to store (grid) data as NetCDF #905

Open
peanutfun opened this issue Jun 27, 2024 · 5 comments

Comments

@peanutfun
Member

Early versions of #898 and #857 provided functions to store the exceedance map data as NetCDF files. We decided to remove this feature from the PR(s).

@DahyannAraya has also been working on NetCDF output of Impacts. It might be worthwhile to consolidate these efforts.

Minimal Viable Feature
Provide a function that "rasterizes" a GeoDataFrame into an xarray DataArray or Dataset and writes its data to a NetCDF file. Raise an error if the data cannot be rasterized (that is, if it is apparently not gridded).
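
A minimal sketch of what such a function might look like, assuming point data on a regular latitude/longitude grid. The name `rasterize_points` and its arguments are hypothetical and not part of CLIMADA; for a GeoDataFrame with point geometries, the coordinates could be taken from `gdf.geometry.x` and `gdf.geometry.y`.

```python
import numpy as np
import xarray as xr


def rasterize_points(lon, lat, values, name="value"):
    """Arrange 1D point data on a regular lat/lon grid (hypothetical helper).

    Raises ValueError if the points do not form a regular grid.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    values = np.asarray(values, dtype=float)

    # The unique coordinate values define the candidate grid axes
    lon_axis, lon_idx = np.unique(lon, return_inverse=True)
    lat_axis, lat_idx = np.unique(lat, return_inverse=True)

    # A regular grid needs constant spacing along each axis
    for axis_name, axis in (("longitude", lon_axis), ("latitude", lat_axis)):
        steps = np.diff(axis)
        if steps.size > 1 and not np.allclose(steps, steps[0]):
            raise ValueError(f"Data is not gridded: irregular {axis_name} spacing")

    # Each grid cell may be occupied by at most one point
    flat_idx = lat_idx * lon_axis.size + lon_idx
    if np.unique(flat_idx).size != flat_idx.size:
        raise ValueError("Data is not gridded: duplicate coordinates")

    # Cells without a point stay NaN
    grid = np.full((lat_axis.size, lon_axis.size), np.nan)
    grid[lat_idx, lon_idx] = values
    return xr.DataArray(
        grid,
        coords={"latitude": lat_axis, "longitude": lon_axis},
        dims=("latitude", "longitude"),
        name=name,
    )
```

The returned DataArray could then be written with `DataArray.to_netcdf`, which needs a NetCDF backend such as netCDF4 or scipy to be installed.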

@bguillod
Collaborator

This would be a useful feature indeed.

My suggestion would be to write methods that convert data (exceedance, hazard, impact, or whatever object it applies to) to and from xarray (with fixed conventions for each class) and let xarray handle reading and writing the file.

More specifically: I suggest providing only to_xarray() and from_xarray() methods (or similar). The user can then read and write the resulting object with xarray itself (e.g., xarray.Dataset.to_netcdf, xarray.Dataset.to_zarr, xarray.open_dataset, ...). This leaves users free to store data in their preferred format.
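
To illustrate the idea, here is a rough sketch of such a method pair. The class `GriddedResult`, its fields, and the variable name `intensity` are hypothetical stand-ins and do not reflect any existing CLIMADA class or agreed convention.

```python
from dataclasses import dataclass

import numpy as np
import xarray as xr


@dataclass
class GriddedResult:
    """Hypothetical stand-in for a CLIMADA result object on a regular grid."""

    latitude: np.ndarray
    longitude: np.ndarray
    values: np.ndarray  # shape (latitude, longitude)
    unit: str = ""

    def to_xarray(self) -> xr.Dataset:
        """Convert to a Dataset following one fixed (assumed) convention."""
        return xr.Dataset(
            {
                "intensity": (
                    ("latitude", "longitude"),
                    self.values,
                    {"units": self.unit},
                )
            },
            coords={"latitude": self.latitude, "longitude": self.longitude},
        )

    @classmethod
    def from_xarray(cls, ds: xr.Dataset) -> "GriddedResult":
        """Rebuild the object from a Dataset written by to_xarray()."""
        return cls(
            latitude=ds["latitude"].values,
            longitude=ds["longitude"].values,
            values=ds["intensity"].values,
            unit=ds["intensity"].attrs.get("units", ""),
        )
```

With this split, file handling stays entirely in xarray: the Dataset returned by `to_xarray()` can go to `to_netcdf` or `to_zarr`, and whatever `xarray.open_dataset` returns can be passed to `from_xarray()`.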

@peanutfun
Member Author

@bguillod Thanks for the input. Indeed, xarray, with its extensive functionality for storing and loading data, seems to be the natural choice here. As became apparent in #898, it is feasible to return computed data as a GeoDataFrame. Since CLIMADA already uses this data structure extensively, we can start with a "conversion" from GeoDataFrame to an xarray DataArray or Dataset, which should cover most cases.
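
One possible shortcut for such a conversion is the `to_xarray()` method that pandas already provides. The sketch below uses a plain DataFrame as a stand-in for a GeoDataFrame; the column names and values are made up for illustration.

```python
import pandas as pd

# Stand-in for a GeoDataFrame; with point geometries, the two coordinate
# columns could be filled from gdf.geometry.y and gdf.geometry.x
df = pd.DataFrame(
    {
        "latitude": [10.0, 10.0, 11.0, 11.0],
        "longitude": [0.0, 1.0, 0.0, 1.0],
        "impact": [1.0, 2.0, 3.0, 4.0],
    }
)

# Index by the grid coordinates, then let pandas build the DataArray;
# grid cells without a matching row are filled with NaN
impact = df.set_index(["latitude", "longitude"])["impact"].to_xarray()
```

This route does not check that the coordinates are regularly spaced, so scattered points would yield a large, mostly empty array rather than an error.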

To consolidate any efforts already undertaken: Do you, by any chance, already have code that converts any CLIMADA data structure into an xarray structure?

@hsteptoe

I'm also very much in favour of an xarray interface. In the absence of any other volunteers, I can probably find some time to start this development...

@chahank
Member

chahank commented Jul 10, 2024

That would be very welcome! One thing to keep in mind overall is that xarray and NetCDF do not really support sparse data. CLIMADA's success, however, relies on the use of sparse data structures for hazard events, impacts, and exposures. If the two could somehow be reconciled, that would be very useful.

@peanutfun
Member Author

peanutfun commented Jul 10, 2024

@DahyannAraya Please upload the functions you wrote for writing Impact to NetCDF here, as we just discussed in person 🙏

@chahank The feature request focuses on writing NetCDF data, not necessarily on transforming CLIMADA data structures into equivalent xarray structures and back. xarray will mostly be used as a "proxy" to store the data. That being said, we already use xarray in conjunction with sparse within Hazard.from_xarray_raster to store an intermediate, sparse representation of the raster data. Finally, NetCDF itself does not support sparse data structures. However, it supports "fill values" for missing data (typically NaN) and can store these very efficiently when compression is applied.
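
As a rough sketch of the fill-value approach: sparse entries in coordinate form (for example the `row`, `col`, and `data` attributes of a `scipy.sparse.coo_matrix`) can be expanded into a dense array padded with a fill value, and compression is then requested through xarray's `encoding` argument. The helper `coo_to_dataarray`, the dimension names, and the variable name `intensity` are assumptions made for illustration.

```python
import numpy as np
import xarray as xr


def coo_to_dataarray(row, col, data, shape, fill_value=np.nan, name="intensity"):
    """Expand sparse COO triplets into a dense DataArray (hypothetical helper).

    Entries that are not stored in the sparse input are set to fill_value.
    """
    dense = np.full(shape, fill_value, dtype=float)
    dense[np.asarray(row), np.asarray(col)] = data
    # Dimension names are an assumption for this sketch
    return xr.DataArray(dense, dims=("event", "centroid"), name=name)


# Encoding for DataArray.to_netcdf / Dataset.to_netcdf: enable zlib
# compression and declare NaN as the fill value of the variable
encoding = {"intensity": {"zlib": True, "complevel": 4, "_FillValue": np.nan}}
```

Writing would then look like `da.to_netcdf(path, encoding=encoding)` with a backend that supports compression, such as netCDF4. Whether unstored entries should become NaN or zero depends on what they mean in the source data; long runs of either value compress well.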

@hsteptoe Thanks a lot for volunteering! 😊 Let's clarify the exact feature request first, and then draft an implementation plan.
