Search for coord_names in separate_coords #191

Open
wants to merge 2 commits into
base: main

Conversation

@ayushnag
Contributor Author

@TomNicholas this closes the issue, but I think some existing functionality could be refactored. My understanding is that the current logic tries to find coordinates within attrs here. However, that code reads the dataset-level attributes, whereas coordinates lives in the attrs of each variable. When I debug the current code, coord_names is always empty. I can refactor the code and update this PR to remove the coord_names param from separate_coords and simply use the newly added search within separate_coords.
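To illustrate the difference between dataset-level and per-variable attributes, here is a minimal sketch that uses plain dictionaries in place of a real dataset. The helper name `find_coord_names`, the dict layout, and the variable names are hypothetical and are not taken from the virtualizarr code base. The only thing assumed from the CF conventions is that a `coordinates` attribute is a blank-separated list of variable names.

```python
def find_coord_names(var_attrs: dict[str, dict]) -> set[str]:
    """Collect coordinate names from each variable's attrs (hypothetical helper)."""
    coord_names: set[str] = set()
    for attrs in var_attrs.values():
        # CF stores e.g. "lon_rho lat_rho" as a single blank-separated string
        coord_names.update(attrs.get("coordinates", "").split())
    return coord_names


# Made-up example: only the data variable carries a `coordinates` entry,
# and the dataset-level attrs have none.
dataset_attrs = {"title": "example"}
var_attrs = {
    "salt": {"coordinates": "lon_rho lat_rho"},
    "lon_rho": {"units": "degrees_east"},
    "lat_rho": {"units": "degrees_north"},
}

# Looking only at dataset-level attrs finds nothing ...
from_dataset = set(dataset_attrs.get("coordinates", "").split())
# ... while searching the per-variable attrs finds both coordinates.
from_variables = find_coord_names(var_attrs)
```

In this toy case `from_dataset` is empty while `from_variables` contains both names, which matches the "coord_names is always empty" observation above.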

@TomNicholas
Member

TomNicholas commented Jul 21, 2024

> I can refactor the code and update this PR to remove the coord_names param from separate_coords and simply use the newly added search within separate_coords

That would be great @ayushnag. I think the presence of the coord_names kwarg comes from a time when I didn't actually understand where I was supposed to get the information about which variable was a coordinate, so I had left it general. But it seems we don't need it, so let's get rid of it.

We should also do a few other things:

  • Strip the coordinates entry from the .zattrs (i.e. remove it from the xarray variable's .attrs) - this is redundant information once we have set the variable as an xarray coordinate.
  • Make sure the reverse process also works - saving a multi-dimensional coordinate to kerchunk json records the fact it should be a coordinate by re-adding an entry to the .zattrs.
  • Add a regression test, which should be another roundtripping test. It can use the xr.tutorial.open_dataset("ROMS_example.nc") dataset you showed in "open_virtual_dataset returns some coordinates as data variables" (#189).
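The first two bullets above can be sketched as a strip-and-restore pair, again with plain dictionaries standing in for real variables. Everything here is an assumption made for illustration: the function names `strip_coordinates` and `restore_coordinates`, the `{"dims": ..., "attrs": ...}` layout, and the rule that a coordinate is re-attached to every data variable whose dimensions cover the coordinate's dimensions. It is not the virtualizarr or kerchunk implementation.

```python
def strip_coordinates(variables: dict) -> tuple[dict, set[str]]:
    """Remove `coordinates` from every variable's attrs and return the names found."""
    coord_names: set[str] = set()
    cleaned = {}
    for name, var in variables.items():
        attrs = dict(var["attrs"])  # copy so the input is not mutated
        coord_names.update(attrs.pop("coordinates", "").split())
        cleaned[name] = {"dims": var["dims"], "attrs": attrs}
    return cleaned, coord_names


def restore_coordinates(variables: dict, coord_names: set[str]) -> dict:
    """Re-add a `coordinates` attr to data variables (assumed dims-subset rule)."""
    restored = {}
    for name, var in variables.items():
        attrs = dict(var["attrs"])
        if name not in coord_names:
            # Attach every known coordinate whose dims are covered by this variable
            related = sorted(
                c for c in coord_names
                if c in variables and set(variables[c]["dims"]) <= set(var["dims"])
            )
            if related:
                attrs["coordinates"] = " ".join(related)
        restored[name] = {"dims": var["dims"], "attrs": attrs}
    return restored


# Made-up variables with a two-dimensional coordinate pair
original = {
    "salt": {"dims": ("eta_rho", "xi_rho"), "attrs": {"coordinates": "lat_rho lon_rho"}},
    "lat_rho": {"dims": ("eta_rho", "xi_rho"), "attrs": {"units": "degrees_north"}},
    "lon_rho": {"dims": ("eta_rho", "xi_rho"), "attrs": {"units": "degrees_east"}},
}

stripped, names = strip_coordinates(original)
roundtripped = restore_coordinates(stripped, names)
```

A roundtrip regression test along the lines of the third bullet would then compare `roundtripped` with `original`. Note that this sketch writes the names back in sorted order, so it does not preserve the original ordering of the `coordinates` string.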

@TomNicholas added the bug and CF conventions labels Jul 21, 2024
@TomNicholas
Member

@ayushnag I would like to merge this important bugfix and issue a release of a new version of virtualizarr. Are you likely to have time to come back to this or should I merge this PR and open another?

@ayushnag
Contributor Author

ayushnag commented Jul 30, 2024

@TomNicholas I looked into this some more, and it seems that the coord_names param is needed for non-dimension coordinate cases (where there is coordinate information in the global dataset .zattrs), so we can't eliminate that function param. When I tried to remove it, the kerchunk roundtrip integration test failed.

Unfortunately I don't have time to implement your new suggestions, but feel free to merge this or make an updated PR.

@TomNicholas added this to the v1.1.0 milestone Aug 1, 2024
TomNicholas added a commit that referenced this pull request Aug 23, 2024
Successfully merging this pull request may close these issues.

open_virtual_dataset returns some coordinates as data variables
2 participants