Open kerchunk ref as virtual dataset, only json (from PR 119) #186
Most changes were brought in from the main branch to fix a few errors. In addition, passing a parquet file now raises a NotImplementedError, and a docstring was added for the kerchunk filetype.
Two concerns as I've been working with this more:
This is likely kerchunk's fault. I've seen similar things before when kerchunk unpredictably violates its own specification. If you can provide a reproducible example, we can see exactly what the problem was and either work around it in VirtualiZarr or fix it upstream in kerchunk.
@ayushnag also recently reported a bug in determining which variables are coordinate variables (#189), so you might be seeing that too. Again a reproducible example would help. |
@kthyng, as @TomNicholas points out, I also hit this issue in #187. You might try if my workaround helps. Unless you're opening a ref created by |
@TomNicholas Yeah sorry I didn't provide examples at the moment. I was at the end of my workday and just wanted to get some notes out to be clear I had found some issues in the PR.
import xarray as xr
from virtualizarr import open_virtual_dataset

# I am not clear why I can't use the THREDDS file links directly (they work when passed directly to xarray),
# so I downloaded the files from the THREDDS file server https://thredds.cencoos.org/thredds/catalog/cencoos/ccsnrt/catalog.html
# Otherwise the links would have been:
# loc1 = "http://thredds.cencoos.org/thredds/dodsC/cencoos/ccsnrt/2011/2011_01/ccsnrt_2011_01_02.nc"
# loc2 = "http://thredds.cencoos.org/thredds/dodsC/cencoos/ccsnrt/2011/2011_01/ccsnrt_2011_01_03.nc"
loc1 = "ccsnrt_2011_01_02.nc"
loc2 = "ccsnrt_2011_01_03.nc"
# Open each local file as a virtual dataset without building any indexes
vds1 = open_virtual_dataset(loc1, indexes={}, reader_options={})
vds2 = open_virtual_dataset(loc2, indexes={}, reader_options={})
# Concatenate the two virtual datasets along the time dimension
combined_vds = xr.concat([vds1, vds2], dim='ocean_time', coords='minimal', compat='override', data_vars='minimal')
combined_vds gives correct coords both initially and after saving:
<xarray.Dataset> Size: 15MB
Dimensions: (time: 1, eta_v: 180, xi_v: 186, z: 11, eta_u: 181, xi_u: 185,
eta_rho: 181, xi_rho: 186)
Coordinates:
lat_v (eta_v, xi_v) float64 268kB ManifestArray<shape=(180, 186), dtyp...
lat_u (eta_u, xi_u) float64 268kB ManifestArray<shape=(181, 185), dtyp...
lon_u (eta_u, xi_u) float64 268kB ManifestArray<shape=(181, 185), dtyp...
lat_rho (eta_rho, xi_rho) float64 269kB ManifestArray<shape=(181, 186), ...
z (z) float64 88B ManifestArray<shape=(11,), dtype=float64, chunks...
lon_v (eta_v, xi_v) float64 268kB ManifestArray<shape=(180, 186), dtyp...
lon_rho (eta_rho, xi_rho) float64 269kB ManifestArray<shape=(181, 186), ...
time (time) float64 8B ManifestArray<shape=(1,), dtype=float64, chunk...
Dimensions without coordinates: eta_v, xi_v, eta_u, xi_u, eta_rho, xi_rho
Data variables: (12/13)
vbar (time, eta_v, xi_v) float64 268kB ManifestArray<shape=(1, 180, 1...
u (time, z, eta_u, xi_u) float32 1MB ManifestArray<shape=(1, 11, 1...
ubar (time, eta_u, xi_u) float64 268kB ManifestArray<shape=(1, 181, 1...
temp (time, z, eta_rho, xi_rho) float32 1MB ManifestArray<shape=(1, 1...
rho (time, z, eta_rho, xi_rho) float32 1MB ManifestArray<shape=(1, 1...
w (time, z, eta_rho, xi_rho) float32 1MB ManifestArray<shape=(1, 1...
... ...
h (eta_rho, xi_rho) float64 269kB ManifestArray<shape=(181, 186), ...
v (time, z, eta_v, xi_v) float32 1MB ManifestArray<shape=(1, 11, 1...
urot (time, z, eta_rho, xi_rho) float32 1MB ManifestArray<shape=(1, 1...
angle (eta_rho, xi_rho) float64 269kB ManifestArray<shape=(181, 186), ...
vrot (time, z, eta_rho, xi_rho) float32 1MB ManifestArray<shape=(1, 1...
salt (time, z, eta_rho, xi_rho) float32 1MB ManifestArray<shape=(1, 1...
Attributes: (12/58)
ADM_LBC: \nEDGE: WEST SOUTH EAST NORTH \nz...
CF%3afeature_type: GRID
CPP_options: WC12_CCSRA, ADJOINT, ANA_BSFLUX, ANA_BTF...
Conventions: CF-1.4, _Coordinates
DODS_EXTRA.Unlimited_Dimension: ocean_time
NCO: "4.6.1"
... ...
summary: This West Coast ROMS model is one of two...
svn_rev: exported
svn_url: https://www.myroms.org/svn/src/trunk
tiling: 003x005
title: UCSC California Current System ROMS Nowc...
type: ROMS/TOMS history file
Then I save and reopen:
combined_vds.virtualize.to_kerchunk(f'combined.parq', format='parquet')
combined_ds = xr.open_dataset('combined.parq', engine="kerchunk")
combined_ds
to get
Dimensions: (eta_rho: 181, xi_rho: 186, eta_u: 181, xi_u: 185, eta_v: 180,
xi_v: 186, time: 1, z: 11)
Coordinates:
lat_rho (eta_rho, xi_rho) float64 269kB ...
lat_u (eta_u, xi_u) float64 268kB ...
lat_v (eta_v, xi_v) float64 268kB ...
lon_rho (eta_rho, xi_rho) float64 269kB ...
lon_u (eta_u, xi_u) float64 268kB ...
lon_v (eta_v, xi_v) float64 268kB ...
* time (time) datetime64[ns] 8B 2011-01-02
* z (z) float64 88B -250.0 -200.0 -150.0 -100.0 ... -10.0 -5.0 -2.0
Dimensions without coordinates: eta_rho, xi_rho, eta_u, xi_u, eta_v, xi_v
Data variables: (12/13)
angle (eta_rho, xi_rho) float64 269kB ...
h (eta_rho, xi_rho) float64 269kB ...
rho (time, z, eta_rho, xi_rho) float32 1MB ...
salt (time, z, eta_rho, xi_rho) float32 1MB ...
temp (time, z, eta_rho, xi_rho) float32 1MB ...
u (time, z, eta_u, xi_u) float32 1MB ...
... ...
urot (time, z, eta_rho, xi_rho) float32 1MB ...
v (time, z, eta_v, xi_v) float32 1MB ...
vbar (time, eta_v, xi_v) float64 268kB ...
vrot (time, z, eta_rho, xi_rho) float32 1MB ...
w (time, z, eta_rho, xi_rho) float32 1MB ...
zeta (time, eta_rho, xi_rho) float64 269kB ...
Ok, I have a second example that does not work out the same way with coords vs. data_vars, but it is pathological and I am giving up for now. I work with this model all the time, but I can't get consistent behavior to show an example of this. I also have a time error that only sometimes pops up in this work (the cf-time thing fixes it, but it wasn't necessary before, only now, despite working with this model and virtualizarr for a few weeks), and now just hit the same
Okay no worries @kthyng - sorry you've run into a few bugs here! Thanks for being intrepid enough to try VirtualiZarr out. Now we have reproducible examples we can track down these bugs and fix them, and I'll let you know once we've done that. |
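For narrowing down the coords vs. data_vars discrepancy discussed above, a small comparison helper can make a bug report more precise. The following is only a sketch: `compare_variable_roles` is a hypothetical name, not part of VirtualiZarr, xarray, or kerchunk, and it works on plain collections of variable names so it does not depend on any particular library version.

```python
def compare_variable_roles(coords_before, data_vars_before,
                           coords_after, data_vars_after):
    """Report variable names whose role changed across a save/reopen round trip.

    Hypothetical helper for illustration only. Each argument is any iterable
    of variable names.
    """
    coords_before, coords_after = set(coords_before), set(coords_after)
    data_vars_before = set(data_vars_before)
    data_vars_after = set(data_vars_after)
    all_before = coords_before | data_vars_before
    all_after = coords_after | data_vars_after
    return {
        # Coordinates that came back as data variables
        "coord_to_data_var": sorted(coords_before & data_vars_after),
        # Data variables that came back as coordinates
        "data_var_to_coord": sorted(data_vars_before & coords_after),
        # Names that disappeared entirely
        "missing_after": sorted(all_before - all_after),
    }
```

With xarray datasets, iterating over `ds.coords` and `ds.data_vars` yields variable names, so a call such as `compare_variable_roles(combined_vds.coords, combined_vds.data_vars, combined_ds.coords, combined_ds.data_vars)` should work with the objects from the example above.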
Superseded by #251
This started from #119 and is meant to finish an MVP of that work. All of the work here is essentially from there, which @norlandrhagen did!
@TomNicholas merged main into that branch. Most changes here were brought in from the main branch to fix a few leftover errors. In addition, passing a parquet file now raises a NotImplementedError, and a docstring was added for the kerchunk filetype.
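The JSON-only behavior described above can be illustrated with a minimal sketch. This is not the code from this PR: `load_kerchunk_refs` and `PARQUET_SUFFIXES` are hypothetical names, and detecting parquet references by file suffix is an assumption made purely for illustration.

```python
import json
from pathlib import Path

# Hypothetical names for illustration; not VirtualiZarr's actual API.
PARQUET_SUFFIXES = {".parq", ".parquet"}


def load_kerchunk_refs(path):
    """Load kerchunk references from a JSON file into a dict.

    Parquet references are rejected up front rather than failing later
    with a less obvious error.
    """
    path = Path(path)
    if path.suffix.lower() in PARQUET_SUFFIXES:
        raise NotImplementedError(
            f"Parquet kerchunk references are not supported yet: {path}"
        )
    with path.open() as f:
        return json.load(f)
```

Raising early keeps the unsupported case explicit, so a user who passes a path like `combined.parq` gets a clear message instead of a JSON decoding error.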