Trying to write combined virtual dataset (for MUR SST) results in TypeError: Can only serialize wrapped arrays...
#60
Thanks @abarciauskas-bgse !
This I hadn't thought about yet. Thoughts and PRs welcome (and it deserves a separate issue - #61).
Thanks for reporting that. Do I reproduce just by calling
The current code is supposed to work with v2, but there will be differences to smooth over (xref #17). Ideally I would be able to import classes directly from zarr-python to handle all of that.
That's this error, which @norlandrhagen also reported. It will require another upstream adjustment to xarray to fix. In the meantime you should be able to avoid it by not creating indexes (i.e. pass |
Apologies, this is one issue which can probably now be separated into 3 issues, 2 of which are open.
S3
@TomNicholas Thanks for opening #61. I may take a closer look at how we could incorporate reading from S3 tomorrow.
Filters typing
I am getting pydantic errors when using open_virtual_dataset for this dataset. I made a change to the
Would it help if I created a separate issue for this error with a minimal reproducible example (via an artificially generated dataset, perhaps)?
TypeError: Can only serialize wrapped arrays of type ManifestArray, but got type <class 'numpy.ndarray'>
FWIW I get this error even when passing
That would be awesome. Especially as it seems solving that issue should be quite separate from all the guts of the rest of the package.
That would certainly be the most correct way to move forward! But also if you think the fix is just a simple change of type hint then I'm happy to just accept a PR for that.
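On the filters typing question, the two shapes mentioned in this thread (a string versus a list of dicts) could in principle both be accepted and normalized to one form. The sketch below is only an illustration of that idea: `normalize_filters` is a hypothetical helper, not part of VirtualiZarr, and it assumes that a string form, if one occurs, is JSON-encoded.

```python
import json
from typing import Any, Dict, List, Optional, Union

FiltersInput = Union[None, str, List[Dict[str, Any]]]


def normalize_filters(filters: FiltersInput) -> Optional[List[Dict[str, Any]]]:
    """Return filters as a list of dicts (or None), whatever shape they arrived in.

    Hypothetical helper for illustration only; not VirtualiZarr's API.
    """
    if filters is None:
        return None
    if isinstance(filters, str):
        # Assumption: the string form is JSON, e.g. '[{"id": "shuffle"}]' or 'null'.
        filters = json.loads(filters)
        if filters is None:
            return None
    if not isinstance(filters, list) or not all(isinstance(f, dict) for f in filters):
        raise TypeError(f"filters must be a list of dicts, got {filters!r}")
    return filters


# Both input shapes normalize to the same list of dicts.
as_list = normalize_filters([{"id": "shuffle", "elementsize": 2}])
as_str = normalize_filters('[{"id": "shuffle", "elementsize": 2}]')
```

Whether the real fix should accept both shapes or only the list form is a design decision for the maintainers; a plain type-hint change may well be enough.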
That's weird. Are you sure you're using both the most recent version of this package (i.e. You will get this error when your virtual dataset contains any arrays that are not |
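The error message names the expected wrapper type (ManifestArray) and the type it actually found (numpy.ndarray), so one way to narrow this down is to list which variables hold something other than the wrapper. The following is a minimal sketch under stated assumptions: `find_unwrapped` is a hypothetical helper, and the stand-in classes exist only so the example runs without xarray or VirtualiZarr installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, Mapping, Type


def find_unwrapped(variables: Mapping[str, Any], wrapper_type: Type) -> Dict[str, str]:
    """Map each variable whose .data is not a wrapper_type to the type name found.

    Hypothetical diagnostic helper; it only relies on each value having a .data attribute.
    """
    return {
        name: type(var.data).__name__
        for name, var in variables.items()
        if not isinstance(var.data, wrapper_type)
    }


# Stand-ins so the sketch is self-contained (assumptions, not real library classes).
class FakeManifestArray:
    pass


class FakeNdarray:
    pass


variables = {
    "analysed_sst": SimpleNamespace(data=FakeManifestArray()),
    "time": SimpleNamespace(data=FakeNdarray()),
}
offenders = find_unwrapped(variables, FakeManifestArray)
```

With a real virtual dataset, the same function could presumably be called as `find_unwrapped(combined_vds.variables, ManifestArray)`, with ManifestArray imported from wherever the installed VirtualiZarr version exposes it; that import location is an assumption and is not confirmed in this thread.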
@TomNicholas you were right, I had not correctly installed the forked branch of xarray in my testing. For future reference:
Once I had verified the forked version was installed, I ran the example again and the following completes without error:
from virtualizarr import open_virtual_dataset
import xarray as xr
# first get + set credentials from https://archive.podaac.earthdata.nasa.gov/s3credentials
vds1 = open_virtual_dataset(
's3://podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20210101090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc',
# we have to put in the filetype to avoid trying to open the dataset with NetCDF4
filetype='netcdf4',
indexes={}
)
vds2 = open_virtual_dataset(
's3://podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20210102090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc',
filetype='netcdf4',
indexes={}
)
combined_vds = xr.concat([vds1, vds2], dim='time', coords='minimal', compat='override')
combined_vds['analysed_sst'].data.manifest.dict() # this works
combined_vds.virtualize.to_kerchunk('combined.json', format='json')
Testing with a "real world dataset" (s3://podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1) mostly worked, with a few changes required, which are present in https://github.com/TomNicholas/VirtualiZarr/tree/ab/testing-mursst. Specifically:
The filters property on ZArray and Codecs, which was returned from this dataset as a list of dictionaries, not a string. A list of dicts appears to conform to the zarr v2 storage spec, but I'm not sure if something changed in v3 or if it's expected that the filters are encoded as a string. I changed the type to use List[Dict].
With those changes in place, I was able to create the virtual zarr datasets, but when trying to write the combined reference to JSON, I got this error:
*** TypeError: Can only serialize wrapped arrays of type ManifestArray, but got type <class 'numpy.ndarray'>
which I haven't been able to figure out yet.
Here is my code to replicate: