Keeping a list of CMIP6 datasets for which the basic tutorial recipe does not work #105
Comments
Here is a failure:
What happens? The netcdf files cannot be opened without specifying
gives the following error:
Attempted Fix: Adding
throws another error:
|
Naomi do you understand what it is about this dataset that is causing the error? |
Actually, no, it is puzzling. It looks perfectly normal when using
and seems fine when downloading and then opening the downloaded copy with Here is the link to the file directly from esgf: cLeaf_Lmon_IPSL-CM6A-LR_abrupt-4xCO2_r1i1p1f1_gr_185001-214912.nc |
... and it opens fine using the opendap url:
so the only problem is with:
|
So it sounds like we have found an s3fs bug? pinging @martindurant, our fsspec superhero 🦸 |
Just to help get to the bottom of it, could you try the following?
ds1 = xr.open_dataset(opendap_url, decode_coords=False).load()
ds2 = xr.open_dataset(s3_file, decode_coords=False).load()
ds1.identical(ds2)
This should help surface whatever tiny difference is behind this problem. |
Logging from s3fs shows nothing more than a bunch of range requests for the given file, I don't see anything immediately suspicious |
The ds1.identical(ds2) returns False - but the only thing that is different in the basic information is an extra attribute in the opendap example, ds1, called 'DODS_EXTRA.Unlimited_Dimension: time'.
<xarray.Dataset>
<xarray.Dataset>
|
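For anyone repeating this comparison, a small helper along the following lines can list which attributes differ between two datasets or variables. It is only a sketch: `diff_attrs` is a hypothetical name (not part of xarray), it takes plain attribute dictionaries such as `ds1.attrs` or `ds1.cLeaf.attrs`, and it compares values by their `repr()`, which is crude but sidesteps ambiguous comparisons on array-valued attributes.

```python
def diff_attrs(attrs_a, attrs_b):
    """Summarise how two attribute dictionaries differ.

    Hypothetical helper, not part of xarray. Values are compared via
    repr(), an assumption made to avoid element-wise array comparisons.
    """
    keys_a, keys_b = set(attrs_a), set(attrs_b)
    # Names present on one side only
    only_in_a = sorted(keys_a - keys_b)
    only_in_b = sorted(keys_b - keys_a)
    # Names present on both sides whose values look different
    changed = sorted(
        key for key in keys_a & keys_b
        if repr(attrs_a[key]) != repr(attrs_b[key])
    )
    return {"only_in_a": only_in_a, "only_in_b": only_in_b, "changed": changed}
```

It could be called as `diff_attrs(ds1.attrs, ds2.attrs)` for the global attributes and as `diff_attrs(ds1.cLeaf.attrs, ds2.cLeaf.attrs)` for the variable itself.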
The problem is with the coordinates attribute:
ds2c = ds2.copy()
del ds2c.cLeaf.attrs['coordinates']
xr.decode_cf(ds2c) |
This looks like an h5py issue actually. |
I got to the bottom of this in the xarray issue above. It actually has nothing to do with fsspec. It's a difference between the netCDF4 engine and the h5netcdf engine in how they handle existing but empty attributes. One way to solve this would be to use |
That is a fix for the recipe, perhaps, but when I just want to |
Currently only h5netcdf can open files efficiently over http / s3 via fsspec. Your example has uncovered a bug in h5netcdf (pydata/xarray#5172). Therefore, you will not be able to do what you want (open the file remotely) until that bug is fixed. |
Another workaround would be to use a preprocessing function to drop the weird |
Yes, I like that suggestion. It will be easy to document in our future Ag-style database of pre-concatenation issues and fixes. |
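A minimal sketch of what such a preprocessing function might look like, assuming the troublesome attribute is the empty `coordinates` attribute identified earlier in the thread. Both function names are hypothetical, and treating any `coordinates` value that is not a non-empty string as "empty" is an assumption made here, not something confirmed by the thread.

```python
def has_usable_coordinates(attrs):
    """True if attrs holds a non-empty string under 'coordinates'.

    Assumption: a valid CF 'coordinates' attribute is a non-empty string;
    anything else (empty string, placeholder object) is treated as empty.
    """
    value = attrs.get("coordinates")
    return isinstance(value, str) and value.strip() != ""


def drop_empty_coordinates_attr(ds):
    """Delete empty 'coordinates' attributes from every variable of ds.

    Hypothetical helper: ds is expected to behave like an xarray.Dataset,
    i.e. expose a `variables` mapping whose values carry an `attrs` dict.
    """
    for var in ds.variables.values():
        if "coordinates" in var.attrs and not has_usable_coordinates(var.attrs):
            del var.attrs["coordinates"]
    return ds
```

Following the snippet above, this would be applied to a dataset opened with `decode_coords=False`, before calling `xr.decode_cf` on the result. A function of this shape can also be passed as the `preprocess` argument of `xr.open_mfdataset`.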
According to this comment, the underlying issue discussed above should be resolved with Here's a notebook I ran today for @naomi-henderson. Does it seem like the above-discussed issue is indeed resolved, or have I overlooked something? |
@cisaacstern, yes, it looks like this issue is resolved with the newer h5netcdf. Thanks. I would like to keep this issue open so we can continue to find examples of datasets with issues that can't be handled yet. Nice work with the CMIP6 tutorial, by the way. The CSV file actually has a column labeled |
It would be very helpful if those using, or attempting to use, the CMIP6 recipe would add their list of failures to this issue.
Some of these failures will not be fixable, but some will just need simple pre-processing or added xarray kwargs.
Hopefully we can keep track of them here.