Add reference HDF recipe type #174
Conversation
The existing test fixtures, e.g. pangeo-forge-recipes/tests/conftest.py (lines 97 to 99 in e1749b7), provide netCDF files that you could use to test this recipe... whenever you're ready. Let me know how I can help here.
Note the interesting facet here: the recipe produces both a combined JSON file for the dataset, and an intake YAML file that knows how to load it.
(NB: TODOs could become issues after merger, but not ones that I intend to work on immediately)
@rabernat, any thoughts here?
This looks like a great start!
My main observation is that you are assuming access to a shared global filesystem. This works for the test environment but won't work in the cloud.
It would be great if you could push all i/o through the classes and functions in storage.py, e.g. MetadataTarget, etc. That will allow these recipes to easily plug in to the bakeries. I know this may seem inconvenient, but I think there are advantages to having all i/o happen in one place.
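A sketch of that one-place-for-i/o idea using plain fsspec (`ToyMetadataTarget` is a hypothetical stand-in, not the actual storage.py class): all reads and writes go through a filesystem object plus a root path, so swapping local storage for cloud storage becomes a configuration change rather than a code change.

```python
import json
import posixpath

import fsspec


class ToyMetadataTarget:
    """Hypothetical stand-in for storage.py's MetadataTarget: every read
    and write goes through an fsspec filesystem, so local vs. cloud is
    just a matter of which filesystem object is passed in."""

    def __init__(self, fs, root):
        self.fs = fs
        self.root = root

    def __setitem__(self, key, value):
        with self.fs.open(posixpath.join(self.root, key), "w") as f:
            json.dump(value, f)

    def __getitem__(self, key):
        with self.fs.open(posixpath.join(self.root, key)) as f:
            return json.load(f)


# The same code would work with fsspec.filesystem("s3") in the cloud.
target = ToyMetadataTarget(fsspec.filesystem("memory"), "/meta")
target["chunk-0.json"] = {"shape": [180, 2]}
print(target["chunk-0.json"])  # {'shape': [180, 2]}
```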
```python
def iter_chunks(self) -> Iterable[Hashable]:
    """Iterate over all target chunks."""
    return fsspec.open_files(self.netcdf_url, **self.netcdf_storage_options)
```
What does this return? Open file objects?
Would be nice to plug in FilePattern here.
It returns an OpenFiles instance, which will open all contained files when encountering a `with`, or can be iterated to expose individual OpenFile instances.
What would we do with a FilePattern? We don't have logic to infer coordinates or add them into the chunking scheme. It would be nice in the long run, but tricky.
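The OpenFiles behavior described above can be sketched as follows (file names and contents here are made up, using fsspec's in-memory filesystem):

```python
import fsspec

# Create two small files on fsspec's in-memory filesystem for the demo.
mem = fsspec.filesystem("memory")
mem.pipe("/data/a.txt", b"aaa")
mem.pipe("/data/b.txt", b"bbb")

files = fsspec.open_files("memory://data/*.txt")  # an OpenFiles instance

# Entering the context opens every contained file at once:
with files as handles:
    contents = [h.read() for h in handles]

# Or iterate to expose individual OpenFile instances:
for of in files:
    with of as f:
        f.read()

print(contents)  # [b'aaa', b'bbb']
```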
> What would we do with a FilePattern?
Mostly the point is to align with the rest of Pangeo Forge.
I would recommend using this instead (pangeo-forge-recipes/pangeo_forge_recipes/recipes/xarray_zarr.py, lines 882 to 884 in c486771):

```python
def iter_inputs(self) -> Iterator[InputKey]:
    for input_key in self.file_pattern:
        yield input_key
```

and then using the input key to look up the url from the FilePattern object, e.g.

```python
fname = file_pattern[input_key]
```
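The iterate-keys-then-index protocol suggested here can be sketched with a minimal stand-in (`ToyFilePattern` is hypothetical, not the real FilePattern class):

```python
from typing import Dict, Iterator


class ToyFilePattern:
    """Hypothetical stand-in for pangeo-forge's FilePattern: iterating
    yields input keys, and indexing by a key returns the file's url."""

    def __init__(self, urls: Dict[int, str]):
        self._urls = urls

    def __iter__(self) -> Iterator[int]:
        return iter(self._urls)

    def __getitem__(self, key: int) -> str:
        return self._urls[key]


file_pattern = ToyFilePattern({0: "s3://bucket/a.nc", 1: "s3://bucket/b.nc"})


def iter_inputs(pattern) -> Iterator[int]:
    for input_key in pattern:
        yield input_key


# Look up each url from the pattern object using the input key:
fnames = [file_pattern[input_key] for input_key in iter_inputs(file_pattern)]
print(fnames)  # ['s3://bucket/a.nc', 's3://bucket/b.nc']
```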
FYI, working on this today.
@martindurant - I'm working on making a tutorial example for this recipe. I decided to use the CMIP6 netCDF files stored in this directory:

```python
import s3fs

fs = s3fs.S3FileSystem(anon=True)
base_path = 's3://esgf-world/CMIP6/OMIP/NOAA-GFDL/GFDL-CM4/omip1/r1i1p1f1/Omon/thetao/gr/v20180701/'
all_paths = fs.ls(base_path)
```

I'm creating the following FilePattern:

```python
from pangeo_forge_recipes.patterns import pattern_from_file_sequence

pattern = pattern_from_file_sequence(['s3://' + path for path in all_paths], 'time')
```

And the following recipe:

```python
from pangeo_forge_recipes.recipes.reference_hdf_zarr import ReferenceHDFRecipe

rec = ReferenceHDFRecipe(
    pattern,
    xarray_concat_args={"dim": "time"}  # note: this could be discovered automatically from the pattern
)
```

I assigned storage as follows:

```python
import tempfile
from fsspec.implementations.local import LocalFileSystem
from pangeo_forge_recipes.storage import FSSpecTarget, MetadataTarget

fs_local = LocalFileSystem()

cache_dir = tempfile.TemporaryDirectory()
metadata_cache = MetadataTarget(fs_local, cache_dir.name)

target_dir = tempfile.TemporaryDirectory()
target = FSSpecTarget(fs_local, target_dir.name)

rec.target = target
rec.metadata_cache = metadata_cache
```

And executed as follows:

```python
from dask.diagnostics import ProgressBar

delayed = rec.to_dask()
with ProgressBar():
    delayed.compute()  # took 1 hr 6 min on my computer 😱
```

The files it produced (…). I am able to open the file using fsspec + xarray:

```python
import fsspec
import xarray as xr

m = fsspec.get_mapper(
    "reference://",
    fo=target._full_path("reference.json"),
    target_protocol="file",
    remote_protocol="s3",
    skip_instance_cache=True,
)
ds = xr.open_dataset(m, engine='zarr', backend_kwargs={'consolidated': False}, chunks={})
print(ds)
```

🎉 This is great!

However, I can't get the intake method to work. For reference, the intake catalog is just:

```yaml
sources:
  data:
    args:
      chunks: {}
      consolidated: false
      storage_options:
        fo: /var/folders/n8/63q49ms55wxcj_gfbtykwp5r0000gn/T/tmp69uw_f7l/reference.json
        remote_options: {}
        remote_protocol: s3
        skip_instance_cache: true
        target_options: {}
        target_protocol: file
      urlpath: reference://
    description: ''
    driver: intake_xarray.xzarr.ZarrSource
```

```python
import intake

cat = intake.open_catalog(target._full_path("reference.yaml"))
cat.data.read()
```

Gives this error:

```
ValueError: cannot reshape array of size 360 into shape (240,180,2)
```

Any idea what might be going on?
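For readers unfamiliar with the `reference://` protocol used above: fsspec's ReferenceFileSystem turns a mapping from keys to contents (or to byte ranges in remote files) into a read-only filesystem. A toy illustration with inline values only; the real reference.json instead maps zarr chunk keys to byte ranges in the netCDF files on S3:

```python
import fsspec

# A hypothetical, minimal reference set: keys map to inline contents.
# reference-maker's real output maps zarr keys to [url, offset, length]
# byte ranges inside the original HDF5/netCDF files.
refs = {
    ".zgroup": '{"zarr_format": 2}',
    "note": "all reads are served from this mapping",
}
fs = fsspec.filesystem("reference", fo=refs)
print(fs.cat(".zgroup"))  # b'{"zarr_format": 2}'
```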
I think I did exactly the same as you and got reasonable output, or with the catalog file (having edited the url to …).
Doubtless my environment doesn't quite match yours?
Thanks for the quick reply Martin! Yes, this does work for me 👍:

```python
import intake_xarray

s = intake_xarray.xzarr.ZarrSource(
    "reference://",
    storage_options=dict(
        fo=target._full_path("reference.json"),
        remote_protocol="s3",
        chunks={}
    ),
    consolidated=False,
)
s.to_dask()
```

And so does this!

```python
import intake

cat = intake.open_catalog(target._full_path("reference.yaml"))
cat.data.to_dask()
```

The difference is that before I was calling …
And I found it...

```python
import intake

cat = intake.open_catalog(target._full_path("reference.yaml"))
ds = cat.data.to_dask()
ds.lat_bnds.values
# -> ValueError: cannot reshape array of size 360 into shape (240,180,2)
```
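The arithmetic behind that error can be reproduced directly with numpy; the shapes are taken from the traceback (240 presumably being the length of the time concat dimension the variable wrongly gained):

```python
import numpy as np

raw = np.arange(360)            # lat_bnds really stores 360 values per file
ok = raw.reshape(180, 2)        # the shape stored in each input file: fine
try:
    raw.reshape(240, 180, 2)    # the shape the combined metadata claims
except ValueError as err:
    print(err)  # cannot reshape array of size 360 into shape (240,180,2)
```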
OK, so there is a bug in reference-maker combine: in this case, the variable is constant and should not have gained the extra dimension. In MultiZarrToZarr, the variable should have fallen under case c); I can run a debugger to see why that didn't happen - do you have the original reference files around? A note: …
I think that in this specific case, you needed the extra argument at the creation of the aggregate JSON file. In general, you can indeed override any specific arguments when instantiating an intake source.
Unfortunately, if you wanted to update an argument that was embedded within another structure, such as one of the components of …
Will this close #70?
@cisaacstern - yes and no. This kind of recipe certainly is one way of generating metadata, as described in the issue. I still think it's a valid discussion, though, unless we can convince ourselves that reference-maker does/can give all the metadata that we might want.
Just a note on some changes I've made: I feel like I don't want to have optional dependencies for pangeo-forge-recipes right now. This introduces unnecessary complexity in our test suite and CI. So here I would just prefer to have fsspec-reference-maker, intake, and h5py become required dependencies. How does that sound?
Intake/intake-xarray is only required for testing here, so it doesn't need to be in the formal requirements. You should be warned that the long-term plan for reference-maker is to support many file types (e.g., grib2), and you probably don't want to add all the dependencies. h5py should be the most common one, though, and we already use that. reference-maker will probably not depend on the various format-specific packages directly, only xarray and zarr.
ci/py3.8.yml (outdated)

```diff
@@ -40,4 +40,4 @@ dependencies:
   - pip
   - pip:
     - pytest-timeout
-    - git+https://github.com/intake/fsspec-reference-maker
+    - sspec-reference-maker
```
missing "f"
There is some weird stuff in the rtd build...
Maybe we should be pinning dependencies more?
Something about the ci changes has made it take forever to build the python 3.8 environment. I'm going to try unpinning some versions and see if it improves. https://github.com/pangeo-forge/pangeo-forge-recipes/runs/3424372828?check_suite_focus=true |
@martindurant - from my point of view this is good to go. Would appreciate your check on the items below.
Also your feedback on the tutorial and docs for the Reference recipe would be great.
@cisaacstern if you're interested, a review would be helpful here. When you approve, we can merge this PR and add our second Recipe class to the package! 🎉 🎊 Having other Recipe classes is very important because it will help us generalize some of the code patterns. For example, I'm very pleased we can reuse both the FilePattern and the Target stuff here.
Very cool.
My main question/suggestion is around the name of the new recipe class. Comment about that provided in-line on the recipe module.
```python
from pangeo_forge_recipes.recipes.base import BaseRecipe
from pangeo_forge_recipes.recipes.xarray_zarr import XarrayZarrRecipe
```
Given that the `ReferenceHDFRecipe` test module is called `test_reference.py`, should this module be renamed to `test_xarray_zarr.py`?
Agree.
We can fix this in #167, which also includes test refactors.
I don't see anything remaining to be done here.