Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmip6 tutorial rewrite #125

Merged
merged 5 commits into from
May 10, 2021

Conversation

cisaacstern
Copy link
Member

@cisaacstern cisaacstern commented May 6, 2021

A draft PR, which will close #115 when complete.

@rabernat, this rewrite is currently failing on what is now cell 21; rendered static notebook here:

https://github.com/cisaacstern/pangeo-forge-recipes/blob/cmip6-tutorial-rewrite/docs/tutorials/cmip6-recipe.ipynb

It seems that the __post_init__ of XarrayZarrRecipe is expecting my ConcatDim object to be in the file pattern's .concat_dims attribute. As demonstrated by cell 22 of the rendered static notebook, however, this object is actually ending up in an attribute of the file pattern named .combine_dims.

Is it possible that this is a bug of the refactoring? Or have I overlooked something on the user side?

The traceback is in the static notebook linked above, but pasting as plain text below, for reference:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-21-b7662d2a343b> in <module>
      1 from pangeo_forge_recipes.recipes.xarray_zarr import XarrayZarrRecipe
      2 
----> 3 recipe = XarrayZarrRecipe(
      4     pattern,
      5     target_chunks = target_chunks,

~/.pyenv/versions/anaconda3-2019.10/envs/pangeo-forge3.8/lib/python3.8/site-packages/pangeo_forge_recipes-0.0.0-py3.8.egg/pangeo_forge_recipes/recipes/xarray_zarr.py in __init__(self, file_pattern, inputs_per_chunk, target_chunks, target, input_cache, metadata_cache, cache_inputs, copy_input_to_local_file, consolidate_zarr, xarray_open_kwargs, xarray_concat_kwargs, delete_input_encoding, fsspec_open_kwargs, process_input, process_chunk, _concat_dim, _concat_dim_chunks)

~/.pyenv/versions/anaconda3-2019.10/envs/pangeo-forge3.8/lib/python3.8/site-packages/pangeo_forge_recipes-0.0.0-py3.8.egg/pangeo_forge_recipes/recipes/xarray_zarr.py in __post_init__(self)
    178         self._validate_file_pattern()
    179         # from here on we know there is at most one merge dim and one concat dim
--> 180         self._concat_dim = self.file_pattern.concat_dims[0]
    181         self._cache_metadata = any(
    182             [v is None for v in self.file_pattern.concat_sequence_lens.values()]

IndexError: list index out of range

@cisaacstern
Copy link
Member Author

cisaacstern commented May 6, 2021

A-ha, now realizing that this @property of FilePattern should be returning a non-empty list, which would resolve my question from above:

@property
def concat_dims(self) -> List[str]:
"""List of dims that are concat operations"""
return [op.name for op in self.combine_dims if isinstance(op, ConcatDim)]

I am not seeing the expected behavior at the moment, however. (Getting an empty list.) So I'll look into that now and get back with any further questions on this thread if/as they arise.

Also, noting for posterity that my statement above:

It seems that the __post_init__ of XarrayZarrRecipe is expecting my ConcatDim object to be in the file pattern's .concat_dims attribute.

Should be corrected to: "the __post_init__ of XarrayZarrRecipe is expecting the name of my ConcatDimobject to be in the file pattern's .concat_dims attribute."

@cisaacstern
Copy link
Member Author

Ryan found the root cause of the issue described above. One of the imports in the notebook still read:

from pangeo_forge.patterns import FilePattern

Now that the package name has changed, this needed to be updated to:

from pangeo_forge_recipes.patterns import FilePattern

Making this change resolved the file pattern issue and allowed me to instantiate the recipe without raising any errors.

As an aside, I am somewhat baffled as to why IPython allowed the incorrect import to run without error, as I do not appear to have any modules named pangeo_forge installed in the development environment. In searching out why this line was able to execute, I did come across one location where this name remained unchanged, despite the renaming efforts of #122: namely, in setup.cfg. I changed this name in the most recent commit to this PR, though for the time being it seems that has not resolved this "lingering name" issue. Perhaps this points to the need to revisit the questions around formalizing dev environments raised in #80.

Working now on a NoCredentialsError raised by Step 6: Execute the recipe.

@cisaacstern
Copy link
Member Author

I believe this is now ready to merge and close #115.

I resolved the credentials issue by specifying fsspec_open_kwargs = {'anon':True} when instantiating the recipe. Here is a minimal example I wrote help me understand what was going on: https://gist.github.com/cisaacstern/10026f6c922453513b69aca15d79a186

Note that cell 6 raises a NoCredentialsError when fsspec_open_kwargs = {'anon':False}, which is also the default behavior I was encountering when leaving fsspec_open_kwargs unspecified. I'm curious what has changed (in fsspec? in s3? somewhere else?) since the last time this notebook was run, as this kwarg does not seem to have been needed at that time.

@cisaacstern cisaacstern marked this pull request as ready for review May 7, 2021 18:41
@cisaacstern cisaacstern requested a review from rabernat May 7, 2021 18:42
@martindurant
Copy link
Contributor

{'anon':False} has been default for s3fs since forever

@cisaacstern
Copy link
Member Author

Thanks for weighing in, @martindurant. That makes sense. Perhaps the last time this notebook was run with credentials set as environment variables, or at some other layer not captured in my development environment.

@rabernat
Copy link
Contributor

rabernat commented May 7, 2021

FYI I just authorized reviewnb for this org, which makes it a lot easier to review notebook diffs.

Your PR is here: https://app.reviewnb.com/pangeo-forge/pangeo-forge-recipes/pull/125/

In the future it will start posting automatically on PR with notebook changes.

@@ -62,25 +62,25 @@
"output_type": "stream",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct way to link to another page is not via a direct url but a myst markdown directive. I believe this should be {doc}file_patterns.


Reply via ReviewNB

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅 Actually could not figure out how to do this, so I just removed all of the docs urls, in case you want to merge over the weekend. We can revisit links Monday if desired (I should learn how to do this eventually), but actually I think this notebook reads fine without them.

@@ -62,25 +62,25 @@
"output_type": "stream",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on the link syntax.


Reply via ReviewNB

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above re: my myst directive confusion

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@rabernat rabernat merged commit 86107f8 into pangeo-forge:master May 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue on page /tutorials/cmip6-recipe.html
3 participants