Adding a few rof notebooks #126

Open: wants to merge 82 commits into base: main


nmizukami
Member

nmizukami commented Aug 23, 2024

Initial commits for ROF notebooks.

The ultimate set of notebooks is intended to mimic the old ROF diagnostic plots.

This PR is just starting with a few notebooks.

All Submissions:

  • Have you followed the guidelines in our Contributor's Guide (including the pre-commit check)?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you linted your code locally prior to submission?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you successfully tested your changes locally?

@TeaganKing
Collaborator

Hey @nmizukami! Thanks so much for adding these notebooks! I made the changes we discussed to cupid-run and the config file so that rof is included when running CUPiD and in the Jupyter Book table of contents.

One note: if you prefer to use math rather than numpy for sqrt in metrics.py, I think we'll need to provide math in cupid-analysis? I don't think this should be a problem, but just let me know if you'd like me to do that.
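For reference on the math versus numpy choice: math is part of the Python standard library, so it should not need its own entry in the environment file. The practical difference is that math.sqrt takes one scalar at a time, while numpy.sqrt is vectorized over arrays. A minimal sketch with two hypothetical RMSE helpers (the actual contents of metrics.py are not shown in this thread):

```python
import math

import numpy as np


def rmse_math(sim, obs):
    """RMSE using only the standard library (loops over plain sequences)."""
    squared = [(s - o) ** 2 for s, o in zip(sim, obs)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_numpy(sim, obs):
    """RMSE using numpy; the subtraction, mean, and sqrt are all vectorized."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


sim = [1.0, 2.0, 3.0, 4.0]
obs = [1.0, 2.0, 3.0, 8.0]
print(rmse_math(sim, obs), rmse_numpy(sim, obs))  # 2.0 2.0
```

Both give the same answer here; numpy mainly matters once the inputs are large arrays (e.g., time series at many gauges).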

Would you be able to pull these changes in locally, test running your notebook with cupid-run and make sure that things look as you expect?

@TeaganKing TeaganKing self-requested a review August 23, 2024 20:11
@TeaganKing TeaganKing added lnd enhancement New feature or request labels Aug 23, 2024
@nmizukami
Member Author

Hi Teagan (@TeaganKing), the notebook almost ran with cupid-run -rof. The one error was in reading a GeoPackage file (GIS vector data) via geopandas.

You can see /glade/work/mizukami/CUPiD/examples/coupled_model/month_annual_flow.ipynb.

CUPiD prints this on screen:

ERROR 1: PROJ: proj_create_from_database: /glade/u/apps/casper/23.10/spack/opt/spack/proj/8.2.1/gcc/12.2.0/7gif/share/proj/proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 2 whereas a number >= 3 is expected. It comes from another PROJ installation.

I have seen this before. I don't fully understand the error, but it comes from the pyogrio package that is installed with geopandas; geopandas uses pyogrio internally.
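The message says the proj.db that was found reports a layout minor version of 2, while the PROJ library doing the lookup expects at least 3, i.e. the library and the database come from different PROJ installations. A small diagnostic sketch, assuming (as the key names in the message suggest) that proj.db stores these values in a metadata table with key/value columns; the helper name is illustrative:

```python
import sqlite3
from pathlib import Path


def proj_db_layout_version(db_path):
    """Return (major, minor) from a proj.db's DATABASE.LAYOUT.VERSION.* entries.

    Assumption: the values live in a `metadata(key, value)` table, matching the
    key names quoted in the PROJ error message.
    """
    # Open read-only so a wrong path raises instead of creating an empty file
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = dict(
            con.execute(
                "SELECT key, value FROM metadata "
                "WHERE key LIKE 'DATABASE.LAYOUT.VERSION.%'"
            ).fetchall()
        )
    finally:
        con.close()
    return (
        int(rows["DATABASE.LAYOUT.VERSION.MAJOR"]),
        int(rows["DATABASE.LAYOUT.VERSION.MINOR"]),
    )
```

Pointing it at the Casper spack proj.db from the error message should report a minor version of 2; pointing it at share/proj/proj.db inside the cupid-analysis environment shows what that environment ships, which makes a mismatch easy to confirm.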

When I run the notebook outside CUPiD, it runs fine, but there I activated the Python [conda-env:cupid-analysis] environment in JupyterHub. I also see another one called cupid-analysis, which I believe is the one CUPiD actually uses, and I see a similar error when I use it. What is the difference between Python [conda-env:cupid-analysis] and cupid-analysis?

Hopefully I am simply setting something (e.g., the environment) incorrectly...

@TeaganKing
Collaborator

Hi @nmizukami, sorry I let this slip! In the environment where this was working, did you have a particular version of geopandas or pyogrio pinned? I could also add that to the environment YAML specification. Or, when you previously ran into this error, did you find another solution?

This error may occur because PROJ is already installed somewhere else; I'm not sure where at this point, but I can look into that.

@nmizukami
Member Author

nmizukami commented Aug 27, 2024

Hi @TeaganKing, one hint is that I can run the notebook outside cupid-run, meaning I can run it manually on JupyterHub with the [conda-env:cupid-analysis] environment, but NOT with cupid-analysis (I get a similar PROJ error). You can see the two similar environments in Jupyter in the image below. I believe the package versions should be OK. I can think about this more... I don't know what the difference is between [conda-env:cupid-analysis] and cupid-analysis.

[Screenshot, Aug 27, 2024, 12:07 PM: the two similar cupid-analysis environments listed in Jupyter]

@TeaganKing
Collaborator

TeaganKing commented Aug 27, 2024

It sounds like there may be an issue related to the ipykernel installation. I think one of these might be the installation from ipykernel (a soft-linked conda environment), and the other may be a conda environment found elsewhere (possibly an outdated cupid-analysis that doesn't include geopandas?). Mike mentioned that the ipykernel installation basically creates a softlink to an environment, which made me think that could be the source of an inconsistency.

I had updated a test environment but not my actual cupid-analysis environment; I'm doing that now and will test your notebook. This is probably not the most efficient workflow, but I wonder whether it might also be worth removing your cupid-analysis environment, checking whether it's still listed as an option in JupyterHub, making sure that both versions are removed, and then reinstalling a clean version.

@nmizukami
Member Author

I ran the following steps in a terminal to remove the cupid-analysis environment and reinstall it:

mamba remove --name cupid-analysis --all
mamba env create -f environments/cupid-analysis.yml

That did not fix it. After removing cupid-analysis, JupyterHub still listed cupid-analysis, although [conda-env:cupid-analysis] was gone.
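One possible explanation for an entry surviving the environment removal is a kernelspec that is still registered with Jupyter: each kernelspec is a directory holding a kernel.json whose argv[0] is the Python interpreter the kernel launches. `jupyter kernelspec list` shows where these live, and `jupyter kernelspec remove <name>` unregisters one. A minimal sketch that reads kernel.json files directly to show which interpreter each entry would start; the helper name is illustrative, the directory used is an assumption (a common per-user location on Linux), and entries such as [conda-env:...] may be generated by a kernel provider rather than stored as a kernel.json:

```python
import json
import os
from pathlib import Path


def kernel_interpreters(kernels_dir):
    """Map each kernelspec's display name to the interpreter in its argv[0]."""
    kernels_dir = Path(kernels_dir)
    if not kernels_dir.is_dir():
        return {}
    found = {}
    for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        argv = spec.get("argv") or [None]
        found[spec.get("display_name", spec_file.parent.name)] = argv[0]
    return found


# Assumed per-user kernelspec location on Linux; use the directories that
# `jupyter kernelspec list` reports on your system if they differ.
user_kernels = os.path.expanduser("~/.local/share/jupyter/kernels")
for name, python in kernel_interpreters(user_kernels).items():
    print(f"{name!r} launches {python}")
```

If the cupid-analysis entry points at an interpreter path that no longer exists, or at a different environment than expected, that would explain why it keeps appearing and why it behaves differently from [conda-env:cupid-analysis].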

@nmizukami
Member Author

Hi @TeaganKing, I tried running conda list to see which packages are in the cupid-analysis environment when running cupid-run. Unfortunately, including conda list in the notebook causes an error in cupid-run:

SyntaxError: An error happened when checking the source code. 
:25:7: invalid syntax

conda list
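Since conda list is a shell command rather than Python, the notebook source check rejects it as invalid syntax. A pure-Python alternative for a notebook cell is to ask the running interpreter which environment it lives in and which package versions it can see. A minimal sketch using only the standard library (the helper name is illustrative, and the package list is just the geospatial stack discussed in this thread):

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages):
    """Describe the running interpreter and the installed versions of `packages`.

    Packages that cannot be found in this environment map to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return {"prefix": sys.prefix, "executable": sys.executable, "versions": versions}


# The geospatial stack involved in the PROJ error discussed in this thread
report = environment_report(["geopandas", "pyogrio", "pyproj"])
print("Python environment:", report["prefix"])
for pkg, ver in report["versions"].items():
    print(f"  {pkg}: {ver or 'not installed'}")
```

Because this runs inside the kernel that cupid-run actually uses, the printed prefix shows directly which environment the notebook ran in.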

@nmizukami
Member Author

casper-login1:/glade/work/mizukami/CUPiD/examples/coupled_model (main_adding_rof)> cupid-run -rof

/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py:455: UserWarning: 
========================================================================================= DAG render with warnings =========================================================================================
----------------------------------------------------------------- NotebookRunner: index -> File('computed_notebook...ucture/index.ipynb') ------------------------------------------------------------------
----------------------------------------------------------------- /glade/work/mizukami/CUPiD/examples/nblibrary/infrastructure/index.ipynb -----------------------------------------------------------------
These parameters are not used in the task's source code: 'CESM_output_dir', 'lc_kwargs', 'serial', and 'subset_kwargs'
----------------------------------------------------------- NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb') ------------------------------------------------------------
---------------------------------------------------------------- /glade/work/mizukami/CUPiD/examples/nblibrary/rof/month_annual_flow.ipynb -----------------------------------------------------------------
These parameters are not used in the task's source code: 'CESM_output_dir', 'lc_kwargs', 'serial', and 'subset_kwargs'
============================================================================================ Summary (2 tasks) =============================================================================================
NotebookRunner: index -> File('computed_notebook...ucture/index.ipynb')
NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb')
========================================================================================= DAG render with warnings =========================================================================================

  warnings.warn(str(warnings_))
Executing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.03cell/s]
Building task 'month_annual_flow':  50%|███████████████████████████████████████████████████████████████████                                                                   | 1/2 [00:02<00:02,  2.92s/it]
ERROR 1: PROJ: proj_create_from_database: /glade/u/apps/casper/23.10/spack/opt/spack/proj/8.2.1/gcc/12.2.0/7gif/share/proj/proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 2 whereas a number >= 3 is expected. It comes from another PROJ installation.
/glade/u/apps/opt/conda/condabin/conda
| 5/69 [00:20<03:39,  3.44s/cell]
Executing:  90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍               | 62/69 [03:53<00:26,  3.76s/cell]
Building task 'month_annual_flow': 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [03:56<00:00, 118.06s/it]
Traceback (most recent call last):
  File "/glade/work/mizukami/conda-envs/cupid-dev/bin/cupid-run", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/CUPiD/cupid/run.py", line 290, in run
    dag.build()
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py", line 557, in build
    report = callable_()
             ^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py", line 662, in _build
    raise build_exception
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py", line 591, in _build
    task_reports = self._executor(dag=self, show_progress=show_progress)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/executors/serial.py", line 203, in __call__
    raise DAGBuildError(str(exceptions_all))
ploomber.exceptions.DAGBuildError: 
============================================================================================= DAG build failed =============================================================================================
----------------------------------------------------------- NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb') ------------------------------------------------------------
---------------------------------------------------------------- /glade/work/mizukami/CUPiD/examples/nblibrary/rof/month_annual_flow.ipynb -----------------------------------------------------------------
---------------------------------------------------------------------------
Exception encountered at "In [24]":
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[24], line 2
      1 column_stat = []
----> 2 gauge_shp_all_case = gauge_shp.copy(deep=True)
      3 for case, grid_name in cases.items():
      4     gauge_shp_all_case = gauge_shp_all_case.merge(
      5         gauge_shp1[case][["id", f"{error_metric}_{grid_name}"]],
      6         left_on="id",
      7         right_on="id",
      8     )

NameError: name 'gauge_shp' is not defined

ploomber.exceptions.TaskBuildError: Error when executing task 'month_annual_flow'. Partially executed notebook available at /glade/work/mizukami/CUPiD/examples/coupled_model/computed_notebooks/quick-run/rof/month_annual_flow.ipynb
ploomber.exceptions.TaskBuildError: Error building task "month_annual_flow"
============================================================================================= Summary (1 task) =============================================================================================
NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb')
============================================================================================= DAG build failed =============================================================================================

@nmizukami
Member Author

nmizukami commented Aug 29, 2024

Hi @TeaganKing,
A small piece of good news: I got it to run without the geopandas error. The trick is to add this
import os
os.environ['PROJ_LIB'] = '/glade/work/mizukami/conda-envs/cupid-analysis/share/proj'
before importing geopandas.
However, I don't think this is a permanent solution. I am still trying to consult with CISL.
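A sketch of a less user-specific version of that workaround, assuming the active conda environment ships its own proj.db under <prefix>/share/proj (as the cupid-analysis path in the workaround suggests). It derives the directory from sys.prefix instead of a hard-coded personal path, and sets both PROJ_LIB and PROJ_DATA, since newer PROJ releases read PROJ_DATA and keep PROJ_LIB as a fallback. The helper name is illustrative; like the original, it has to run before geopandas is imported, and it is still a workaround rather than a fix for the mixed PROJ installations:

```python
import os
import sys
from pathlib import Path


def point_proj_at_env(prefix=None):
    """Point PROJ at the proj.db of a conda environment (default: the running one).

    Returns the directory that was set, or None if <prefix>/share/proj/proj.db
    does not exist, in which case the environment variables are left untouched.
    """
    proj_dir = Path(prefix or sys.prefix) / "share" / "proj"
    if not (proj_dir / "proj.db").is_file():
        return None
    # Older PROJ reads PROJ_LIB; newer PROJ prefers PROJ_DATA. Set both.
    os.environ["PROJ_LIB"] = str(proj_dir)
    os.environ["PROJ_DATA"] = str(proj_dir)
    return str(proj_dir)


# Run this in the first notebook cell, before `import geopandas`.
print("PROJ data directory:", point_proj_at_env())
```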

I was able to create /glade/work/mizukami/CUPiD/examples/coupled_model/computed_notebooks/quick-run/_build/html/index.html
How do you usually open it on the HPC systems? I tried opening Firefox on Derecho/Casper, but it is very slow. I wonder if there are other ways to view it.

@TeaganKing
Collaborator

Hi @nmizukami, I'm glad that is working for now (but of course we need this to work in any user's environment). Yes, I think this would be a good conversation to have with CISL.

Regarding looking at output, see the second section on this page for recommendations on NCAR machines.

@TeaganKing TeaganKing mentioned this pull request Sep 10, 2024
6 tasks
@TeaganKing
Collaborator

TeaganKing commented Sep 10, 2024

Hey @nmizukami, I opened a PR to bring rof into run.py, and then I realized these changes are already in this PR... so apologies; feel free to ignore that one!

@TeaganKing
Collaborator

TeaganKing commented Sep 10, 2024

To-do:

  • update the README to include 'rof' on line 104: -rof, --river-runoff  Run river runoff component diagnostics

@nmizukami
Member Author

I updated key_metrics/config.yml and coupled_model/config.yml for rof, and modified the two notebooks based on the config changes so that they now run.

A review is needed, and some science questions came up (e.g., what should we do when plotting a time period for which no observations are available? Are the other notebooks comparing the model output with observations?)

@TeaganKing
Collaborator

Hey @nmizukami , thanks for these updates.

Not all notebooks are comparing with observations, but you can see an example of an observational comparison in the glacier notebook & corresponding config.yml details.

I think that if you are plotting for a time period where observations are not available, perhaps a warning statement that the obs are unavailable would be useful?
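A minimal sketch of that warning, assuming the analysis and observation periods are available as (first_year, last_year) tuples. The helper name and the 1980-2014 observation record are hypothetical and only illustrate the check:

```python
import warnings


def overlapping_years(sim_years, obs_years):
    """Return the (first, last) years shared by a simulation and observations.

    Emits a warning and returns None when the periods do not overlap, so the
    caller can still plot the simulations on their own.
    """
    first = max(sim_years[0], obs_years[0])
    last = min(sim_years[1], obs_years[1])
    if first > last:
        warnings.warn(
            f"No observations for simulation years "
            f"{sim_years[0]:04d}-{sim_years[1]:04d}; "
            "plotting simulations without observations."
        )
        return None
    return first, last


# Model years 0001-0102 against a hypothetical 1980-2014 observation record
print(overlapping_years((1, 102), (1980, 2014)))      # warns, then prints None
print(overlapping_years((1995, 2005), (1980, 2014)))  # (1995, 2005)
```

Returning None rather than raising keeps the rest of the notebook running, so the simulation-only plots are still produced.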

@TeaganKing
Collaborator

And I'll review after our discussion on Thursday.

@nmizukami
Member Author

Right now I am pointing to case /glade/campaign/cesm/development/cross-wg/diagnostic_framework/CESM_output_for_testing/b.e23_alpha16b.BLT1850.ne30_t232.054

The time period for this case is years 0001-0102, for which there are certainly no observations for any component. So I assumed this config is meant to compare the simulation with a base simulation (a model-to-model comparison or something like that), not to validate the model components against observations.

I just wanted to understand the context of this setup. With the current config, the rof notebooks look less interesting, but technically the notebooks work now (I believe).

If the config points to any CESM case covering the 20th-21st centuries, the rof notebooks automatically add the observed streamflow to the plots and compare the simulations with observations.

@TeaganKing
Collaborator

The key setup here that's different from the coupled-model example is in the 'global params' section of the config file, where we have both a case name for the case you're looking at, as well as the base_case_name for a comparison case. The observations are defined separately in each individual notebook config section at this point.

@TeaganKing
Collaborator

It sounds good that plots are still generated without obs when obs do not exist.

@TeaganKing
Collaborator

TeaganKing commented Oct 17, 2024

@nmizukami is planning to do the following:

  • implement comparison with base_case in addition to obs
  • remove years that overwrite config file start/end years
  • include analysis-period configuration parameter to specify e.g. 10 years so that users don't need to run 100 years unless they really want to do so.
  • test cupid-run from key_metrics directory

Once these items are done, @TeaganKing can review.
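The analysis-period item could look roughly like the following sketch. Both helpers are hypothetical (not existing CUPiD functions), and the monthly history-file naming <case>.<component>.h0.YYYY-MM.nc is an assumption. The idea is to pick the last N years of a run and open only the files in that window, which also limits how many netCDF files have to be read:

```python
import re

# Assumed monthly history-file naming: "<case>.<component>.h0.YYYY-MM.nc"
_YEAR_MONTH = re.compile(r"\.(\d{4})-(\d{2})\.nc$")


def last_n_years(run_start, run_end, n_years):
    """Analysis window covering the final n_years of a run, clipped to the run."""
    return max(run_start, run_end - n_years + 1), run_end


def files_in_period(paths, start_year, end_year):
    """Keep monthly history files whose year falls in [start_year, end_year]."""
    selected = []
    for path in paths:
        match = _YEAR_MONTH.search(str(path))
        if match and start_year <= int(match.group(1)) <= end_year:
            selected.append(path)
    return sorted(selected)


# Example: the final 10 years of a 0001-0102 run
start, end = last_n_years(1, 102, 10)
print(start, end)  # 93 102
```

With a 10-year window, only 120 monthly files would be opened instead of the full 102 years of output.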

@nmizukami
Member Author

nmizukami commented Oct 18, 2024

Hi @TeaganKing, all are done!

month_annual_flow:
  parameter_groups:
    none:
      analysis_name: 'mosart_test'
Collaborator

Just want to note that mosart_test is here too. Could we perhaps use the case name and type and/or years of analysis being run?

## Configurations used for ctsm-mizuRoute

# Directories
geospatial_dir: /glade/campaign/cgd/tss/people/mizukami/ctsm-mizuRoute/geospatial
Collaborator

I'm a little concerned about including personal directories that may not be accessible to everyone
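Until the geospatial data lives in a shared location, one way to make the failure clearer for other users is to check configured directories up front. A minimal sketch with a hypothetical helper (not part of CUPiD); geospatial_dir is the config key from the snippet, and the check only reports the problem, it does not grant access:

```python
import os


def require_readable_dir(path, key="geospatial_dir"):
    """Return `path` if it is a directory the current user can list and read.

    Raises FileNotFoundError or PermissionError naming the config key, so
    users can see which setting to change.
    """
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"{key}: {path!r} does not exist or is not accessible"
        )
    if not os.access(path, os.R_OK | os.X_OK):
        raise PermissionError(f"{key}: {path!r} is not readable by the current user")
    return path
```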

@TeaganKing
Collaborator

TeaganKing commented Oct 23, 2024

Hey @nmizukami , thanks for all your work on this! It's definitely getting much closer!

I made a few in-line comments above and also wanted to remind you of @mnlevy1981 's comment from above: "I noticed in month_annual_flow.ipynb you have a logical flag parallel that enables using PBSCluster when set to true. If you look at examples/nblibrary/ocn/ocean_surface.ipynb you'll see how CUPiD already passes a serial flag and uses a LocalCluster when that is set to false... I haven't looked at the other runoff notebooks, but we need to avoid casper- or derecho-specific blocks of code". This same structure is also currently used in ocean_discharge.ipynb. You can see details on the current serial vs parallel implementation in our documentation.

Also, this is a minor comment, but would you be able to rename your notebooks according to our new suggested naming convention: <region>_<variable>_<metric>_<comparisons>.ipynb (eg, something similar to Global_PSL_NMSE_compare_obs_lens.ipynb or Greenland_SMB_visual_compare_obs.ipynb)

@TeaganKing
Collaborator

It looks like the notebooks ran in key_metrics in about 15 minutes. Both notebooks seem to be running smoothly. One additional comment regarding the figures: in month_annual_flow.ipynb, a number of figures are saved but not plotted directly in the notebook. For the purposes of generating the Jupyter Book, I think it'd be best to show the plots directly in the notebook.

In the coupled_model example, in Cell 5, there was an error with no files to open. It may be best to leave the key_metrics example, and take out rof from the coupled_model example since we'll need to update that configuration file anyways and create an 'additional metrics' example.

@nmizukami
Member Author

nmizukami commented Oct 27, 2024

I believe I have updated both notebooks so that they now use LocalCluster, like this. Are you looking at the old ones?

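from dask.distributed import Client, LocalCluster

# serial and lc_kwargs are notebook parameters passed in by CUPiD
client = None
if not serial:
    # Start a local Dask cluster only when the notebook is run in parallel
    cluster = LocalCluster(**lc_kwargs)
    client = Client(cluster)

client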

Regarding the naming convention, how about these?

month_annual_flow.ipynb -> global_discharge_gauge_comparison.ipynb
ocean_discharge.ipynb -> global_discharge_ocean_comparison.ipynb

For the comparisons part: the notebooks plot observations, but when observations are not available (because of the analysis years), they are not plotted.

what does mean (is it different from ?)

@nmizukami
Member Author

nmizukami commented Oct 27, 2024

Regarding the ~15 minutes: did you run all the components? That seems too long. I have now put a 10-year period (rof_start_date and rof_end_date) in config.yml, so the data should be read faster; reading the netCDF files is now the bottleneck.

name               Ran?      Elapsed (s)    Percentage
-----------------  ------  -------------  ------------
index              True          9.39778       10.1186
month_annual_flow  True         55.4196        59.6702
ocean_discharge    True         28.0591        30.2112

Regarding the missing inline figures: those cells do not actually run when no observations are available. I was not sure what to do in this situation. If observations are available, those cells should run and produce the figures.


@TeaganKing
Collaborator

Thanks for clarifying-- the way you are using LocalCluster works!

For the notebook names, would these work? Or if you prefer, maybe you could just take out _compare_obs if observations are only available sometimes?

month_annual_flow.ipynb -> global_discharge_gauge_compare_obs.ipynb
ocean_discharge.ipynb -> global_discharge_ocean_compare_obs.ipynb

Re plotting inline: I think that's fine as long as the plots show up when observations are available. Since the default is save_figs=False, I think this is reasonable.

Did you want to take out the coupled_model config file changes, per our discussion on updating the config file to a similar format as the key_metrics config file?

Labels
enhancement New feature or request lnd
3 participants