generated from ProjectPythia/cookbook-template
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 5 changed files with 242 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,6 +34,8 @@ dependencies: | |
- ujson | ||
- xarray | ||
- zarr | ||
- hvplot | ||
- datashader | ||
|
||
- pip: | ||
- sphinx-pythia-theme | ||
|
239 changes: 239 additions & 0 deletions
notebooks/case_studies/Streaming_Visualizations_with_Hvplot_Datashader.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,239 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "f6c39c84", | ||
"metadata": {}, | ||
"source": [ | ||
"# Kerchunk, hvPlot, and Datashader: Visualizing datasets on-the-fly" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "9eced552", | ||
"metadata": {}, | ||
"source": [ | ||
"## Overview\n", | ||
" \n", | ||
"This notebook will demonstrate how to use Kerchunk with hvPlot and Datashader to lazily visualize a reference dataset in a streaming fashion.\n", | ||
"\n", | ||
"We will be building off content from [Kerchunk and Pangeo-Forge](../case_studies/NetCDF_Pangeo_Forge_gridMET.ipynb), so it's encouraged you first go through that.\n", | ||
"\n", | ||
"## Prerequisites\n", | ||
"| Concepts | Importance | Notes |\n", | ||
"| --- | --- | --- |\n", | ||
"| [Kerchunk Basics](../foundations/kerchunk_basics) | Required | Core |\n", | ||
"| [Multiple Files and Kerchunk](../foundations/kerchunk_multi_file) | Required | Core |\n", | ||
"| [Introduction to Xarray](https://foundations.projectpythia.org/core/xarray/xarray-intro.html) | Required | IO |\n", | ||
"| [Introduction to hvPlot](https://hvplot.holoviz.org/) | Required | Data Visualization |\n", | ||
"| [Introduction to Datashader](https://datashader.org/index.html) | Required | Big Data Visualization |\n", | ||
"- **Time to learn**: 10 minutes\n", | ||
"---" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "4043365b", | ||
"metadata": {}, | ||
"source": [ | ||
"## Motivation\n", | ||
"\n", | ||
"Using Kerchunk, we don't have to create a copy of the data--instead we create a collection of reference files, so that the original data files can be read as if they were Zarr.\n", | ||
"\n", | ||
"This enables visualization on-the-fly; simply pass in the URL to the dataset and use hvplot." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "8e2c4765", | ||
"metadata": {}, | ||
"source": [ | ||
"## Getting to Know The Data\n", | ||
"\n", | ||
"`gridMET` is a high-resolution daily meteorological dataset covering CONUS from 1979-2023. It is produced by the Climatology Lab at UC Merced. In this example, we are going to look create a virtual Zarr dataset of a derived variable, Burn Index. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "f8d0f8f1", | ||
"metadata": {}, | ||
"source": [ | ||
"## Imports" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "706c1b95", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import os\n", | ||
"import time\n", | ||
"\n", | ||
"import apache_beam as beam\n", | ||
"import fsspec\n", | ||
"import hvplot.xarray\n", | ||
"import xarray as xr\n", | ||
"from pangeo_forge_recipes.patterns import ConcatDim, FilePattern\n", | ||
"from pangeo_forge_recipes.transforms import (\n", | ||
" CombineReferences,\n", | ||
" OpenWithKerchunk,\n", | ||
" WriteCombinedReference,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "858399ce", | ||
"metadata": {}, | ||
"source": [ | ||
"## Preprocess Dataset\n", | ||
"\n", | ||
"Here we will be preparing the Kerchunk reference files by using the recipe described in [Kerchunk and Pangeo-Forge](../case_studies/NetCDF_Pangeo_Forge_gridMET.ipynb).\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "06fb2af3", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Constants\n", | ||
"target_root = \"references\"\n", | ||
"store_name = \"Pangeo_Forge\"\n", | ||
"full_path = os.path.join(target_root, store_name, \"reference.json\")\n", | ||
"years = list(range(1979, 1980))\n", | ||
"time_dim = ConcatDim(\"time\", keys=years)\n", | ||
"\n", | ||
"\n", | ||
"# Functions\n", | ||
"def format_function(time):\n", | ||
" return f\"http://www.northwestknowledge.net/metdata/data/bi_{time}.nc\"\n", | ||
"\n", | ||
"# Patterns\n", | ||
"pattern = FilePattern(format_function, time_dim, file_type=\"netcdf4\")\n", | ||
"pattern = pattern.prune()\n", | ||
"\n", | ||
"# Apache Beam transforms\n", | ||
"transforms = (\n", | ||
" beam.Create(pattern.items())\n", | ||
" | OpenWithKerchunk(file_type=pattern.file_type)\n", | ||
" | CombineReferences(\n", | ||
" concat_dims=[\"day\"],\n", | ||
" identical_dims=[\"lat\", \"lon\", \"crs\"],\n", | ||
" )\n", | ||
" | WriteCombinedReference(target_root=target_root, store_name=store_name)\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "7818440e", | ||
"metadata": {}, | ||
"source": [ | ||
"## Opening the Kerchunk Dataset\n", | ||
"\n", | ||
"Now, it's a matter of opening the Kerchunk dataset and calling `hvplot` with the `rasterize=True` keyword argument.\n", | ||
"\n", | ||
"If you're running this notebook locally, try zooming around the map by hovering over the plot and scrolling; it should update fairly quickly. Note, it will **not** update if you're viewing this on the docs page online as there is no backend server, but don't fret because there's a demo GIF below!" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "ce0f1766", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%timeit -r 1 -n 1\n", | ||
"\n", | ||
"mapper = fsspec.get_mapper(\n", | ||
" \"reference://\",\n", | ||
" fo=full_path,\n", | ||
" remote_protocol=\"http\",\n", | ||
")\n", | ||
"ds_kerchunk = xr.open_dataset(\n", | ||
" mapper, engine=\"zarr\", decode_coords=\"all\", backend_kwargs={\"consolidated\": False}\n", | ||
")\n", | ||
"display(ds_kerchunk.hvplot(\"lon\", \"lat\", rasterize=True))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "e53531df", | ||
"metadata": {}, | ||
"source": [ | ||
"<img src=\"../images/kerchunk.gif\" width=400 alt=\"Kerchunk Zoom\"></img>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "f2c56945", | ||
"metadata": {}, | ||
"source": [ | ||
"## Comparing Against THREDDS\n", | ||
"\n", | ||
"Now, we will be repeating the previous cell, but with THREDDS.\n", | ||
"\n", | ||
"Note how the initial load is longer.\n", | ||
"\n", | ||
"If you're running the notebook locally (or a demo GIF below), zooming in/out also takes longer to finish buffering as well." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "368ac51d", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%timeit -r 1 -n 1\n", | ||
"\n", | ||
"def url_gen(year):\n", | ||
" return (\n", | ||
" f\"http://thredds.northwestknowledge.net:8080/thredds/dodsC/MET/bi/bi_{year}.nc\"\n", | ||
" )\n", | ||
"\n", | ||
"urls_list = [url_gen(year) for year in years]\n", | ||
"netcdf_ds = xr.open_mfdataset(urls_list, engine=\"netcdf4\")\n", | ||
"display(netcdf_ds.hvplot(\"lon\", \"lat\", rasterize=True))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "6c0dd0cd", | ||
"metadata": {}, | ||
"source": [ | ||
"<img src=\"../images/thredds.gif\" width=400 alt=\"THREDDS Zoom\"></img>" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.11" | ||
}, | ||
"vscode": { | ||
"interpreter": { | ||
"hash": "b8afa8ad8f3d27e858f1dbdc03ccd45fac432e2a03d4a98c501e197170438b83" | ||
} | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
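
The Motivation cell in this notebook says that reference files let the original data files be read as if they were Zarr, without copying the data. A minimal, standard-library-only sketch of that idea follows. It is an illustration, not Kerchunk's implementation: the file names, the `bi/0.0.0` style keys, the byte layout, and the `read_chunk` helper are all hypothetical, and the reference layout is simplified on the assumption that each entry maps a chunk key to a location, a byte offset, and a length.

```python
import json
import os
import tempfile


def read_chunk(reference_file, key):
    """Return the bytes a chunk key points to, reading only that byte range."""
    with open(reference_file) as f:
        path, offset, length = json.load(f)["refs"][key]
    with open(path, "rb") as data:
        data.seek(offset)
        return data.read(length)


# Hypothetical stand-in for an original data file: a header plus two chunks.
payload = b"HEADER" + b"chunk-0-bytes" + b"chunk-1-bytes"

with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "original.bin")
    with open(data_path, "wb") as f:
        f.write(payload)

    # Simplified reference set (assumed layout): key -> [path, offset, length].
    refs = {
        "bi/0.0.0": [data_path, 6, 13],
        "bi/1.0.0": [data_path, 19, 13],
    }
    ref_path = os.path.join(tmp, "reference.json")
    with open(ref_path, "w") as f:
        json.dump({"version": 1, "refs": refs}, f)

    # The data file is never copied; only the referenced ranges are read.
    first = read_chunk(ref_path, "bi/0.0.0")
    second = read_chunk(ref_path, "bi/1.0.0")

print(first, second)
```

In the notebook itself, this role is played by `fsspec.get_mapper("reference://", fo=full_path, ...)`, which resolves keys against the remote files over HTTP instead of a local path.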