diff --git a/docs/demos/24-07-11-scipy-2024/scipy-2024.ipynb b/docs/demos/24-07-11-scipy-2024/scipy-2024.ipynb index 8c180b4a..e6b7090b 100644 --- a/docs/demos/24-07-11-scipy-2024/scipy-2024.ipynb +++ b/docs/demos/24-07-11-scipy-2024/scipy-2024.ipynb @@ -9,11 +9,11 @@ }, "source": [ "
\n", - " \"xCDAT\n", + " \"xCDAT\n", "
\n", "\n", "
\n", - " \"SciPy\n", + " \"SciPy\n", "
\n", "\n", "# SciPy 2024 - xCDAT (Xarray Climate Data Analysis Tools)\n", @@ -85,7 +85,7 @@ "### This presentation is available on xCDAT's Read The Docs Page\n", "\n", "
\n", - " \"Presentation\n", + " \"Presentation\n", "
\n", "\n", "#### https://xcdat.readthedocs.io/en/latest/demos/24-07-11-scipy-2024/scipy-2024.html\n" @@ -113,8 +113,8 @@ "source": [ "### A little about me\n", "\n", - "- Software Engineer at **Lawrence Livermore National Laboratory (LLNL)**\n", - "- **Energy Exascale Earth System Model (E3SM)** and **Simplifying ESM Analysis Through Standards (SEATS)** projects\n", + "- Software Engineer at Lawrence Livermore National Laboratory (**LLNL**)\n", + "- Energy Exascale Earth System Model (**E3SM**) and Simplifying ESM Analysis Through Standards (**SEATS**)\n", "- Lead developer of xCDAT\n", "- Contributor and DevOps engineer for various E3SM tools\n", "\n", @@ -155,12 +155,14 @@ "source": [ "## An Overview of this Talk\n", "\n", - "1. The driving force behind xCDAT\n", - "2. Scope and mission of xCDAT\n", - "3. Design philosophy and key features of xCDAT\n", - "4. Technical demo: end-to-end analysis workflow\n", - "5. Parallelism with xCDAT and Dask\n", - "6. xCDAT’s community and how to get involved\n" + "__Objective: Learn about the grounds-up development of an open-source Python package targeted at a specific scientific domain__\n", + "\n", + "* Driving force behind xCDAT\n", + "* Scope and mission of xCDAT\n", + "* Design philosophy and key features\n", + "* Technical demo of an end-to-end analysis workflow\n", + "* Parallelism with Dask\n", + "* How to get involved\n" ] }, { @@ -173,29 +175,59 @@ "source": [ "## The Driving Force Behind xCDAT\n", "\n", - "- CDAT (Community Data Analysis Tools) library is the predecessor to xCDAT.\n", - "- CDAT has provided open-source climate data analysis and visualization packages for over 20 years.\n", - "- Since CDAT’s inception, the volume of climate data has grown substantially as a result of:\n", - " - Larger pool of data products\n", - " - Increasing spatiotemporal resolution of model and observational data.\n", + "- Analysis of climate data frequently requires a number of core operations. For example: \n", + " - Reading and writing netCDF files\n", + " - Regridding\n", + " - Spatial and temporal averaging \n", + "- Highly performant operations to handle the growing volume of climate data \n", + " - Larger pool of data products \n", + " - Increasing spatiotemporal resolution of model and observational data\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### CDAT, the predecessor to xCDAT\n", "\n", "
\n", - "\"CDAT\n", - "
\n" + "\"CDAT\n", + "\n", + "\n", + "- CDAT (Community Data Analysis Tools) library provided open-source climate data analysis and visualization packages for over 20 years\n", + "\n", + "\n", + "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { - "slide_type": "subslide" + "slide_type": "fragment" } }, "source": [ "### The present-day challenge: **CDAT is end-of-life** as of December 2023\n", "\n", "- A big issue for users and packages that depend on CDAT\n", - "- All of these factors sparked a driving need for new analysis software\n" + "- A driving need for new analysis software\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + " xCDAT addresses this need by **combining the power of Xarray** with **geospatial analysis features inspired by CDAT**." ] }, { @@ -206,17 +238,17 @@ } }, "source": [ - "### What should this new analysis software (aka xCDAT) offer?\n", + "### What are the general goals of xCDAT?\n", "\n", - "- **Offer similar core capabilities** as CDAT\n", + "- **Offer similar core capabilities to CDAT**\n", " - e.g., geospatial averaging, temporal averaging, regridding\n", "- **Use modern technologies** in the library’s stack\n", " - Capable of handling large datasets (e.g., parallelism, lazy operations)\n", "- **Maintainable, extensible, and easy-to-use**\n", " - Python Enhancement Proposals (PEPs)\n", - " - Software sustainability\n", " - Reproducible science\n", "- **Foster open-source community**\n", + " - Software sustainability\n", " - Serve the needs of the climate community in the long-term\n", " - Community engagement efforts (e.g., Pangeo, ESGF)\n" ] @@ -231,16 +263,32 @@ "source": [ "
\n", "\"Xarray\n", - "

\"N-D labeled arrays and datasets in Python\"

\n", "
\n", + "

\"N-D labeled arrays and datasets in Python\"

\n", + "\n", "\n", "**Why is Xarray the core technology of xCDAT?**\n", "\n", - "- Mature, stable, widely adopted\n", + "- Mature widely adopted \n", + "- Fiscal funding from NumFocus\n", "- Introduces labels in the form of dimensions, coordinates, and attributes on top of raw NumPy-like arrays\n", "- Intuitive, more concise, and less error-prone user experience\n", "\n", - "**Key features include:**\n", + "
\n", + " \"NumFocus\n", + "
\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "#### Key features of Xarray\n", "\n", "- File I/O, indexing and selecting, interpolating, grouping, aggregating, parallelism (Dask), plotting (matplotlib wrapper)\n", "- Supports various file formats netCDF, Iris, OPeNDAP, Zarr, and more\n", @@ -269,7 +317,7 @@ "source": [ "
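To make the labeled-data points above concrete, here is a minimal Xarray sketch (not part of the original presentation). The file name `tas_Amon_2000-2014.nc`, the `tas` variable, and the `lat` coordinate name are placeholder assumptions for illustration only.

```python
import xarray as xr

# Open a netCDF file lazily as a labeled Dataset (file name is a placeholder).
ds = xr.open_dataset("tas_Amon_2000-2014.nc")

# Label-based selection by coordinate value instead of positional indexing.
ds_tropics = ds.sel(lat=slice(-20, 20))

# Group by a time component and aggregate across all years.
tas_monthly_mean = ds["tas"].groupby("time.month").mean()

# Quick-look plot through Xarray's matplotlib wrapper.
tas_monthly_mean.sel(month=1).plot()
```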
\n", "\"xCDAT\n", - "

Xarray Climate Data Analysis Tools for Structured Grid Analysis

\n", + "

Xarray Climate Data Analysis Tools

\n", "
\n", "\n", "- Collaboration between:\n", @@ -282,7 +330,7 @@ "\n", "- xCDAT is an extension of Xarray for climate data analysis on structured grids\n", "- A modern successor to the Community Data Analysis Tools (CDAT) library\n", - "- The core development team of software engineers and climate scientists are users of the software\n" + "- Team composed of software engineers and climate scientists who are also users of the software\n" ] }, { @@ -293,8 +341,10 @@ } }, "source": [ - "- **Scope** is focused on routine climate research analysis operations such as loading, wrangling, averaging, and regridding data\n", - "- **Goal** of providing features and utilities for simple and robust analysis of climate data\n", + "#### Scope of xCDAT\n", + "\n", + "- Focused on routine climate research analysis operations such as loading, wrangling, averaging, and regridding data\n", + "- Provide features and utilities for simple and robust analysis of climate data\n", "- Leverages other powerful Xarray-based packages such as xESMF, xgcm, and cf-xarray\n", "\n", "
\n", @@ -314,7 +364,7 @@ }, "source": [ "
\n", - "

xCDAT features and utilities for simple, robust, and less error-prone analysis code

\n", + "

xCDAT features for simple, robust, and less error-prone analysis code

\n", "
\n", "\n", "- Extension of `xr.open_dataset()` and `xr.open_mfdataset()` with post-processing options\n", @@ -322,8 +372,8 @@ "- Generate missing bounds, center time coords, convert lon axis orientation\n", "- Geospatial weighted averaging\n", "- Temporal averaging, climatologies, departures\n", - "- Horizontal structured regridding (extension of xESMF and Python port of regrid2)\n", - "- Vertical structured regridding (extension of xgcm)\n", + "- Horizontal regridding (extension of `xesmf` and Python port of `regrid2`)\n", + "- Vertical regridding (extension of `xgcm`)\n", "\n", "
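A hedged sketch of how a few of the features listed above are typically invoked; this is illustrative rather than a cell from the demo. The file pattern, the coarse target grid, and the `center_times` flag are assumptions, and keyword defaults may differ slightly between xCDAT versions.

```python
import xcdat as xc

# Open multiple netCDF files with xCDAT's post-processing options,
# e.g. centering time coordinates between their time bounds.
ds = xc.open_mfdataset("tas_Amon_*.nc", center_times=True)

# Generate any coordinate bounds that are missing from the files.
ds = ds.bounds.add_missing_bounds()

# Horizontally regrid `tas` to a coarser uniform grid using the xESMF backend.
output_grid = xc.create_uniform_grid(-90, 90, 4.0, -180, 180, 5.0)
ds_regrid = ds.regridder.horizontal("tas", output_grid, tool="xesmf", method="bilinear")
```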
\n", " \"Spatial\n", @@ -571,12 +621,10 @@ "\n", "#### Accessors classes include:\n", "\n", - "- `spatial`\n", - " - `.average()`, `.get_weights()`\n", - "- `temporal`\n", - " - `.average()`, `.group_average()`, `.climatology()`, `.depatures()`\n", - "- `bounds`\n", - " - `.get_bounds()`, `.add_bounds()`, `.add_missing_bounds()`\n" + "- `spatial` -- `.average()`, `.get_weights()`\n", + "- `temporal` -- `.average()`, `.group_average()`, `.climatology()`, `.depatures()`\n", + "- `regridding` -- `horizontal()`, `vertical()`\n", + "- `bounds` -- `.get_bounds()`, `.add_bounds()`, `.add_missing_bounds()`\n" ] }, { @@ -612,8 +660,7 @@ "\n", "### Overview\n", "\n", - "This exercise will walkthrough using `xcdat` to perform computation and analysis\n", - "on E3SM v2 CMIP6 data.\n", + "Use `xcdat` to perform computation and analysis on CMIP6 data from the E3SM v2 model.\n", "\n", "### Sections\n", "\n", @@ -622,8 +669,7 @@ "3. Horizontal Regridding\n", "4. Vertical Regridding\n", "5. Spatial Averaging\n", - "6. Temporal Computations\n", - "7. General Dataset Utilities\n" + "6. Temporal Computations" ] }, { @@ -731,23 +777,19 @@ "cell_type": "markdown", "metadata": { "slideshow": { - "slide_type": "-" + "slide_type": "notes" } }, "source": [ - "We will be analyzing E3SM v2 CMIP6 monthly near-sea surface air temperature (`tas`) data from 2000 to 2014.\n" + "- Use `xc.open_dataset()` to a single netCDF dataset as an `xr.Dataset` object.\n", + "- API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xcdat.open_dataset.html\n" ] }, { "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "notes" - } - }, + "metadata": {}, "source": [ - "- Use `xc.open_dataset()` to a single netCDF dataset as an `xr.Dataset` object.\n", - "- API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xcdat.open_dataset.html\n" + "Analyzing monthly `tas` (near-sea surface air temperature) data from 2000 to 2014." ] }, { @@ -4781,7 +4823,10 @@ "source": [ "#### Calculate the near-surface air temperature (`tas`) in the Niño 3.4 region.\n", "\n", - "Users can also specify their own bounds for a region.\n" + "Users can also specify their own bounds for a region. In this case, we specified `keep_weights=True`.\n", + "\n", + "- The weights provide full spatial weighting for grid cells entirely within the Niño 3.4 region.\n", + "- Partial weights for grid cells partially in the region." 
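For reference, a sketch of the accessor-based spatial averaging described above; `ds` stands for the dataset opened earlier, and the approximate Niño 3.4 bounds (5°S-5°N, 170°W-120°W on a 0-360° longitude axis) are illustrative values rather than a copy of the notebook cell that follows.

```python
# Global weighted mean over the longitude (X) and latitude (Y) axes.
ds_global_avg = ds.spatial.average("tas", axis=["X", "Y"])

# Regional mean over an approximate Nino 3.4 box, keeping the generated
# weights so fully and partially covered grid cells can be inspected.
ds_nino34 = ds.spatial.average(
    "tas",
    axis=["X", "Y"],
    lat_bounds=(-5, 5),
    lon_bounds=(190, 240),
    keep_weights=True,
)
```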
] }, { @@ -4799,25 +4844,6 @@ ").compute()" ] }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "-" - } - }, - "source": [ - "In this case, we specified `keep_weights=True`.\n", - "\n", - "- The weights provide full spatial weighting for grid cells entirely within the Niño 3.4 region.\n", - "- Partial weights for grid cells partially in the region.\n", - "\n", - "Other notes:\n", - "\n", - "- We use the 4 x 4 degree grid in this example to show the partial weights and to speed up plotting.\n", - "- You can also supply your own weights, but you can't automatically subset with `lat_bounds` and `lon_bounds` if you supply your own weights\n" - ] - }, { "cell_type": "markdown", "metadata": { @@ -5018,7 +5044,7 @@ "cell_type": "markdown", "metadata": { "slideshow": { - "slide_type": "fragment" + "slide_type": "subslide" } }, "source": [ @@ -5051,7 +5077,7 @@ "source": [ "ds_global_anomaly = ds_global.temporal.departures(\n", " \"tas\", freq=\"month\", reference_period=(\"2000-01-01\", \"2009-12-31\")\n", - ")" + ") " ] }, { @@ -5116,7 +5142,7 @@ "cell_type": "markdown", "metadata": { "slideshow": { - "slide_type": "subslide" + "slide_type": "skip" } }, "source": [ @@ -5131,7 +5157,7 @@ "cell_type": "markdown", "metadata": { "slideshow": { - "slide_type": "-" + "slide_type": "skip" } }, "source": [ @@ -5145,7 +5171,7 @@ "execution_count": 25, "metadata": { "slideshow": { - "slide_type": "-" + "slide_type": "skip" } }, "outputs": [], @@ -5157,7 +5183,7 @@ "cell_type": "markdown", "metadata": { "slideshow": { - "slide_type": "subslide" + "slide_type": "skip" } }, "source": [ @@ -5169,7 +5195,7 @@ "execution_count": 26, "metadata": { "slideshow": { - "slide_type": "-" + "slide_type": "skip" } }, "outputs": [ @@ -5215,7 +5241,7 @@ "cell_type": "markdown", "metadata": { "slideshow": { - "slide_type": "subslide" + "slide_type": "skip" } }, "source": [ @@ -5235,7 +5261,7 @@ "cell_type": "markdown", "metadata": { "slideshow": { - "slide_type": "subslide" + "slide_type": "skip" } }, "source": [ @@ -5262,7 +5288,7 @@ "metadata": { "cell_style": "split", "slideshow": { - "slide_type": "-" + "slide_type": "skip" } }, "outputs": [ @@ -5284,7 +5310,7 @@ "metadata": { "cell_style": "split", "slideshow": { - "slide_type": "-" + "slide_type": "skip" } }, "outputs": [ @@ -5306,7 +5332,7 @@ "cell_type": "markdown", "metadata": { "slideshow": { - "slide_type": "subslide" + "slide_type": "skip" } }, "source": [ @@ -5322,7 +5348,7 @@ "cell_type": "markdown", "metadata": { "slideshow": { - "slide_type": "notes" + "slide_type": "skip" } }, "source": [ @@ -5334,7 +5360,7 @@ "execution_count": 30, "metadata": { "slideshow": { - "slide_type": "fragment" + "slide_type": "skip" } }, "outputs": [ @@ -5361,7 +5387,7 @@ "execution_count": 31, "metadata": { "slideshow": { - "slide_type": "subslide" + "slide_type": "skip" } }, "outputs": [ @@ -6265,8 +6291,7 @@ "\n", "- Parallel computing with Dask (xCDAT): https://xcdat.readthedocs.io/en/latest/examples/parallel-computing-with-dask.html\n", "- Parallel computing with Dask (Xarray): https://docs.xarray.dev/en/stable/user-guide/dask.html\n", - "- Xarray with Dask Arrays: https://examples.dask.org/xarray.html\n", - "\n" + "- Xarray with Dask Arrays: https://examples.dask.org/xarray.html\n" ] }, { @@ -6323,11 +6348,11 @@ "source": [ "
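As a pointer for the Dask material linked above, here is a hedged sketch that pairs the `temporal.departures()` call from this demo with Dask-chunked reading; the file pattern and chunk size are illustrative assumptions.

```python
import xcdat as xc

# Open files lazily as Dask-backed arrays (chunk size is an illustrative choice).
ds = xc.open_mfdataset("tas_Amon_*.nc", chunks={"time": 120})

# Monthly anomalies relative to a 2000-2009 reference climatology.
# The call stays lazy; nothing is computed until .compute() (or .load()).
ds_anom = ds.temporal.departures(
    "tas", freq="month", reference_period=("2000-01-01", "2009-12-31")
)
ds_anom = ds_anom.compute()
```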
\n", " \"xCDAT\n", - "

Recap of Key Points

\n", + "

Key takeaways

\n", "
\n", "\n", - "- xCDAT is an **extension of Xarray for climate data analysis on structured grids**, a modern successor to the Community Data Analysis Tools (CDAT) library\n", - "- **Focused on routine climate research analysis operations** including loading, wrangling, averaging, such as temporal averaging, spatial averaging, and regridding\n", + "- xCDAT is an **extension of Xarray for climate data analysis on structured grids**\n", + "- Focused on routine **climate research analysis operations** \n", "- Designed to encourages **software sustainability and reproducible science**\n", "- **Parallelizable** through Xarray’s support for Dask, which enables efficient processing of large datasets\n" ]
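To close, the takeaway bullets above condensed into one compact, illustrative workflow; this is a sketch reusing the calls shown earlier, with the file pattern, target grid, and region bounds as placeholders.

```python
import xcdat as xc

ds = xc.open_mfdataset("tas_Amon_*.nc", chunks={"time": 120})   # load lazily with Dask
ds = ds.bounds.add_missing_bounds()                             # wrangle: ensure bounds exist
grid = xc.create_uniform_grid(-90, 90, 4.0, -180, 180, 5.0)     # build a coarse target grid
ds = ds.regridder.horizontal("tas", grid, tool="xesmf", method="bilinear")
ds_region = ds.spatial.average("tas", lat_bounds=(-5, 5), lon_bounds=(190, 240))
ds_anom = ds_region.temporal.departures("tas", freq="month").compute()
```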