Date and Time: Wednesday, 12 December 8:00 a.m.-12:20 p.m.
Location: Independence FGHI, Grand Hyatt, Washington DC
Speakers: Ryan Abernathey (@rabernat), Joe Hamman (@jhamman), and Scott Henderson (@scottyhq)
Abstract: Earth scientists face serious challenges when working with large datasets. Pangeo is a rapidly growing community and software ecosystem for scalable geoscience, built on the open-source scientific Python stack. Pangeo’s three core packages are 1) Jupyter, a web-based tool for interactive computing; 2) xarray, a data model and toolkit for working with N-dimensional labeled arrays; and 3) Dask, a flexible parallel computing library. When combined with distributed computing resources, these tools can help geoscientists interactively analyze datasets up to petabytes in size. In this interactive tutorial, we will demonstrate how to use this platform with real science examples from physical oceanography and hydrology. Participants will follow along in Jupyter notebooks, using xarray and Dask running on Google Cloud Platform.
Workshop Agenda:
- 0830-0900: Introduction to the Pangeo Project and Software Ecosystem
- 0900-1030: Hands-on interactive tutorial on xarray
- 1030-1045: Break
- 1045-1200: Hands-on interactive tutorial on Dask
- 1200-1230: Tutorial on how to deploy your own Pangeo platform on cloud or HPC computing resources
Learning Objectives: Participants will learn how to:
- Recognize the software packages that comprise the Pangeo platform and explain how they work together
- Load datasets with xarray from netCDF files, OPeNDAP endpoints, and Zarr stores
- Analyze data using xarray's label-based operations and groupby feature
- Work with very large xarray datasets using Dask
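To give a flavor of the last three objectives, here is a minimal sketch of label-based selection, groupby, and Dask chunking in xarray. It is not workshop material: it assumes numpy, pandas, xarray, and dask are installed, and it uses a small synthetic, made-up "sst" array (a hypothetical name) instead of real data. With real data, the dataset would typically come from `xr.open_dataset` (a netCDF file path or an OPeNDAP URL) or `xr.open_zarr` (a Zarr store).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily "sea surface temperature" on a small lat/lon grid (made-up data).
time = pd.date_range("2000-01-01", periods=730, freq="D")
lat = np.arange(-10.0, 11.0, 5.0)  # 5 latitudes: -10 ... 10
lon = np.arange(0.0, 20.0, 5.0)    # 4 longitudes: 0 ... 15
rng = np.random.default_rng(0)
sst = xr.DataArray(
    20 + rng.standard_normal((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# Label-based selection: index by coordinate values, not integer positions.
point = sst.sel(lat=0.0, lon=10.0)                         # one time series
first_year = sst.sel(time=slice("2000-01-01", "2000-12-31"))

# groupby: monthly climatology (mean of all Januaries, all Februaries, ...).
climatology = sst.groupby("time.month").mean("time")

# Dask: split along time into chunks; subsequent operations build a lazy task
# graph that only runs (in parallel) when .compute() is called.
sst_lazy = sst.chunk({"time": 365})
anomaly = sst_lazy.groupby("time.month") - climatology     # still lazy
mean_anomaly = anomaly.mean("time").compute()              # triggers computation
```

The same code works unchanged whether the array is a small in-memory NumPy array or a chunked Dask array backed by a large on-disk or cloud dataset; only the `.chunk(...)` call (or chunked opening of the data) differs.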
Additional Details