DataMine 2023 Project: Managing Massive Weather Data in the Cloud

Project Summary

Hydrologic modeling both requires high-resolution retrospective micro-meteorological data and produces high-resolution outputs comprising tens of variables for the area of interest. For long-term simulations (5 years or more) of the National Water Model's WRF-Hydro, these datasets can quickly run into the terabytes. The raw datasets are in the NetCDF format, which makes it difficult to extract specific variables and to subset to the region and period of interest. This project will evaluate cloud-optimized data formats such as Zarr and Parquet, benchmark chunking and subsetting operations, and identify the best format to support these operations on a variety of retrospective datasets. The transformed (cloud-optimized) data will then be stored in OSN (Open Storage Network) for public access via popular Python packages such as xarray and Dask.
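
To illustrate why chunking matters for the subsetting benchmarks described above, here is a minimal sketch that counts how many chunks a request has to read under different chunk layouts. The grid size, the three layouts, and the helper name `chunks_touched` are hypothetical and chosen only for illustration; they are not taken from the project's benchmarks.

```python
from math import prod


def chunks_touched(shape, chunks, selection):
    """Count the chunks overlapped by a rectangular selection.

    shape and chunks give one size per dimension, e.g. (time, y, x).
    selection gives one half-open (start, stop) index range per dimension.
    """
    counts = []
    for size, chunk, (start, stop) in zip(shape, chunks, selection):
        if not (0 <= start < stop <= size):
            raise ValueError("selection must lie inside the array")
        first = start // chunk        # index of the first chunk touched
        last = (stop - 1) // chunk    # index of the last chunk touched
        counts.append(last - first + 1)
    return prod(counts)


# Hypothetical array: one year of hourly data on a 1000 x 1000 grid.
shape = (8760, 1000, 1000)

# Hypothetical chunk layouts (time, y, x).
layouts = {
    "map-oriented": (1, 1000, 1000),
    "series-oriented": (8760, 10, 10),
    "balanced": (24, 100, 100),
}

# Two typical requests: a full time series at one cell, and one full map.
point_series = ((0, 8760), (500, 501), (500, 501))
single_map = ((0, 1), (0, 1000), (0, 1000))

for name, chunks in layouts.items():
    print(
        name,
        chunks_touched(shape, chunks, point_series),
        chunks_touched(shape, chunks, single_map),
    )
```

A layout that suits one access pattern can be very costly for the other, which is why the choice of chunk shape has to be benchmarked against the subsetting operations users actually perform.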

Semester 1 milestone

Create a preliminary data model that transforms AORC forcing data into a cloud-optimized format, i.e., one that is faster to work with than raw data stored offsite. Create documentation for this preliminary model that outlines how we transformed the data and how to store and retrieve the data from an offsite database.

Team lead: Michael Adelemoni, madelemo@purdue.edu

Repository contents

Point Data
Script
docs
.DS_Store
.gitignore
Benchmark.ipynb
LICENSE
README.md
tests.py

I-GUIDE/dataminecloudoptimized
