 <h1>Virtual Zarr Cookbook (Kerchunk and VirtualiZarr)<a class="headerlink" href="#virtual-zarr-cookbook-kerchunk-and-virtualizarr" title="Link to this heading"><i class="fas fa-link"></i></a></h1>
-<p>This Project Pythia Cookbook covers using the <a class="reference external" href="https://fsspec.github.io/kerchunk/">Kerchunk</a>
-library to access archival data formats as if they were
-ARCO (Analysis-Ready-Cloud-Optimized) data.</p>
+<p>This Project Pythia Cookbook covers using the <a class="reference external" href="https://fsspec.github.io/kerchunk/">Kerchunk</a>, <a class="reference external" href="https://virtualizarr.readthedocs.io/en/latest/index.html">VirtualiZarr</a>, and <a class="reference external" href="https://zarr.readthedocs.io/en/stable/">Zarr-Python</a> libraries to access archival data formats as if they were ARCO (Analysis-Ready-Cloud-Optimized) data.</p>
 <section id="motivation">
 <h2>Motivation<a class="headerlink" href="#motivation" title="Link to this heading"><i class="fas fa-link"></i></a></h2>
-<p>The <code class="docutils literal notranslate"><span class="pre">Kerchunk</span></code> library allows you to access chunked and compressed
+<p>The <code class="docutils literal notranslate"><span class="pre">Kerchunk</span></code> library pioneered the access of chunked and compressed
 data formats (such as NetCDF3, HDF5, GRIB2, TIFF & FITS), many of
 which are the primary data formats for many data archives, as if
 they were in ARCO formats such as Zarr, which allows for parallel,
 chunk-specific access. Instead of creating a new copy of the dataset
 in the Zarr spec/format, <code class="docutils literal notranslate"><span class="pre">Kerchunk</span></code> reads through the data archive
 and extracts the byte range and compression information of each
-chunk, then writes that information to a .json file (or alternate
-backends in future releases). For more details on how this process
-These summary files can then be combined to generated a <code class="docutils literal notranslate"><span class="pre">Kerchunk</span></code>
-reference for that dataset, which can be read via
-<a class="reference external" href="https://zarr.readthedocs.io">Zarr</a> and
+chunk, then writes that information to a “virtual Zarr store” using a
+JSON or Parquet “reference file”. The <code class="docutils literal notranslate"><span class="pre">VirtualiZarr</span></code>
+library provides a simple way to create these “virtual stores” using familiar
+<code class="docutils literal notranslate"><span class="pre">xarray</span></code> syntax. Lastly, the <code class="docutils literal notranslate"><span class="pre">icechunk</span></code> library provides a new way to store and re-use these references.</p>
+<p>These virtual Zarr stores can be re-used and read via <a class="reference external" href="https://zarr.readthedocs.io">Zarr</a> and
 <h2>Authors<a class="headerlink" href="#authors" title="Link to this heading"><i class="fas fa-link"></i></a></h2>
@@ -477,23 +475,23 @@ <h2>Structure<a class="headerlink" href="#structure" title="Link to this heading
 <p>This cookbook is broken up into two sections,
 Foundations and Example Notebooks.</p>
 <section id="section-1-foundations">
-<h3>Section 1 Foundations<a class="headerlink" href="#section-1-foundations" title="Link to this heading"><i class="fas fa-link"></i></a></h3>
+<h3>Section 1 - Foundations<a class="headerlink" href="#section-1-foundations" title="Link to this heading"><i class="fas fa-link"></i></a></h3>
 <p>In the <code class="docutils literal notranslate"><span class="pre">Foundations</span></code> section we will demonstrate
-how to use <code class="docutils literal notranslate"><span class="pre">Kerchunk</span></code> to create reference sets
+how to use <code class="docutils literal notranslate"><span class="pre">Kerchunk</span></code> and <code class="docutils literal notranslate"><span class="pre">VirtualiZarr</span></code> to create reference files
 from single file sources, as well as to create
-multi-file virtual datasets from collections of files.</p>
+multi-file virtual Zarr stores from collections of files.</p>
-<h3>Section 2 Generating Reference Files<a class="headerlink" href="#section-2-generating-reference-files" title="Link to this heading"><i class="fas fa-link"></i></a></h3>
-<p>The notebooks in the <code class="docutils literal notranslate"><span class="pre">Generating</span> <span class="pre">Reference</span> <span class="pre">Files</span></code> section
-demonstrate how to use <code class="docutils literal notranslate"><span class="pre">Kerchunk</span></code> to create
+<h3>Section 2 - Generating Virtual Zarr Stores<a class="headerlink" href="#section-2-generating-virtual-zarr-stores" title="Link to this heading"><i class="fas fa-link"></i></a></h3>
+<p>The notebooks in the <code class="docutils literal notranslate"><span class="pre">Generating</span> <span class="pre">Virtual</span> <span class="pre">Zarr</span> <span class="pre">Stores</span></code> section
+demonstrate how to use <code class="docutils literal notranslate"><span class="pre">Kerchunk</span></code> and <code class="docutils literal notranslate"><span class="pre">VirtualiZarr</span></code> to create
-<h3>Section 3 Using Pre-Generated References<a class="headerlink" href="#section-3-using-pre-generated-references" title="Link to this heading"><i class="fas fa-link"></i></a></h3>
-<p>The <code class="docutils literal notranslate"><span class="pre">Pre-Generated</span> <span class="pre">References</span></code> section contains notebooks demonstrating how to load existing references into <code class="docutils literal notranslate"><span class="pre">Xarray</span></code> and <code class="docutils literal notranslate"><span class="pre">Xarray-Datatree</span></code>, generated coordinates for GeoTiffs using <code class="docutils literal notranslate"><span class="pre">xrefcoord</span></code> and plotting using <code class="docutils literal notranslate"><span class="pre">Hvplot</span> <span class="pre">Datashader</span></code>.</p>
+<section id="section-3-using-virtual-zarr-stores">
+<h3>Section 3 - Using Virtual Zarr Stores<a class="headerlink" href="#section-3-using-virtual-zarr-stores" title="Link to this heading"><i class="fas fa-link"></i></a></h3>
+<p>The <code class="docutils literal notranslate"><span class="pre">Using</span> <span class="pre">Virtual</span> <span class="pre">Zarr</span> <span class="pre">Stores</span></code> section contains notebooks demonstrating how to load existing references into <code class="docutils literal notranslate"><span class="pre">Xarray</span></code>, generating coordinates for GeoTiffs using <code class="docutils literal notranslate"><span class="pre">xrefcoord</span></code>, and plotting using <code class="docutils literal notranslate"><span class="pre">Hvplot</span> <span class="pre">Datashader</span></code>.</p>
 </section>
 </section>
 <section id="running-the-notebooks">
@@ -623,9 +621,9 @@ <h3>Running on Your Own Machine<a class="headerlink" href="#running-on-your-own-
-This Project Pythia Cookbook covers using the [Kerchunk](https://fsspec.github.io/kerchunk/)
-library to access archival data formats as if they were
-ARCO (Analysis-Ready-Cloud-Optimized) data.
+This Project Pythia Cookbook covers using the [Kerchunk](https://fsspec.github.io/kerchunk/), [VirtualiZarr](https://virtualizarr.readthedocs.io/en/latest/index.html), and [Zarr-Python](https://zarr.readthedocs.io/en/stable/) libraries to access archival data formats as if they were ARCO (Analysis-Ready-Cloud-Optimized) data.
 ## Motivation
-The `Kerchunk` library allows you to access chunked and compressed
+The `Kerchunk` library pioneered the access of chunked and compressed
 data formats (such as NetCDF3, HDF5, GRIB2, TIFF & FITS), many of
 which are the primary data formats for many data archives, as if
 they were in ARCO formats such as Zarr, which allows for parallel,
 chunk-specific access. Instead of creating a new copy of the dataset
 in the Zarr spec/format, `Kerchunk` reads through the data archive
 and extracts the byte range and compression information of each
-chunk, then writes that information to a .json file (or alternate
-backends in future releases). For more details on how this process
@@ -48,24 +48,24 @@ the creator of `Kerchunk` and the
 This cookbook is broken up into two sections,
 Foundations and Example Notebooks.
-### Section 1 Foundations
+### Section 1 - Foundations
 In the `Foundations` section we will demonstrate
-how to use `Kerchunk` to create reference sets
+how to use `Kerchunk` and `VirtualiZarr` to create reference files
 from single file sources, as well as to create
-multi-file virtual datasets from collections of files.
+multi-file virtual Zarr stores from collections of files.
-### Section 2 Generating Reference Files
+### Section 2 - Generating Virtual Zarr Stores
-The notebooks in the `Generating Reference Files` section
-demonstrate how to use `Kerchunk` to create
+The notebooks in the `Generating Virtual Zarr Stores` section
+demonstrate how to use `Kerchunk` and `VirtualiZarr` to create
 datasets for all the supported file formats.
-`Kerchunk` currently supports NetCDF3,
-NetCDF4/HDF5, GRIB2, TIFF (including CoG).
+These libraries currently support virtualizing NetCDF3,
+NetCDF4/HDF5, GRIB2, TIFF (including COG).
-### Section 3 Using Pre-Generated References
+### Section 3 - Using Virtual Zarr Stores
-The `Pre-Generated References` section contains notebooks demonstrating how to load existing references into `Xarray` and `Xarray-Datatree`, generated coordinates for GeoTiffs using `xrefcoord` and plotting using `Hvplot Datashader`.
+The `Using Virtual Zarr Stores` section contains notebooks demonstrating how to load existing references into `Xarray`, generating coordinates for GeoTiffs using `xrefcoord`, and plotting using `Hvplot Datashader`.
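The byte-range references described in the Motivation amount to a small JSON mapping rather than a copy of the data. The sketch below is hand-written for illustration only: the bucket path, variable name, offsets, and lengths are made up, and real reference sets are generated by Kerchunk or VirtualiZarr, not by hand.

```python
import json

# Minimal Kerchunk-style reference set (illustrative values only).
refs = {
    "version": 1,
    "refs": {
        # Zarr metadata keys are stored inline as JSON strings...
        ".zgroup": json.dumps({"zarr_format": 2}),
        # ...while each chunk key maps to [url, byte_offset, byte_length]
        # inside an original archival file, so a reader can fetch just the
        # bytes for one chunk, in parallel with other chunks.
        "temperature/0.0.0": ["s3://bucket/archive/file_0001.nc", 8192, 4096],
        "temperature/1.0.0": ["s3://bucket/archive/file_0002.nc", 8192, 4096],
    },
}

# Serializing this mapping yields the JSON "reference file" mentioned above.
reference_json = json.dumps(refs, indent=2)
```

Readers such as fsspec's "reference" filesystem interpret this mapping so that Zarr and Xarray see an ordinary store, while only the listed byte ranges of the archival files are ever fetched.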