Add streaming decompression for ZSTD_CONTENTSIZE_UNKNOWN case #707

Draft
wants to merge 1 commit into
base: main

mkitti
Contributor

@mkitti mkitti commented Feb 13, 2025

Zstandard supports a streaming compression scheme in which the total size of the data is not known when compression begins. In this case, the content size is not saved in the Zstandard frame header.
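To make the "size not saved in the frame header" case concrete, here is a minimal pure-Python sketch that reads the optional Frame_Content_Size field as laid out in the Zstandard frame format (RFC 8878). The function name `read_frame_content_size` is hypothetical and is not part of numcodecs or libzstd. It only illustrates the condition under which the size is absent and handles regular frames only, not skippable frames.

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # 0xFD2FB528 stored little-endian


def read_frame_content_size(buf: bytes):
    """Return the declared content size, or None if the header omits it.

    Hypothetical helper for illustration only.
    """
    if len(buf) < 5 or buf[:4] != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")

    descriptor = buf[4]
    fcs_flag = descriptor >> 6            # Frame_Content_Size_flag (bits 7-6)
    single_segment = (descriptor >> 5) & 1  # Single_Segment_flag (bit 5)
    dict_id_flag = descriptor & 3         # Dictionary_ID_flag (bits 1-0)

    # Field size in bytes; flag 0 means "absent" unless Single_Segment is set.
    fcs_size = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]
    if fcs_size == 0:
        return None  # the case this pull request is about

    # Skip the Window_Descriptor (absent for single-segment frames)
    # and the optional Dictionary_ID field.
    offset = 5 + (0 if single_segment else 1) + (0, 1, 2, 4)[dict_id_flag]
    field = buf[offset:offset + fcs_size]
    if len(field) < fcs_size:
        raise ValueError("truncated frame header")

    value = int.from_bytes(field, "little")
    # The 2-byte encoding stores the size minus 256.
    return value + 256 if fcs_size == 2 else value
```

A `None` result here corresponds to the situation in which a decoder cannot allocate the output buffer up front.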

Before this pull request, numcodecs would refuse to decompress data whose size was unknown. This pull request adds a routine to decompress such data, specifically when ZSTD_getFrameContentSize returns ZSTD_CONTENTSIZE_UNKNOWN.

This pull request is based on a prior pull request I made to numcodecs.js:
manzt/numcodecs.js#47

Fixes zarr-developers/zarr-python#2056

xref:
zarr-developers/zarr-python#2056

This is currently a draft.

TODO:

  • Unit tests and/or doctests in docstrings
  • Tests pass locally
  • Docstrings and API docs for any new/modified user-facing classes and functions
  • Changes documented in docs/release.rst
  • Docs build locally
  • GitHub Actions CI passes
  • Test coverage to 100% (Codecov passes)

Successfully merging this pull request may close these issues.

zarr-python cannot read arrays saved by tensorstore using the zstd compressor