Compression #5

Open
countvajhula opened this issue Jul 6, 2023 · 0 comments


Error-correcting codes such as parity and Reed-Solomon add redundancy to data so that it can be recovered after errors. Yet the data itself may already contain a lot of unstructured redundancy, which results in inefficient use of space. We can improve on this by compressing the data before encoding it for error recovery. This removes the excess redundancy in the original data, so that what we encode is minimally redundant and takes up less space when stored. Once the data is recovered, it then needs to be decompressed to yield the original file.

F - compress -> F' - encode -> F'' - decode -> F' - decompress -> F
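
As a rough illustration of this pipeline, here is a minimal Python sketch. It assumes `zlib` for compression and a single XOR parity shard for error recovery; the `encode`/`decode` helpers, the 4-byte length prefix, and the shard count `k` are hypothetical choices made for the example, not this project's actual format.

```python
import zlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int = 4) -> list:
    """Split data into k equal shards plus one XOR parity shard (hypothetical layout)."""
    # Prefix the payload with its length so padding can be stripped on decode.
    framed = len(data).to_bytes(4, "big") + data
    size = -(-len(framed) // k)  # ceiling division
    framed = framed.ljust(size * k, b"\0")
    shards = [framed[i * size:(i + 1) * size] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def decode(shards: list) -> bytes:
    """Rebuild the payload when at most one shard is missing (marked as None)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("a single parity shard can repair only one erasure")
    shards = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        shards[missing[0]] = reduce(xor_bytes, present)
    framed = b"".join(shards[:-1])  # drop the parity shard
    length = int.from_bytes(framed[:4], "big")
    return framed[4:4 + length]


original = b"some highly redundant data " * 500   # F
stored = encode(zlib.compress(original))           # F -> F' -> F''
stored[2] = None                                   # simulate an erasure in storage
recovered = zlib.decompress(decode(stored))        # F'' -> F' -> F
assert recovered == original
```

Because the compressed bytes are what gets sharded, the parity shard repairs the erasure first, and only the intact compressed stream is handed to `zlib.decompress`.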

It's necessary to compress before encoding for error recovery, since we expect errors or erasures to occur in the data as it is actually stored. If we compressed after encoding, the error-recovery encoding would be rendered useless: with pieces of the compressed data missing, we wouldn't even be able to decompress it. We could compress individual shards (including jewels) and that would work, but it's unlikely we'd want to, since the compression ratio is likely to be better over the entire file than over a piece of it.
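
Both points can be checked informally with a short Python sketch. It assumes `zlib` as the compressor and a hypothetical `whole_vs_sharded` helper that splits the input into `k` contiguous pieces; the actual numbers depend on the data and the compressor, so treat this as an illustration rather than a benchmark.

```python
import zlib


def whole_vs_sharded(data: bytes, k: int = 8) -> tuple:
    """Return (size compressed as one stream, total size when k pieces are compressed separately)."""
    size = -(-len(data) // k)  # ceiling division
    pieces = [data[i * size:(i + 1) * size] for i in range(k)]
    whole = len(zlib.compress(data))
    sharded = sum(len(zlib.compress(p)) for p in pieces)
    return whole, sharded


data = b"the quick brown fox jumps over the lazy dog\n" * 200
whole, sharded = whole_vs_sharded(data)
# Each piece pays its own stream overhead and cannot reuse patterns
# already seen in the other pieces.
assert whole < sharded

# A compressed stream with a piece missing cannot be decompressed at all.
compressed = zlib.compress(data)
try:
    zlib.decompress(compressed[: len(compressed) // 2])
    truncated_ok = True
except zlib.error:
    truncated_ok = False
assert not truncated_ok
```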

Ideally, compression should be easily composable with other storage schemes, so implementing it as a mixin (like the other storage schemes) could be a good approach.
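
One possible shape for such a mixin, sketched in Python under loud assumptions: `BaseStore`, `RepetitionMixin`, `CompressionMixin`, and `Store` are all hypothetical names, the `write`/`read` interface is invented for the example, and the triple-repetition scheme is only a toy stand-in for a real error-recovery code such as Reed-Solomon.

```python
import zlib


class BaseStore:
    """Hypothetical base class: keeps the stored bytes in memory."""

    def __init__(self):
        self._blob = b""

    def write(self, data: bytes) -> None:
        self._blob = data

    def read(self) -> bytes:
        return self._blob


class RepetitionMixin:
    """Toy error-recovery scheme: store three copies, take a bitwise majority vote on read."""

    def write(self, data: bytes) -> None:
        super().write(data * 3)

    def read(self) -> bytes:
        raw = super().read()
        n = len(raw) // 3
        a, b, c = raw[:n], raw[n:2 * n], raw[2 * n:]
        return bytes((x & y) | (y & z) | (x & z) for x, y, z in zip(a, b, c))


class CompressionMixin:
    """Compress before handing data down the chain; decompress on the way back up."""

    def write(self, data: bytes) -> None:
        super().write(zlib.compress(data))

    def read(self) -> bytes:
        return zlib.decompress(super().read())


# Listing CompressionMixin first puts it outermost in the method resolution
# order, so compression runs before encoding and decompression after decoding.
class Store(CompressionMixin, RepetitionMixin, BaseStore):
    pass


store = Store()
payload = b"hello " * 100
store.write(payload)

# Corrupt one stored byte; the repetition layer repairs it before decompression.
damaged = bytearray(store._blob)
damaged[0] ^= 0xFF
store._blob = bytes(damaged)
assert store.read() == payload
```

With cooperative `super()` calls, the order of the base classes decides the order of the transformations, so swapping the two mixins would give the compress-after-encode arrangement that the issue argues against.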
