Compression #5

countvajhula · 2023-07-06T18:56:15Z

Error-correcting codes such as parity and Reed-Solomon add redundancy to the data so that we can recover from errors. Yet the data itself may contain a lot of unstructured redundancy that wastes space. We can improve on this by compressing the data before encoding it for error recovery. This removes excess redundancy from the original data, so that what we encode is minimally redundant and takes up less space when stored. Once the data is recovered, it then needs to be decompressed to yield the original file.

F -compress-> F' -encode-> F'' -decode-> F' -decompress-> F

Compression has to happen before encoding for error recovery, since we expect errors or erasures to occur in the data as actually stored. If we compressed after encoding, the error-recovery encoding would be rendered useless: with pieces missing, we wouldn't even be able to decompress the file in order to decode it. We could compress individual shards (including jewels) and that would be fine, but it's unlikely we'd want to do this, since the compression ratio is likely to be better over the entire file than over a piece of it.
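As a rough illustration of the ordering, here is a minimal Python sketch of the compress-then-encode pipeline. It assumes `zlib` for compression and a single XOR parity shard as a stand-in for the real error-recovery code. The `encode` and `decode` helpers and the 8-byte length header are hypothetical and not part of this project.

```python
import zlib
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal shards and append one XOR parity shard.

    Hypothetical framing: an 8-byte length header, then zero padding
    so that the framed data divides evenly into k shards.
    """
    framed = len(data).to_bytes(8, "big") + data
    size = -(-len(framed) // k)  # ceiling division
    framed = framed.ljust(size * k, b"\0")
    shards = [framed[i * size:(i + 1) * size] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def decode(shards: List[Optional[bytes]]) -> bytes:
    """Rebuild the data, repairing at most one erased shard (None)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("a single parity shard can repair only one erasure")
    shards = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        # XOR of all surviving shards (parity included) is the lost one.
        shards[missing[0]] = reduce(xor_bytes, present)
    framed = b"".join(shards[:-1])  # drop the parity shard
    length = int.from_bytes(framed[:8], "big")
    return framed[8:8 + length]


original = b"hello world " * 1000

compressed = zlib.compress(original)         # F  -> F'
stored = encode(compressed, k=4)             # F' -> F''
stored[2] = None                             # simulate an erasure in storage
recovered = zlib.decompress(decode(stored))  # F'' -> F' -> F
```

Because the parity is computed over the compressed bytes, the erased shard is repaired first and only then is the intact compressed stream handed to `zlib.decompress`.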

Ideally, compression should be easily composable with other storage schemes, so implementing it as a mixin (like the other storage schemes) could be a good approach.
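One way the mixin idea might look, sketched in Python purely for illustration. All class names here are hypothetical and do not reflect the project's actual storage classes. `ChecksumMixin` is only a placeholder for an error-recovery layer (it detects corruption but cannot repair it).

```python
import zlib


class MemoryStore:
    """Hypothetical base store that keeps blobs in a dict."""

    def __init__(self):
        self._blobs = {}

    def put(self, name: str, data: bytes) -> None:
        self._blobs[name] = data

    def get(self, name: str) -> bytes:
        return self._blobs[name]


class ChecksumMixin:
    """Placeholder for an error-recovery layer: prepends a CRC32."""

    def put(self, name: str, data: bytes) -> None:
        crc = zlib.crc32(data).to_bytes(4, "big")
        super().put(name, crc + data)

    def get(self, name: str) -> bytes:
        raw = super().get(name)
        crc, body = raw[:4], raw[4:]
        if zlib.crc32(body).to_bytes(4, "big") != crc:
            raise ValueError("stored data is corrupted")
        return body


class CompressionMixin:
    """Compresses on the way in and decompresses on the way out."""

    def put(self, name: str, data: bytes) -> None:
        super().put(name, zlib.compress(data))

    def get(self, name: str) -> bytes:
        return zlib.decompress(super().get(name))


# Listing CompressionMixin first means put() compresses before the
# checksum layer sees the data, and get() verifies before decompressing.
class CompressedStore(CompressionMixin, ChecksumMixin, MemoryStore):
    pass


store = CompressedStore()
payload = b"redundant " * 500
store.put("file", payload)
roundtrip = store.get("file")
```

The order of the base classes determines the order of the layers, so keeping compression outermost preserves the compress-before-encode requirement described above.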

countvajhula mentioned this issue Jul 6, 2023

Client-keyed Encryption #6

Open
