-
Notifications
You must be signed in to change notification settings - Fork 7
Address edge-case for uncompressed data at end of LZXPRESS+huffman stream #75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #75 +/- ##
==========================================
- Coverage 86.33% 86.22% -0.12%
==========================================
Files 20 20
Lines 1303 1307 +4
==========================================
+ Hits 1125 1127 +2
- Misses 178 180 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
|
||
| bitstring = BitString() | ||
|
|
||
| while src.tell() - start_offset < size: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use a walrus operator here.
| bitstring = BitString() | ||
|
|
||
| while src.tell() - start_offset < size: | ||
| if size - (src.tell() - start_offset) <= 256: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a spec you can link to for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MS-XCA does not really talk about this case. Some other resources do take it into account and kind of mention it (linked below). Its not part of the Microsoft spec, so this edge-case might be a better fit in CompressedStream, or somewhere in the WOF code. Thoughts?
- https://github.com/wbenny/woftool - here something is mentioned about a block being stored uncompressed.
- https://wimlib.net/git/?p=wimlib;a=blob;f=src/xpress-compress.c;hb=d66b5c805c4e9a660bac6f979d88c1820cb031f2#l170 - here a value of 261 is mentioned.
- https://github.com/jborean93/pyxca/blob/main/src/xca/_xpress_huffman/xpress.c#L2505C37-L2505C64 - here a value of 260 is mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the code or README you link actually are relevant to what you're proposing to do here.
- This is just common data compression behaviour, don't compress if the compressed data is larger than the uncompressed - this is almost universal in any application (not algorithm) that utilizes compression.
- This is for trying to compress data that is smaller than 261 bytes due to it being technically impossible to produce a smaller output. It returns
1(error). Likely the application will take this error as a hint to store the data uncompressed (see point 1.). - This indeed checks if there's enough remaining space in the input buffer, but it only returns successfully if there's already the expected amount of data decompressed. If not, it also throws an error.
Are you actually fixing an algorithm bug or are you trying to work around an application bug (in the wrong place)?
|
Superseded by #76, fox-it/dissect.archive#16, and fox-it/dissect.ntfs#42 |
This PR addresses an edge-case found in LZXPRESS+huffman compressed files when uncompressed (original) data is appended to the end of the stream. This is commonly encountered in WOF compressed files.
This is in preparation for contributing the POC code mentioned in fox-it/dissect.ntfs#41.