
@Horofic
Member

@Horofic Horofic commented Mar 24, 2025

This PR handles an edge case in LZXPRESS+Huffman-compressed data where the uncompressed (original) data is appended to the end of the stream. This is commonly encountered in WOF-compressed files.

This is in preparation for contributing the POC code mentioned in fox-it/dissect.ntfs#41.

@Horofic Horofic requested review from Schamper and Copilot and removed request for Copilot March 24, 2025 19:38
@codecov

codecov bot commented Mar 24, 2025

Codecov Report

Attention: Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 86.22%. Comparing base (628f2a4) to head (ebfd6da).
Report is 1 commit behind head on main.

Files with missing lines                        Patch %   Lines
dissect/util/compression/lzxpress_huffman.py    60.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #75      +/-   ##
==========================================
- Coverage   86.33%   86.22%   -0.12%     
==========================================
  Files          20       20              
  Lines        1303     1307       +4     
==========================================
+ Hits         1125     1127       +2     
- Misses        178      180       +2     
Flag        Coverage Δ
unittests   86.22% <60.00%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown.

bitstring = BitString()

while src.tell() - start_offset < size:
Member

Can use a walrus operator here.
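A minimal sketch of what the walrus suggestion could look like. The real loop body in `lzxpress_huffman.py` is not shown in this thread, so `remaining_per_iteration` is a hypothetical helper that only records how many input bytes remain on each pass; the point is that `consumed` is bound once in the loop condition and reused in the body.

```python
import io


def remaining_per_iteration(src: io.BytesIO, size: int, step: int) -> list[int]:
    """Hypothetical helper: record how many input bytes remain on each pass."""
    start_offset = src.tell()
    remaining = []
    # `consumed` is bound in the loop condition (PEP 572) and reused below,
    # instead of recomputing `src.tell() - start_offset` on every use.
    while (consumed := src.tell() - start_offset) < size:
        remaining.append(size - consumed)
        if not src.read(step):
            break  # stop at EOF so a short stream cannot loop forever
    return remaining
```

For example, `remaining_per_iteration(io.BytesIO(b"x" * 10), 10, 4)` yields `[10, 6, 2]`.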

bitstring = BitString()

while src.tell() - start_offset < size:
    if size - (src.tell() - start_offset) <= 256:
Member

Is there a spec you can link to for this?

Member Author

@Horofic Horofic Mar 26, 2025

MS-XCA does not really cover this case. Some other resources do take it into account, or at least mention it (linked below). Since it's not part of the Microsoft spec, this edge case might be a better fit in CompressedStream, or somewhere in the WOF code. Thoughts?

  1. https://github.com/wbenny/woftool - mentions a block being stored uncompressed.
  2. https://wimlib.net/git/?p=wimlib;a=blob;f=src/xpress-compress.c;hb=d66b5c805c4e9a660bac6f979d88c1820cb031f2#l170 - mentions a value of 261.
  3. https://github.com/jborean93/pyxca/blob/main/src/xca/_xpress_huffman/xpress.c#L2505C37-L2505C64 - mentions a value of 260.
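For reference, a sketch of the behaviour the `<= 256` check in this PR proposes, under stated assumptions. Per MS-XCA, each LZXPRESS Huffman block starts with a 256-byte table holding 512 four-bit code lengths, which is presumably where the threshold comes from: a remainder that small cannot hold a complete block. `decompress_with_raw_tail` and the `decompress_block` callback are hypothetical stand-ins, not the dissect.util API, and treating the tail as raw data is the PR's assumption rather than anything MS-XCA specifies.

```python
import io
from typing import BinaryIO, Callable

# Per MS-XCA, each LZXPRESS Huffman block begins with a 256-byte table of
# 512 four-bit code lengths; the PR's threshold presumably derives from this.
HUFFMAN_TABLE_SIZE = 256


def decompress_with_raw_tail(
    src: BinaryIO,
    size: int,
    decompress_block: Callable[[BinaryIO], bytes],
) -> bytes:
    """Hypothetical sketch of the check proposed in this PR.

    `decompress_block` stands in for the real Huffman block decoder: it must
    consume one block from `src` and return the decoded bytes.
    """
    start_offset = src.tell()
    out = bytearray()
    while (consumed := src.tell() - start_offset) < size:
        remaining = size - consumed
        if remaining <= HUFFMAN_TABLE_SIZE:
            # The PR's assumption: a remainder too small to hold a Huffman
            # table is original data stored verbatim, so copy it through.
            tail = src.read(remaining)
            if len(tail) != remaining:
                raise EOFError("truncated raw tail")
            out += tail
            break
        out += decompress_block(src)
        if src.tell() - start_offset == consumed:
            raise ValueError("block decoder did not consume any input")
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical decoder: every "block" is 300 input bytes that decode to b"B".
    def fake_block(stream: BinaryIO) -> bytes:
        stream.read(300)
        return b"B"

    data = bytes(600) + b"RAWTAIL"
    print(decompress_with_raw_tail(io.BytesIO(data), len(data), fake_block))
```

Run as a script, the demo decodes two fake blocks and copies the 7-byte tail through unchanged, printing `b'BBRAWTAIL'`.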

Member

None of the code or README you linked is actually relevant to what you're proposing to do here.

  1. This is just common data compression behaviour: don't compress if the compressed data is larger than the uncompressed data. This is almost universal in any application (not algorithm) that uses compression.
  2. This rejects attempts to compress data smaller than 261 bytes, because it is technically impossible to produce a smaller output; it returns 1 (error). The application likely takes this error as a hint to store the data uncompressed (see point 1).
  3. This does check whether there's enough data left in the input buffer, but it only returns successfully if the expected amount of data has already been decompressed. Otherwise, it also fails with an error.

Are you actually fixing an algorithm bug or are you trying to work around an application bug (in the wrong place)?
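To illustrate point 1, here is a sketch of the "store uncompressed if compression doesn't help" rule living at the application (container) level, so the decompressor never sees raw data. `zlib` is swapped in as a stand-in compressor because LZXPRESS Huffman is not in the Python standard library, and `store_chunk`/`load_chunk` are hypothetical names; the sketch assumes the container records each chunk's uncompressed size, and that a payload of exactly that size means "stored raw".

```python
import zlib


def store_chunk(data: bytes) -> bytes:
    """Compress `data`, falling back to the raw bytes if compression doesn't help."""
    compressed = zlib.compress(data)
    # Application-level rule: only keep the compressed form if it is strictly smaller.
    return compressed if len(compressed) < len(data) else data


def load_chunk(payload: bytes, uncompressed_size: int) -> bytes:
    """Undo `store_chunk`, using only the size the container already records."""
    if len(payload) == uncompressed_size:
        # Stored raw: the decompressor is never invoked for this chunk.
        return payload
    return zlib.decompress(payload)
```

Because `store_chunk` keeps the compressed form only when it is strictly smaller, a payload whose length equals the recorded uncompressed size is unambiguously raw; short inputs such as `b"abc"` are stored as-is.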

@Horofic Horofic requested a review from Schamper March 27, 2025 14:48
@Horofic
Member Author

Horofic commented Apr 17, 2025

Superseded by #76, fox-it/dissect.archive#16, and fox-it/dissect.ntfs#42

@Horofic Horofic closed this Apr 17, 2025
@Schamper Schamper deleted the wof branch April 17, 2025 19:57

3 participants