Address edge-case for uncompressed data at end of LZXPRESS+huffman stream #75

Horofic · 2025-03-24T19:37:55Z

This PR addresses an edge-case found in LZXPRESS+huffman compressed files when uncompressed (original) data is appended to the end of the stream. This is commonly encountered in WOF compressed files.

This is in preparation for contributing the POC code mentioned in fox-it/dissect.ntfs#41.

codecov · 2025-03-24T19:39:02Z

Codecov Report

Attention: Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 86.22%. Comparing base (628f2a4) to head (ebfd6da).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| dissect/util/compression/lzxpress_huffman.py | 60.00% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #75      +/-   ##
==========================================
- Coverage   86.33%   86.22%   -0.12%     
==========================================
  Files          20       20              
  Lines        1303     1307       +4     
==========================================
+ Hits         1125     1127       +2     
- Misses        178      180       +2

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `86.22% <60.00%> (-0.12%)` | ⬇️ |

Schamper · 2025-03-25T00:30:03Z

dissect/util/compression/lzxpress_huffman.py


    bitstring = BitString()

    while src.tell() - start_offset < size:


Can use a walrus operator here.
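
For illustration, here is a minimal sketch of what the walrus suggestion could look like. It is not the actual change to `lzxpress_huffman.py`: the function `read_chunks`, its `chunk_size` parameter, and the use of `io.BytesIO` are hypothetical stand-ins. The point is only that the remaining byte count is computed once in the loop condition and reused in the body.

```python
from __future__ import annotations

import io


def read_chunks(src: io.BytesIO, size: int, chunk_size: int = 4) -> list[bytes]:
    """Read up to `size` bytes from `src` in pieces (hypothetical helper)."""
    start_offset = src.tell()
    chunks = []
    # The walrus operator binds the remaining byte count once per iteration,
    # so the loop body does not have to recompute src.tell() - start_offset.
    while (remaining := size - (src.tell() - start_offset)) > 0:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            # The stream ended before `size` bytes were available.
            break
        chunks.append(chunk)
    return chunks
```

The same `remaining` value could then feed a check such as the `<= 256` comparison in the diff without calling `src.tell()` a second time.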

Schamper · 2025-03-25T00:30:29Z

dissect/util/compression/lzxpress_huffman.py

    bitstring = BitString()

    while src.tell() - start_offset < size:
+        if size - (src.tell() - start_offset) <= 256:


Is there a spec you can link to for this?

Horofic · 2025-03-26

MS-XCA does not really cover this case. Some other resources do take it into account and mention it in passing (linked below). It's not part of the Microsoft spec, so this edge-case might be a better fit in CompressedStream, or somewhere in the WOF code. Thoughts?

https://github.com/wbenny/woftool - here something is mentioned about a block being stored uncompressed.

https://wimlib.net/git/?p=wimlib;a=blob;f=src/xpress-compress.c;hb=d66b5c805c4e9a660bac6f979d88c1820cb031f2#l170 - here a value of 261 is mentioned.

https://github.com/jborean93/pyxca/blob/main/src/xca/_xpress_huffman/xpress.c#L2505C37-L2505C64 - here a value of 260 is mentioned.

Schamper · 2025-04-01

None of the code or README you link is actually relevant to what you're proposing to do here.

1. This is just common data compression behaviour: don't compress if the compressed data is larger than the uncompressed data. This is almost universal in any application (not algorithm) that utilizes compression.

2. This is for trying to compress data that is smaller than 261 bytes, where it is technically impossible to produce a smaller output. It returns 1 (error). The application will likely take this error as a hint to store the data uncompressed (see point 1).

3. This does check whether there's enough remaining space in the input buffer, but it only returns successfully if the expected amount of data has already been decompressed. If not, it also throws an error.

Are you actually fixing an algorithm bug or are you trying to work around an application bug (in the wrong place)?
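
The application-versus-algorithm distinction raised in this review can be sketched as follows. This is an assumption-laden illustration, not code from dissect or from any of the linked projects: `zlib` stands in for LZXPRESS+Huffman, and `store_chunk` / `load_chunk` are hypothetical names. The decision to keep data uncompressed lives in the container layer, and the codec itself is never asked to decode raw bytes.

```python
from __future__ import annotations

import zlib


def store_chunk(data: bytes) -> tuple[bool, bytes]:
    """Return (is_compressed, payload) for one chunk (hypothetical container layer)."""
    packed = zlib.compress(data)  # zlib is a stand-in codec for this sketch
    if len(packed) < len(data):
        return True, packed
    # Compression did not help, so the application stores the original bytes.
    return False, data


def load_chunk(is_compressed: bool, payload: bytes) -> bytes:
    """Undo store_chunk: only call the codec when the chunk was compressed."""
    return zlib.decompress(payload) if is_compressed else payload
```

Under this split, a reader of such a container checks the flag (or an equivalent size comparison) before invoking the decompressor, which is the kind of handling the comment suggests belongs outside the algorithm.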

Horofic · 2025-04-17T16:00:10Z

Superseded by #76, fox-it/dissect.archive#16, and fox-it/dissect.ntfs#42

Add case for uncompressed data at end of buffer

ebfd6da

Horofic requested review from Schamper and Copilot and removed request for Copilot March 24, 2025 19:38

Schamper requested changes Mar 25, 2025

Horofic requested a review from Schamper March 27, 2025 14:48

Horofic closed this Apr 17, 2025

Schamper deleted the wof branch April 17, 2025 19:57

3 participants

