Closed
7 changes: 6 additions & 1 deletion dissect/util/compression/lzxpress_huffman.py
@@ -11,6 +11,7 @@
     length: int
     symbol: int

+HUFFMAN_BLOCK_SIZE = 65536

 def _read_16_bit(fh: BinaryIO) -> int:
     return struct.unpack("<H", fh.read(2).rjust(2, b"\x00"))[0]
@@ -147,11 +148,15 @@
     bitstring = BitString()

     while src.tell() - start_offset < size:
Member

Can use a walrus operator here.
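
A minimal sketch of what the walrus suggestion might look like, assuming the intent is to compute the remaining byte count once per iteration and reuse it in the short-tail check. `read_blocks` and its block handling are hypothetical stand-ins for illustration, not the actual `decompress` body:

```python
import io
from typing import BinaryIO


def read_blocks(src: BinaryIO, size: int, table_size: int = 256) -> list[bytes]:
    """Hypothetical loop skeleton: bind the remaining byte count with := and reuse it."""
    start_offset = src.tell()
    blocks = []

    # The remaining count is computed once per iteration and reused below.
    while (remaining := size - (src.tell() - start_offset)) > 0:
        if remaining <= table_size:
            # Mirrors the short-tail branch from the diff: take the rest as-is.
            blocks.append(src.read(remaining))
            break
        # Stand-in for reading the Huffman table; the real code decodes symbols next.
        blocks.append(src.read(table_size))

    return blocks


blocks = read_blocks(io.BytesIO(bytes(600)), 600)
print([len(b) for b in blocks])  # [256, 256, 88]
```

Binding `remaining` in the loop condition avoids repeating `size - (src.tell() - start_offset)` in both the `while` and the `if`.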

+        if size - (src.tell() - start_offset) <= 256:
Member

Is there a spec you can link to for this?

Member Author

@Horofic, Mar 26, 2025

MS-XCA does not really cover this case. Some other resources do take it into account and mention it in passing (linked below). It's not part of the Microsoft spec, so this edge case might be a better fit in CompressedStream, or somewhere in the WOF code. Thoughts?

  1. https://github.com/wbenny/woftool - mentions a block being stored uncompressed.
  2. https://wimlib.net/git/?p=wimlib;a=blob;f=src/xpress-compress.c;hb=d66b5c805c4e9a660bac6f979d88c1820cb031f2#l170 - mentions a value of 261.
  3. https://github.com/jborean93/pyxca/blob/main/src/xca/_xpress_huffman/xpress.c#L2505C37-L2505C64 - mentions a value of 260.

Member

None of the code or the README you link is actually relevant to what you're proposing to do here.

  1. This is just common data compression behaviour: don't compress if the compressed data is larger than the uncompressed data. This is almost universal in any application (not algorithm) that uses compression.
  2. This handles an attempt to compress data smaller than 261 bytes, for which it is technically impossible to produce a smaller output. It returns 1 (error). The application will likely take this error as a hint to store the data uncompressed (see point 1).
  3. This does check whether there's enough remaining space in the input buffer, but it only returns successfully if the expected amount of data has already been decompressed. If not, it also throws an error.

Are you actually fixing an algorithm bug or are you trying to work around an application bug (in the wrong place)?
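
To make the algorithm/application distinction in point 1 concrete, here is a rough sketch of how a container layer might handle the "store uncompressed" case without touching the decompressor. It uses `zlib` purely as a stand-in codec (not LZXPRESS Huffman), and both `store_chunk` / `load_chunk` and the "stored size equals uncompressed size means raw" convention are assumptions made for this sketch:

```python
import os
import zlib


def store_chunk(data: bytes) -> bytes:
    """Return the compressed chunk, or the raw bytes if compression does not help."""
    compressed = zlib.compress(data)
    # Only keep the compressed form when it is strictly smaller.
    return compressed if len(compressed) < len(data) else data


def load_chunk(stored: bytes, uncompressed_size: int) -> bytes:
    """Assumed convention: a stored size equal to the uncompressed size marks a raw chunk."""
    if len(stored) == uncompressed_size:
        return stored
    return zlib.decompress(stored)


compressible = b"A" * 4096
incompressible = os.urandom(64)  # too small and too random to shrink

for original in (compressible, incompressible):
    stored = store_chunk(original)
    assert load_chunk(stored, len(original)) == original
```

In this arrangement the decision lives entirely in the caller, so the decompression routine never has to guess whether a short input is raw data.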

+            dst.extend(src.read(size - (src.tell() - start_offset)))
+            return bytes(dst)

Check warning on line 153 in dissect/util/compression/lzxpress_huffman.py

Codecov / codecov/patch: added lines #L152-L153 were not covered by tests.

         root = _build_tree(src.read(256))
         bitstring.init(src)

         chunk_size = 0
-        while chunk_size < 65536 and src.tell() - start_offset < size:
+        while chunk_size < HUFFMAN_BLOCK_SIZE and src.tell() - start_offset < size:
             symbol = bitstring.decode(root)
             if symbol < 256:
                 dst.append(symbol)