
Conversation

whyitfor (Contributor) commented Oct 23, 2025

  • I have reviewed the OFRAK contributor guide and attest that this pull request is in accordance with it.
  • I have made or updated a changelog entry for the changes in this pull request.

One sentence summary of this PR (This should go in the CHANGELOG!)
Add LZ4 compression format unpackers and packers with support for all frame types (modern, legacy, skippable)

Link to Related Issue(s)
N/A.

Please describe the changes in your request.
LZ4 components.

Lz4Unpacker currently supports unpacking the modern LZ4 frame format (Lz4ModernData),
the legacy format (Lz4LegacyData), and skippable frames (Lz4SkippableData).

Lz4Packer supports repacking the modern LZ4 format (Lz4ModernData), matching block/checksum
information extracted during unpacking. Compression level can be specified via config.

Lz4LegacyPacker supports repacking the legacy LZ4 format (Lz4LegacyData); the compression
level (default/fast/high modes) can be specified via config.
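
For orientation, here is a minimal sketch of the unpack-patch-repack flow these components enable. The resource calls mirror the tests discussed below; the file name and patch offset are illustrative, and the import paths are from memory rather than from this PR:

```python
from ofrak import OFRAKContext
from ofrak_type.range import Range

async def patch_lz4(ofrak_context: OFRAKContext) -> bytes:
    # Unpacking tags the resource (e.g. Lz4ModernData) and exposes the
    # decompressed payload as a single child resource
    resource = await ofrak_context.create_root_resource_from_file("firmware.lz4")
    await resource.unpack()

    child = await resource.get_only_child()
    child.queue_patch(Range.from_size(0, 5), b"OFRAK")
    await child.save()

    # Packing recompresses the child, reusing the block/checksum information
    # recorded at unpack time; a compression level can be supplied via config
    await resource.pack()
    return await resource.get_data()
```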

Anyone you think should look at this, specifically?
@rbs-jacob

whyitfor and others added 2 commits October 26, 2025 19:22
Co-authored-by: Jacob Strieb <99368685+rbs-jacob@users.noreply.github.com>
whyitfor requested a review from rbs-jacob October 30, 2025 16:33
rbs-afflitto (Collaborator) commented Nov 4, 2025

I'm attempting to unpack an LZ4 (legacy) compressed kernel with this unpacker, but I'm getting a runtime error:

Invalid LZ4 legacy format: header says 4623489 bytes but found 17143553

Using the standard lz4 tool (v1.9.4), the image unpacks. I wonder if this is a bug in the Python lz4 module.

17143561 seems to be the file size of the lz4 file.

rbs-afflitto (Collaborator) commented:

I pushed a fix for this. The LZ4 legacy format uses a block size of 8 MB, but the original packer/unpacker did not split the data into blocks at all. I also updated the tests and added new assets larger than 8 MB to cover this failure.
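
For reference, a minimal sketch of the block-by-block handling the fix implies; the function name and loop structure are illustrative rather than the actual patch, while the 8 MB constant and the lz4.block.decompress call mirror the snippets reviewed below:

```python
import lz4.block

LZ4_LEGACY_MAGIC = 0x184C2102        # legacy frame magic number
LEGACY_BLOCK_SIZE = 8 * 1024 * 1024  # legacy frames use 8 MB uncompressed blocks

def decompress_legacy(data: bytes) -> bytes:
    assert int.from_bytes(data[:4], "little") == LZ4_LEGACY_MAGIC
    offset, decompressed = 4, bytearray()
    while offset + 4 <= len(data):
        # Each block is prefixed with its 4-byte little-endian compressed size
        block_size = int.from_bytes(data[offset : offset + 4], "little")
        offset += 4
        compressed_block = data[offset : offset + block_size]
        offset += block_size
        # The final block may decompress to less than 8 MB; as in the unpacker
        # snippet below, uncompressed_size bounds the output buffer size
        decompressed += lz4.block.decompress(
            compressed_block, uncompressed_size=LEGACY_BLOCK_SIZE
        )
    return bytes(decompressed)
```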

rbs-jacob marked this pull request as ready for review November 6, 2025 17:26
rbs-jacob (Member) left a comment:

Most of these are fairly minor, and several are completely optional.

Comment on lines +76 to +109
# Read the original content
initial_data = test_case.input_file.read_bytes()

modification = b"OFRAK"

# Create resource and unpack
resource = await ofrak_context.create_root_resource_from_file(test_case.test_file)
await resource.unpack()

# Verify it has the expected tag
assert resource.has_tag(Lz4Data)

# Get the child and verify initial content
child = await resource.get_only_child()
child_data = await child.get_data()
assert child_data == initial_data

# Modify the data
child.queue_patch(Range.from_size(0, len(modification)), modification)
await child.save()

# Pack it back
await resource.pack()

# Verify the repacked data by unpacking it again
repacked_data = await resource.get_data()
verify_resource = await ofrak_context.create_root_resource(
    "repacked_test.lz4", data=repacked_data
)
await verify_resource.unpack()

verify_child = await verify_resource.get_only_child()
verified_data = await verify_child.get_data()
assert verified_data.startswith(modification)
rbs-jacob (Member):

Suggested change (replacing the snippet above with):

resource = await ofrak_context.create_root_resource_from_file(test_case.test_file)
await resource.unpack()
assert resource.has_tag(Lz4Data)
initial_data = test_case.input_file.read_bytes()
child = await resource.get_only_child()
child_data = await child.get_data()
assert child_data == initial_data
modification = b"OFRAK"
child.queue_patch(Range.from_size(0, len(modification)), modification)
await child.save()
await resource.pack()
repacked_data = await resource.get_data()
verify_resource = await ofrak_context.create_root_resource(
    "repacked_test.lz4", data=repacked_data
)
await verify_resource.unpack()
verify_child = await verify_resource.get_only_child()
verified_data = await verify_child.get_data()
assert verified_data.startswith(modification)

In my opinion, these comments just add noise. This code is pretty self-explanatory and doesn't need them. Explaining that resource.pack means "pack it back" is actively harmful to code readability.

I only bring this up to get ahead of it, since I imagine there will be increasing LLM assistance when making OFRAK contributions. We will want to agree on whether comments like these are appropriate or not in order to more effectively use LLMs, and ensure they don't delay code review.

Not sure if there is a Claude setting to get it to chill out with the comments? I've heard they help it "think through" writing code, and do a better job. If that's the case, maybe an additional step at the end to go through and automatically clean up/remove many of them, specified in a Claude rules file somewhere?

There is a direct suggestion in this comment as an example of how much more concise the code could be, with the added benefit of logical grouping for the expressions. But the general comment applies to most of these test functions.

I'd love to find a way to solve this and/or add it to the contributor guidelines so I don't have to point it out in nitpicky comments like this one!

Comment on lines +2 to +3
Test the functionality of the LZ4 component, including unpacking,
modifying, and repacking LZ4-compressed data.
rbs-jacob (Member):

I don't have issues with the tests in this file, but I'm curious why you didn't use the unpack-modify-pack test pattern (or other test patterns)?
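
(For readers outside the project: the unpack-modify-pack pattern referred to here is a reusable test scaffold. The sketch below is a generic illustration of that shape, with hypothetical class and method names, not OFRAK's actual helper:)

```python
import abc

class UnpackModifyPackPattern(abc.ABC):
    """Hypothetical scaffold: subclasses supply the format-specific steps."""

    async def test_unpack_modify_pack(self, ofrak_context):
        resource = await self.create_root_resource(ofrak_context)
        await resource.unpack()
        await self.modify(resource)
        await resource.pack()
        await self.verify(resource)

    @abc.abstractmethod
    async def create_root_resource(self, ofrak_context):
        ...

    @abc.abstractmethod
    async def modify(self, resource):
        ...

    @abc.abstractmethod
    async def verify(self, resource):
        ...
```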

# LZ4 legacy block size (uncompressed) is 8 MB (see https://github.com/lz4/lz4/blob/67a385a170d2dc331a25677e0d20d96eef0450c5/programs/lz4io.c#L86)
decompressed_data += lz4.block.decompress(
    compressed_block,
    uncompressed_size=8 * (1 << 20),
rbs-jacob (Member):

Suggested change:
- uncompressed_size=8 * (1 << 20),
+ uncompressed_size=8 * 1024 * 1024,

Sometimes I find it clearer to express MB in terms of KB squared. Feel free to ignore this – just a matter of taste.

rbs-afflitto (Collaborator):

For context, I chose to use the same syntax that the lz4 repo uses here. I agree it's less intuitive, but I wanted to ensure it matches the reference implementation.

Compression level can be specified via config (default: 0).
"""

targets = (Lz4ModernData,)
rbs-jacob (Member):

Is it intentional that none of these target and pack the skippable data?

whyitfor (Contributor Author):

Yes. There doesn't seem to be a good use case for now to actually pack skippable data; once we need this, we can add it in.
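
For context, the skippable frame layout is simple enough that a packer could be added later with little effort: a magic number in the range 0x184D2A50 through 0x184D2A5F, a 4-byte little-endian size field, then that many bytes of opaque data. A minimal parse sketch (function name illustrative):

```python
def parse_skippable_frame(data: bytes) -> bytes:
    # Skippable frame magic numbers span 0x184D2A50 through 0x184D2A5F
    magic = int.from_bytes(data[:4], "little")
    assert 0x184D2A50 <= magic <= 0x184D2A5F
    # 4-byte little-endian size of the opaque user data that follows
    size = int.from_bytes(data[4:8], "little")
    return data[8 : 8 + size]
```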

child_data = await lz4_child.get_data()

# LZ4 legacy format uses 8 MB blocks (see https://github.com/lz4/lz4/blob/67a385a170d2dc331a25677e0d20d96eef0450c5/programs/lz4io.c#L86)
LEGACY_BLOCK_SIZE = 8 * (1 << 20) # 8 MB
rbs-jacob (Member):

Suggested change:
- LEGACY_BLOCK_SIZE = 8 * (1 << 20) # 8 MB
+ LEGACY_BLOCK_SIZE = 8 * 1024 * 1024 # 8 MB

Same comment as above about using KB squared. Once again, feel free to ignore.


# Append block size + compressed block data
compressed_block_size = len(compressed_block)
lz4_compressed += compressed_block_size.to_bytes(4, "little") + compressed_block
rbs-jacob (Member):

Have you tried running this for moderately large files? Does it run slowly? I've had speedups from appending byte strings to a list, and then concatenating the list in one go. Curious if that would apply here.
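
A sketch of the suggested pattern: accumulate the per-block byte strings in a list and concatenate once, avoiding the repeated reallocation of bytes += in a loop. The input list here is a hypothetical stand-in for the packer's block loop:

```python
# Hypothetical stand-in for the packer's already-compressed blocks
compressed_blocks = [b"\x00" * 10, b"\x01" * 20]

chunks = []
for compressed_block in compressed_blocks:
    # 4-byte little-endian compressed size, then the block itself
    chunks.append(len(compressed_block).to_bytes(4, "little"))
    chunks.append(compressed_block)
lz4_compressed = b"".join(chunks)  # single concatenation instead of repeated +=
```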

whyitfor (Contributor Author):

I agree. @rbs-afflitto, I'll fix this for you.
