Initial reimplementation of composefs-c #225
Draft
cgwalters wants to merge 19 commits into composefs:main from
cgwalters (Collaborator, Author) commented:
There are definitely some sub-tasks to this and pieces that we need to break out. One that I'm realizing is that the dumpfile format is hardcoded to sha256-12. I guess we can just auto-detect from length (like we're doing in other places), but the more I think about this, the more I feel we need to formalize it (as is argued in #224). So how about a magic comment in the dumpfile like or so?
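For illustration, auto-detecting the algorithm from the digest length could look roughly like the sketch below. It assumes the digest is hex-encoded; the `Algorithm` enum and `detect_algorithm` function are hypothetical names and not the composefs API.

```rust
/// fs-verity hash algorithms a dumpfile digest might use (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Algorithm {
    Sha256,
    Sha512,
}

/// Guess the algorithm from the length of a hex-encoded digest.
/// Returns `None` for non-hex input or for any other length.
pub fn detect_algorithm(hex_digest: &str) -> Option<Algorithm> {
    if !hex_digest.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    match hex_digest.len() {
        64 => Some(Algorithm::Sha256),  // 32 bytes of digest
        128 => Some(Algorithm::Sha512), // 64 bytes of digest
        _ => None,
    }
}
```

Length-based detection only works while no two supported algorithms share a digest size, which is one argument for recording the choice explicitly in the file.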
Force-pushed: 4d43b61 to 1871128
cgwalters (Collaborator, Author) commented:
Let's make the format layout a choice to avoid breaking sealed UKIs as they exist today.
Force-pushed: 1871128 to 3ebfcf2
Extract the FsVerityHashValue trait, Sha256HashValue, and Sha512HashValue types from composefs into a new composefs-types internal crate. This is prep work for extracting erofs code into a separate crate that needs these types without depending on all of composefs.
The composefs crate re-exports these types from its existing fsverity::hashvalue module so all downstream code continues to work unchanged. The INLINE_CONTENT_MAX constant is also moved to composefs-types and re-exported.
Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
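The re-export pattern described in this commit can be sketched in a single file, with modules standing in for the two crates. This is a simplified assumption: the trait body, the `as_bytes` method, and the `digest_len` helper are hypothetical, and the real module path is fsverity::hashvalue rather than the flattened `hashvalue` used here.

```rust
// Stand-in for the new `composefs-types` crate (modelled as a module).
mod composefs_types {
    /// Hypothetical, reduced version of the trait.
    pub trait FsVerityHashValue {
        fn as_bytes(&self) -> &[u8];
    }

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Sha256HashValue(pub [u8; 32]);

    impl FsVerityHashValue for Sha256HashValue {
        fn as_bytes(&self) -> &[u8] {
            &self.0
        }
    }
}

// Stand-in for the `composefs` crate, which keeps its old paths alive.
mod composefs {
    pub mod hashvalue {
        // Re-export: downstream code importing from here keeps compiling.
        pub use super::super::composefs_types::{FsVerityHashValue, Sha256HashValue};
    }
}

/// Downstream-style helper written against the old path only.
fn digest_len(value: &impl composefs::hashvalue::FsVerityHashValue) -> usize {
    value.as_bytes().len()
}
```

Because a re-export is the same item under a second path, a value built through `composefs_types::Sha256HashValue` can be passed anywhere `composefs::hashvalue::Sha256HashValue` is expected.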
Move the EROFS on-disk format definitions (format.rs) into a new composefs-erofs crate. The file is moved with git mv to preserve history and is byte-for-byte identical in the new location.
The composefs crate re-exports everything via `pub use composefs_erofs::format::*` so all downstream code continues to compile without changes.
The four Debug impls for format types (CompactInodeHeader, ExtendedInodeHeader, ComposefsHeader, Superblock) that lived in composefs's debug.rs are moved to composefs-erofs's debug module to satisfy orphan rules. The XAttrHeader::calculate_n_elems inherent method is converted to a local free function in reader.rs for the same reason.
Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Move the OverlayMetacopy structure from composefs's erofs module into the composefs-erofs crate. This type represents the overlay.metacopy xattr format used for fs-verity digest storage in composefs images.
The visibility changes from pub(super) to pub since the type is now in a separate crate and needs to be accessible by both the reader (which will move to composefs-erofs) and the writer (which remains in composefs).
The composefs crate re-exports the module for backward compatibility.
Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Move the EROFS image reader and parser code into the composefs-erofs crate. The reader provides safe parsing of EROFS filesystem images, including inode traversal, directory reading, and object reference collection.
The XAttrHeader::calculate_n_elems method is restored as an inherent impl now that XAttrHeader and the reader are in the same crate. Debug impls for reader types (XAttr, Inode, DirectoryBlock, DataBlock) move to composefs-erofs since those types now live there. DirectoryEntry::nid() is made public since the type is now part of the crate's public API.
Reader tests that depend on composefs-specific utilities (dumpfile parser, mkfs writer) are extracted to crates/composefs/tests/erofs_reader.rs since they cannot live in composefs-erofs without creating a circular dependency.
The composefs crate re-exports composefs_erofs::reader::* for backward compatibility.
Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Move the EROFS image debug and analysis code into the composefs-erofs crate. This includes debug_img(), dump_unassigned(), the ImageVisitor tree walker, and SegmentType enum, as well as all Debug trait implementations for erofs types.
Consolidate the previously-duplicated utility functions (hexdump, utf8_or_hex, addr! macro) into a single copy in composefs-erofs.
The composefs crate re-exports composefs_erofs::debug::* for backward compatibility.
Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
The writer remains in the composefs crate (it depends on tree types that can't be extracted without significant restructuring), but now imports format, reader, and composefs modules directly from the composefs-erofs crate rather than through the re-export layer.
Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
The erofs-debug binary now depends directly on composefs-erofs rather than going through the composefs crate, since all the erofs debug functionality lives in composefs-erofs. This removes erofs-debug's dependency on the full composefs crate.
Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Basically starting on composefs/composefs#423
3 key goals:
- Compatible CLI interfaces
- Compatible EROFS output format (this is a big deal!)
- Next: Compatible C shared library (ugly and messy)
Assisted-by: OpenCode (Claude Sonnet 4)
Add tests verifying V1_0 vs V1_1 format differences (composefs_version header, build_time, compact/extended inodes, overlay.opaque xattr), full dump→parse→regenerate pipeline tests, and edge case coverage for xattrs, hardlinks across directories, deeply nested paths, maximum filename length, and all file types in a single directory.
Also add the nested case to the C mkcomposefs comparison test, and fix clippy warnings (redundant closures, complex type aliases).
Assisted-by: OpenCode (Claude claude-opus-4-6)
Add fuzz targets to exercise the EROFS reader and debug image dumper with arbitrary byte inputs.
The reader target exercises the full API surface: Image::open, superblock fields, inode header methods, xattr iteration, directory traversal, and collect_objects. Uses catch_unwind strategically so the fuzzer can explore deeper API surfaces even when earlier operations panic on malformed input.
The reader currently has many unwrap() calls on untrusted input -- these fuzz targets are designed to surface those as crash bugs to be fixed.
Modeled after the cargo-fuzz setup in composefs/tar-core.
Assisted-by: OpenCode (Claude claude-opus-4-6)
Replace todo!() panic with bail!() for --threads flag in mkcomposefs, so users get a proper error instead of a crash.
Remove catch_unwind in fuzz targets since cargo-fuzz uses panic=abort, making catch_unwind a no-op. The simpler direct-call approach lets the fuzzer surface each panic as a crash finding, which is the intended behavior.
Assisted-by: OpenCode (Claude claude-opus-4-6)
The fuzzer immediately found a shift overflow panic in Image::open() when blkszbits from untrusted input exceeds 63. More broadly, the reader had ~18 unwrap()/expect() calls on untrusted data.
Convert all reader parsing methods to return Result<T, ReaderError>:
- Image::open() validates blkszbits, bounds-checks meta/xattr offsets
- inode(), shared_xattr(), block(), directory_block(), data_block() all return Result with bounds checking
- XAttr parsing, directory entry iteration return Result
- InodeHeader::data_layout() and additional_bytes() return Result
- Zero unwrap()/expect() calls remain in the reader
All callers updated: debug.rs, dump.rs, composefs-info, repository.rs, test files, and fuzz targets (which now use let-else for early return on invalid images).
Assisted-by: OpenCode (Claude claude-opus-4-6)
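A minimal sketch of the kind of validation described above, assuming a reader that works on a plain byte slice. Only the ReaderError name comes from the commit message; its variants and the `block_size` and `block` free functions here are hypothetical.

```rust
use std::convert::TryFrom;

/// Hypothetical error variants; the real ReaderError may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum ReaderError {
    InvalidBlockSizeBits(u8),
    OutOfBounds,
}

/// Compute `1 << blkszbits` without panicking on oversized shift amounts.
pub fn block_size(blkszbits: u8) -> Result<usize, ReaderError> {
    // checked_shl returns None when the shift is >= usize::BITS.
    1usize
        .checked_shl(u32::from(blkszbits))
        .ok_or(ReaderError::InvalidBlockSizeBits(blkszbits))
}

/// Return block `id` of the image, or an error if it lies outside the image.
pub fn block(image: &[u8], blkszbits: u8, id: u64) -> Result<&[u8], ReaderError> {
    let size = block_size(blkszbits)?;
    // Avoid silent truncation on 32-bit targets.
    let id = usize::try_from(id).map_err(|_| ReaderError::OutOfBounds)?;
    let start = id.checked_mul(size).ok_or(ReaderError::OutOfBounds)?;
    let end = start.checked_add(size).ok_or(ReaderError::OutOfBounds)?;
    // Slice::get returns None instead of panicking on a bad range.
    image.get(start..end).ok_or(ReaderError::OutOfBounds)
}
```

With an 8192-byte image and blkszbits of 12, block 1 is the second 4096-byte block, block 2 is out of bounds, and a blkszbits of 200 is rejected before any shift happens.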
Convert the assert_eq! in ImageVisitor::note() to return an error instead of panicking when a corrupt image has the same offset visited as two different segment types. Found by the debug_image fuzz target.
Assisted-by: OpenCode (Claude claude-opus-4-6)
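The change from an assertion to an error could look roughly like this. The `Visitor` struct, the `SegmentType` variants, the `SegmentConflict` error, and the boolean return value are all assumptions made for the sketch; only the idea of note() reporting a conflict instead of panicking is taken from the commit message.

```rust
use std::collections::BTreeMap;

/// Hypothetical subset of segment kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SegmentType {
    Superblock,
    Inode,
    DataBlock,
}

/// Error reported when one offset is claimed by two different segment types.
#[derive(Debug, PartialEq, Eq)]
pub struct SegmentConflict {
    pub offset: usize,
    pub first: SegmentType,
    pub second: SegmentType,
}

#[derive(Default)]
pub struct Visitor {
    visited: BTreeMap<usize, SegmentType>,
}

impl Visitor {
    /// Record a visit. Returns Ok(true) if the offset was already seen with
    /// the same type, Ok(false) if it is new, and Err on a type mismatch
    /// (where an assert_eq! would previously have panicked).
    pub fn note(&mut self, offset: usize, segment: SegmentType) -> Result<bool, SegmentConflict> {
        match self.visited.get(&offset).copied() {
            Some(first) if first == segment => Ok(true),
            Some(first) => Err(SegmentConflict { offset, first, second: segment }),
            None => {
                self.visited.insert(offset, segment);
                Ok(false)
            }
        }
    }
}
```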
Add just fuzz, fuzz-all, and fuzz-list targets that run cargo-fuzz against the composefs-erofs crate. Follows the pattern from tar-core.
Assisted-by: OpenCode (Claude claude-opus-4-6)
Generate 19 valid EROFS images covering different reader code paths: both format versions (V1_0/V1_1), all file types (regular inline, external, symlink, fifo, chardev, blockdev, socket), nested directories, many-entry directories, xattrs, hardlinks, large inline content, deep nesting, and edge cases like large uid/gid forcing extended inodes.
This improves fuzzer effectiveness dramatically: code coverage goes from 19 edges (empty corpus) to 503 edges immediately.
Run with `just generate-corpus`.
Assisted-by: OpenCode (Claude claude-opus-4-6)
Fix arithmetic operations that could overflow, underflow, or cause resource exhaustion when processing malformed EROFS images:
- Use checked_mul instead of unchecked << for block address calculations in debug.rs
- Use checked_add for block range end computation in reader.rs to prevent u64 overflow
- Use usize::BITS instead of hardcoded 64 for blkszbits validation (correct on 32-bit platforms)
- Use usize::try_from instead of 'as usize' casts for inode size, inode ID, and block ID to avoid silent truncation on 32-bit
- Cap Vec allocation against image length to prevent OOM from crafted size fields in dump.rs
- Add cycle detection and depth limit (512) for directory traversal in dump.rs to prevent stack overflow
- Use saturating_sub for debug display calculations
Assisted-by: OpenCode (Claude claude-opus-4-6)
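The cycle detection and depth limit for directory traversal can be illustrated with a toy model in which each directory inode ID maps to the IDs of its child directories. The `DirTable` type, the `WalkError` enum, and the function names are hypothetical; only the limit of 512 is taken from the commit message.

```rust
use std::collections::{HashMap, HashSet};

/// Depth limit named in the commit message.
const MAX_DEPTH: usize = 512;

#[derive(Debug, PartialEq, Eq)]
pub enum WalkError {
    Cycle(u64),
    TooDeep,
    MissingInode(u64),
}

/// Toy directory table: directory inode ID -> child directory inode IDs.
pub type DirTable = HashMap<u64, Vec<u64>>;

/// Count directories reachable from `root`, rejecting cycles and deep trees.
pub fn count_dirs(table: &DirTable, root: u64) -> Result<usize, WalkError> {
    let mut on_path = HashSet::new();
    walk(table, root, 0, &mut on_path)
}

fn walk(
    table: &DirTable,
    nid: u64,
    depth: usize,
    on_path: &mut HashSet<u64>,
) -> Result<usize, WalkError> {
    if depth > MAX_DEPTH {
        return Err(WalkError::TooDeep);
    }
    // A directory that is already one of its own ancestors forms a cycle.
    if !on_path.insert(nid) {
        return Err(WalkError::Cycle(nid));
    }
    let children = table.get(&nid).ok_or(WalkError::MissingInode(nid))?;
    let mut count = 1usize;
    for &child in children {
        count += walk(table, child, depth + 1, on_path)?;
    }
    on_path.remove(&nid);
    Ok(count)
}
```

The depth limit bounds recursion on its own; tracking the current ancestor set additionally lets a cyclic image be reported as such rather than as merely too deep.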
Replace direct slice indexing with .get() where the bounds come from image content: XAttr::suffix/value/padding, Inode::inline, and debug_img's unassigned-region slicing. This prevents panics on malformed images where field values are inconsistent with actual data lengths.
Assisted-by: OpenCode (Claude claude-opus-4-6)
…pers
Change XAttr::suffix(), value(), and padding() to return Result<&[u8], ReaderError> instead of silently returning empty slices on out-of-bounds access. This ensures corrupt xattr data is properly reported rather than silently swallowed.
Also deduplicate is_whiteout() (moved to InodeHeader trait method) and find_child_nid() (moved to Image method), and remove the redundant entry_nid() test helper in favor of DirectoryEntry::nid().
Assisted-by: OpenCode (Claude claude-opus-4-6)
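As a rough sketch of accessors that use .get() and report errors, consider a simplified xattr record whose lengths come from untrusted image data. The struct layout, field names, and the error variant are assumptions; the real EROFS xattr entry has more fields.

```rust
/// Hypothetical error variant for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ReaderError {
    XAttrOutOfBounds,
}

/// Simplified xattr: `data` holds the name suffix followed by the value.
pub struct XAttr<'a> {
    pub name_len: u8,
    pub value_size: u16,
    pub data: &'a [u8],
}

impl<'a> XAttr<'a> {
    /// Name suffix bytes; errors if `name_len` exceeds the available data.
    pub fn suffix(&self) -> Result<&'a [u8], ReaderError> {
        self.data
            .get(..usize::from(self.name_len))
            .ok_or(ReaderError::XAttrOutOfBounds)
    }

    /// Value bytes; errors instead of returning an empty slice on corrupt sizes.
    pub fn value(&self) -> Result<&'a [u8], ReaderError> {
        let start = usize::from(self.name_len);
        let end = start
            .checked_add(usize::from(self.value_size))
            .ok_or(ReaderError::XAttrOutOfBounds)?;
        self.data.get(start..end).ok_or(ReaderError::XAttrOutOfBounds)
    }
}
```

Returning an error makes a corrupt value_size visible to the caller, whereas an empty slice would be indistinguishable from a legitimately empty value.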
Add a GitHub Actions workflow that runs cargo-fuzz on every PR (2 minutes per target) and an extended 15-minute-per-target run on pushes to main. Modeled after the tar-core fuzz CI setup.
The extended job depends on the smoke test passing first, and both jobs upload crash artifacts on failure for easy debugging.
Assisted-by: OpenCode (Claude claude-opus-4-6)
Force-pushed: 3ebfcf2 to 8a5c48d