Initial reimplementation of composefs-c #225

Draft
cgwalters wants to merge 19 commits into composefs:main from cgwalters:composefs-c-compat

@cgwalters
Collaborator

Basically starting on composefs/composefs#423

3 key goals:

  • Compatible CLI interfaces
  • Compatible EROFS output format (this is a big deal!)
  • Next: Compatible C shared library (ugly and messy)

Assisted-by: OpenCode (Claude Sonnet 4)

@cgwalters
Collaborator Author

There are definitely some sub-tasks to this and pieces that we need to break out. One that I'm realizing is that the dumpfile format is hardcoded to sha256-12. I guess we can just auto-detect from the digest length (like we're doing in other places), but the more I think about this the more I feel we need to formalize it (as is argued in #224).

So how about a magic comment in the dumpfile like

# format: sha512-12

or so?
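
To make the idea concrete, here is a minimal sketch of how a dumpfile reader might honor such a header and fall back to length-based detection when it is absent. Everything in it is hypothetical: the `DigestFormat` enum, the function names, and the exact `# format:` syntax only illustrate the proposal and are not existing composefs API.

```rust
/// Hypothetical digest formats; the names mirror the strings used in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DigestFormat {
    Sha256,
    Sha512,
}

/// Parse a proposed `# format: <name>` header line.
/// Returns None if the line is not a format header or names an unknown format.
fn parse_format_header(line: &str) -> Option<DigestFormat> {
    let name = line.strip_prefix("# format:")?.trim();
    match name {
        "sha256-12" => Some(DigestFormat::Sha256),
        "sha512-12" => Some(DigestFormat::Sha512),
        // A real parser would probably reject unknown names instead of ignoring them.
        _ => None,
    }
}

/// Fallback: guess the format from the length of a hex-encoded digest.
fn detect_from_hex_len(hex_digest: &str) -> Option<DigestFormat> {
    match hex_digest.len() {
        64 => Some(DigestFormat::Sha256),  // 32 bytes, hex-encoded
        128 => Some(DigestFormat::Sha512), // 64 bytes, hex-encoded
        _ => None,
    }
}

/// Prefer the explicit header on the first line; otherwise use the digest length.
fn resolve_format(first_line: &str, sample_digest: &str) -> Option<DigestFormat> {
    parse_format_header(first_line).or_else(|| detect_from_hex_len(sample_digest))
}
```

An explicit header has the advantage that the format is known before the first digest is parsed, and an empty dumpfile (no digests at all) is no longer ambiguous.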

@cgwalters
Collaborator Author

Let's make the format layout a choice, to avoid breaking sealed UKIs as they exist today.

cgwalters added 19 commits March 7, 2026 21:50
Extract the FsVerityHashValue trait, Sha256HashValue, and Sha512HashValue
types from composefs into a new composefs-types internal crate. This is
prep work for extracting erofs code into a separate crate that needs
these types without depending on all of composefs.

The composefs crate re-exports these types from its existing
fsverity::hashvalue module so all downstream code continues to work
unchanged. The INLINE_CONTENT_MAX constant is also moved to
composefs-types and re-exported.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Move the EROFS on-disk format definitions (format.rs) into a new
composefs-erofs crate. The file is moved with git mv to preserve
history and is byte-for-byte identical in the new location.

The composefs crate re-exports everything via `pub use
composefs_erofs::format::*` so all downstream code continues to
compile without changes.

The four Debug impls for format types (CompactInodeHeader,
ExtendedInodeHeader, ComposefsHeader, Superblock) that lived in
composefs's debug.rs are moved to composefs-erofs's debug module
to satisfy orphan rules. The XAttrHeader::calculate_n_elems inherent
method is converted to a local free function in reader.rs for the
same reason.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
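
The re-export approach described in the commit above can be sketched in a single file by modelling the two crates as modules. This is only an illustration under assumptions: `ExampleHeader`, `EXAMPLE_MAGIC`, and the `composefs::erofs::format` path are hypothetical stand-ins, not the real definitions.

```rust
// Stand-in for the new `composefs-erofs` crate (a module here, so the sketch
// compiles as one file).
mod composefs_erofs {
    pub mod format {
        /// Hypothetical constant standing in for an on-disk format definition.
        pub const EXAMPLE_MAGIC: u32 = 0x1234_5678;

        /// Hypothetical on-disk header type.
        #[derive(Debug, Default, PartialEq)]
        pub struct ExampleHeader {
            pub magic: u32,
        }
    }
}

// Stand-in for the `composefs` crate, which keeps its old module path alive.
mod composefs {
    pub mod erofs {
        pub mod format {
            // Glob re-export: every public item of the moved module is
            // reachable through the old path again.
            pub use crate::composefs_erofs::format::*;
        }
    }
}

/// Downstream code keeps importing through the old path, unchanged.
fn downstream_header() -> composefs::erofs::format::ExampleHeader {
    use composefs::erofs::format::{ExampleHeader, EXAMPLE_MAGIC};
    ExampleHeader { magic: EXAMPLE_MAGIC }
}
```

A re-export does not make the type local, though. Trait impls need either the trait or the type to be defined in the current crate, and inherent methods can only be written in the crate that defines the type, which is why the Debug impls and `calculate_n_elems` had to be dealt with separately.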
Move the OverlayMetacopy structure from composefs's erofs module into
the composefs-erofs crate. This type represents the overlay.metacopy
xattr format used for fs-verity digest storage in composefs images.

The visibility changes from pub(super) to pub since the type is now
in a separate crate and needs to be accessible by both the reader
(which will move to composefs-erofs) and the writer (which remains
in composefs). The composefs crate re-exports the module for backward
compatibility.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Move the EROFS image reader and parser code into the composefs-erofs
crate. The reader provides safe parsing of EROFS filesystem images,
including inode traversal, directory reading, and object reference
collection.

The XAttrHeader::calculate_n_elems method is restored as an inherent
impl now that XAttrHeader and the reader are in the same crate. Debug
impls for reader types (XAttr, Inode, DirectoryBlock, DataBlock) move
to composefs-erofs since those types now live there.
DirectoryEntry::nid() is made public since the type is now part of the
crate's public API.

Reader tests that depend on composefs-specific utilities (dumpfile
parser, mkfs writer) are extracted to crates/composefs/tests/erofs_reader.rs
since they cannot live in composefs-erofs without creating a circular
dependency. The composefs crate re-exports composefs_erofs::reader::*
for backward compatibility.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Move the EROFS image debug and analysis code into the composefs-erofs
crate. This includes debug_img(), dump_unassigned(), the ImageVisitor
tree walker, and SegmentType enum, as well as all Debug trait
implementations for erofs types.

Consolidate the previously-duplicated utility functions (hexdump,
utf8_or_hex, addr! macro) into a single copy in composefs-erofs.
The composefs crate re-exports composefs_erofs::debug::* for backward
compatibility.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
The writer remains in the composefs crate (it depends on tree types
that can't be extracted without significant restructuring), but now
imports format, reader, and composefs modules directly from the
composefs-erofs crate rather than through the re-export layer.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
The erofs-debug binary now depends directly on composefs-erofs rather
than going through the composefs crate, since all the erofs debug
functionality lives in composefs-erofs. This removes erofs-debug's
dependency on the full composefs crate.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Basically starting on composefs/composefs#423

3 key goals:

- Compatible CLI interfaces
- Compatible EROFS output format (this is a big deal!)
- Next: Compatible C shared library (ugly and messy)

Assisted-by: OpenCode (Claude Sonnet 4)
Add tests verifying V1_0 vs V1_1 format differences (composefs_version
header, build_time, compact/extended inodes, overlay.opaque xattr),
full dump→parse→regenerate pipeline tests, and edge case coverage for
xattrs, hardlinks across directories, deeply nested paths, maximum
filename length, and all file types in a single directory.

Also add the nested case to the C mkcomposefs comparison test, and fix
clippy warnings (redundant closures, complex type aliases).

Assisted-by: OpenCode (Claude claude-opus-4-6)
Add fuzz targets to exercise the EROFS reader and debug image dumper
with arbitrary byte inputs. The reader target exercises the full API
surface: Image::open, superblock fields, inode header methods, xattr
iteration, directory traversal, and collect_objects. Uses catch_unwind
strategically so the fuzzer can explore deeper API surfaces even when
earlier operations panic on malformed input.

The reader currently has many unwrap() calls on untrusted input -- these
fuzz targets are designed to surface those as crash bugs to be fixed.

Modeled after the cargo-fuzz setup in composefs/tar-core.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Replace todo!() panic with bail!() for --threads flag in mkcomposefs,
so users get a proper error instead of a crash.

Remove catch_unwind in fuzz targets since cargo-fuzz uses panic=abort,
making catch_unwind a no-op. The simpler direct-call approach lets the
fuzzer surface each panic as a crash finding, which is the intended
behavior.

Assisted-by: OpenCode (Claude claude-opus-4-6)
The fuzzer immediately found a shift overflow panic in Image::open()
when blkszbits from untrusted input exceeds 63. More broadly, the
reader had ~18 unwrap()/expect() calls on untrusted data.

Convert all reader parsing methods to return Result<T, ReaderError>:
- Image::open() validates blkszbits, bounds-checks meta/xattr offsets
- inode(), shared_xattr(), block(), directory_block(), data_block()
  all return Result with bounds checking
- XAttr parsing, directory entry iteration return Result
- InodeHeader::data_layout() and additional_bytes() return Result
- Zero unwrap()/expect() calls remain in the reader

All callers updated: debug.rs, dump.rs, composefs-info, repository.rs,
test files, and fuzz targets (which now use let-else for early return
on invalid images).

Assisted-by: OpenCode (Claude claude-opus-4-6)
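
As a rough illustration of the conversion described in the commit above, the sketch below validates an untrusted `blkszbits` with `checked_shl` and returns a `Result` from a bounds-checked block accessor. The `SketchError` type and the function signatures are assumptions made for this example; the real `ReaderError` variants and `Image` methods are not shown in this thread.

```rust
/// Hypothetical error type standing in for the reader's error enum.
#[derive(Debug, PartialEq, Eq)]
enum SketchError {
    InvalidBlockSizeBits(u8),
    OutOfBounds,
}

/// Compute the block size from an untrusted `blkszbits` field.
fn block_size(blkszbits: u8) -> Result<usize, SketchError> {
    // `1usize << n` panics in debug builds when n >= usize::BITS;
    // `checked_shl` returns None in that case instead.
    1usize
        .checked_shl(u32::from(blkszbits))
        .ok_or(SketchError::InvalidBlockSizeBits(blkszbits))
}

/// Return block number `index` of the image, or an error if it does not fit.
fn block(image: &[u8], blkszbits: u8, index: usize) -> Result<&[u8], SketchError> {
    let size = block_size(blkszbits)?;
    let start = index.checked_mul(size).ok_or(SketchError::OutOfBounds)?;
    let end = start.checked_add(size).ok_or(SketchError::OutOfBounds)?;
    // `get` returns None rather than panicking when the range is out of bounds.
    image.get(start..end).ok_or(SketchError::OutOfBounds)
}

/// How a caller such as a fuzz target might use let-else to return early
/// on an invalid image.
fn first_block_len(image: &[u8], blkszbits: u8) -> usize {
    let Ok(first) = block(image, blkszbits, 0) else {
        return 0;
    };
    first.len()
}
```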
Convert the assert_eq! in ImageVisitor::note() to return an error
instead of panicking when a corrupt image has the same offset visited
as two different segment types. Found by the debug_image fuzz target.

Assisted-by: OpenCode (Claude claude-opus-4-6)
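
The change described in the commit above might look roughly like the following. This is a sketch under assumptions: `SketchVisitor`, `SegmentKind`, its variants, and the `note()` signature are invented for illustration and do not reproduce the actual `ImageVisitor` or `SegmentType` definitions.

```rust
use std::collections::BTreeMap;

/// Hypothetical segment kinds; the real enum has its own variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SegmentKind {
    Inode,
    DataBlock,
}

/// Error returned when one offset is claimed by two different segment kinds.
#[derive(Debug, PartialEq, Eq)]
struct SegmentConflict {
    offset: usize,
    existing: SegmentKind,
    new: SegmentKind,
}

#[derive(Default)]
struct SketchVisitor {
    seen: BTreeMap<usize, SegmentKind>,
}

impl SketchVisitor {
    /// Record that `offset` holds a segment of kind `kind`.
    /// Ok(true) means first visit, Ok(false) means already seen with the same
    /// kind, and Err reports a conflict instead of panicking via assert_eq!.
    fn note(&mut self, offset: usize, kind: SegmentKind) -> Result<bool, SegmentConflict> {
        match self.seen.get(&offset).copied() {
            None => {
                self.seen.insert(offset, kind);
                Ok(true)
            }
            Some(existing) if existing == kind => Ok(false),
            Some(existing) => Err(SegmentConflict {
                offset,
                existing,
                new: kind,
            }),
        }
    }
}
```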
Add just fuzz, fuzz-all, and fuzz-list targets that run cargo-fuzz
against the composefs-erofs crate. Follows the pattern from tar-core.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Generate 19 valid EROFS images covering different reader code paths:
both format versions (V1_0/V1_1), all file types (regular inline,
external, symlink, fifo, chardev, blockdev, socket), nested
directories, many-entry directories, xattrs, hardlinks, large inline
content, deep nesting, and edge cases like large uid/gid forcing
extended inodes.

This improves fuzzer effectiveness dramatically: code coverage goes
from 19 edges (empty corpus) to 503 edges immediately. Run with
`just generate-corpus`.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Fix arithmetic operations that could overflow, underflow, or cause
resource exhaustion when processing malformed EROFS images:

- Use checked_mul instead of unchecked << for block address
  calculations in debug.rs
- Use checked_add for block range end computation in reader.rs to
  prevent u64 overflow
- Use usize::BITS instead of hardcoded 64 for blkszbits validation
  (correct on 32-bit platforms)
- Use usize::try_from instead of 'as usize' casts for inode size,
  inode ID, and block ID to avoid silent truncation on 32-bit
- Cap Vec allocation against image length to prevent OOM from crafted
  size fields in dump.rs
- Add cycle detection and depth limit (512) for directory traversal
  in dump.rs to prevent stack overflow
- Use saturating_sub for debug display calculations

Assisted-by: OpenCode (Claude claude-opus-4-6)
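
The hardening techniques listed in the commit above can be sketched as follows. The helper names, the `SafetyError` enum, and the toy directory representation (a slice of child-index lists) are all assumptions made for this example; only the depth limit of 512 comes from the commit message.

```rust
use std::collections::HashSet;
use std::convert::TryFrom;

#[derive(Debug, PartialEq, Eq)]
enum SafetyError {
    Overflow,
    TooLarge,
    BadIndex,
    Cycle,
    TooDeep,
}

/// Byte range [start, end) of a run of blocks, with overflow checks throughout.
fn block_range(block_id: u64, n_blocks: u64, block_size: u64) -> Result<(u64, u64), SafetyError> {
    let start = block_id.checked_mul(block_size).ok_or(SafetyError::Overflow)?;
    let len = n_blocks.checked_mul(block_size).ok_or(SafetyError::Overflow)?;
    let end = start.checked_add(len).ok_or(SafetyError::Overflow)?;
    Ok((start, end))
}

/// Convert an on-disk u64 to usize without the silent truncation of `as usize`.
fn to_index(value: u64) -> Result<usize, SafetyError> {
    usize::try_from(value).map_err(|_| SafetyError::TooLarge)
}

/// Allocate a buffer only if the claimed size is plausible for the image.
fn alloc_capped(claimed_size: u64, image_len: usize) -> Result<Vec<u8>, SafetyError> {
    let size = to_index(claimed_size)?;
    if size > image_len {
        return Err(SafetyError::TooLarge);
    }
    Ok(Vec::with_capacity(size))
}

const MAX_DEPTH: usize = 512;

/// Count directories reachable from `dir`. A directory reached twice is
/// reported as a cycle, and recursion stops beyond MAX_DEPTH.
fn count_dirs(
    children: &[Vec<usize>],
    dir: usize,
    depth: usize,
    visited: &mut HashSet<usize>,
) -> Result<usize, SafetyError> {
    if depth > MAX_DEPTH {
        return Err(SafetyError::TooDeep);
    }
    if !visited.insert(dir) {
        return Err(SafetyError::Cycle);
    }
    let entries = children.get(dir).ok_or(SafetyError::BadIndex)?;
    let mut total = 1;
    for &child in entries {
        total += count_dirs(children, child, depth + 1, visited)?;
    }
    Ok(total)
}
```

Treating any repeated directory as an error is stricter than pure cycle detection, but it is simple and seems reasonable here since a well-formed tree should not reach the same directory by two paths.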
Replace direct slice indexing with .get() where the bounds come from
image content: XAttr::suffix/value/padding, Inode::inline, and
debug_img's unassigned-region slicing. This prevents panics on
malformed images where field values are inconsistent with actual data
lengths.

Assisted-by: OpenCode (Claude claude-opus-4-6)
…pers

Change XAttr::suffix(), value(), and padding() to return
Result<&[u8], ReaderError> instead of silently returning empty
slices on out-of-bounds access. This ensures corrupt xattr data
is properly reported rather than silently swallowed.

Also deduplicate is_whiteout() (moved to InodeHeader trait method)
and find_child_nid() (moved to Image method), and remove the
redundant entry_nid() test helper in favor of DirectoryEntry::nid().

Assisted-by: OpenCode (Claude claude-opus-4-6)
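
A minimal sketch of the accessor pattern from the two commits above: use `.get()` for ranges derived from image content, and surface a failed lookup as an error rather than an empty slice. `SketchXAttr`, its fields, and `SketchError` are simplified assumptions and do not match the real `XAttr` layout.

```rust
#[derive(Debug, PartialEq, Eq)]
enum SketchError {
    XAttrOutOfBounds,
}

/// Hypothetical simplified xattr: two untrusted length fields plus the payload
/// bytes that are supposed to hold the name suffix followed by the value.
struct SketchXAttr<'a> {
    name_len: u8,
    value_len: u16,
    data: &'a [u8],
}

impl<'a> SketchXAttr<'a> {
    /// Name suffix bytes, or an error if `name_len` exceeds the payload.
    fn suffix(&self) -> Result<&'a [u8], SketchError> {
        self.data
            .get(..usize::from(self.name_len))
            .ok_or(SketchError::XAttrOutOfBounds)
    }

    /// Value bytes, or an error if the lengths are inconsistent with the payload.
    fn value(&self) -> Result<&'a [u8], SketchError> {
        let start = usize::from(self.name_len);
        // Cannot overflow usize: at most 255 + 65535.
        let end = start + usize::from(self.value_len);
        self.data
            .get(start..end)
            .ok_or(SketchError::XAttrOutOfBounds)
    }
}
```

With direct indexing (`&self.data[start..end]`) the second case in a corrupt image would panic; with `unwrap_or(&[])` it would be silently hidden. Returning `Result` lets the caller decide.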
Add a GitHub Actions workflow that runs cargo-fuzz on every PR (2
minutes per target) and an extended 15-minute-per-target run on
pushes to main. Modeled after the tar-core fuzz CI setup.

The extended job depends on the smoke test passing first, and both
jobs upload crash artifacts on failure for easy debugging.

Assisted-by: OpenCode (Claude claude-opus-4-6)
cgwalters force-pushed the composefs-c-compat branch from 3ebfcf2 to 8a5c48d on March 7, 2026 21:52