feat: Use FastCDC for content-defined chunking #1

@andrewgazelka

Problem

The current delta sync uses fixed 4KB blocks (rsync-style). This works well when files are modified in-place, but performs poorly when content shifts:

Old: [AAAA][BBBB][CCCC][DDDD]
New: [XXAA][AABB][BBCC][CCDD][DD__]  (inserted "XX" at start)

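With fixed block boundaries, inserting 2 bytes at the start shifts every later byte into a different position within its block. No block in the new file matches a block in the old one, so the whole file is retransferred.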
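The effect can be reproduced with the example strings above. This is a minimal sketch, assuming blocks are compared only at aligned offsets as described in this issue; `matching_fixed_blocks` is a hypothetical helper, not code from this project, and a block size of 4 bytes stands in for 4 KB.

```rust
use std::collections::HashSet;

/// Count blocks of `new` whose bytes also occur as a block of `old`
/// when both inputs are cut at fixed `block_size` offsets.
fn matching_fixed_blocks(old: &[u8], new: &[u8], block_size: usize) -> usize {
    let known: HashSet<&[u8]> = old.chunks(block_size).collect();
    new.chunks(block_size).filter(|b| known.contains(b)).count()
}

fn main() {
    let old = b"AAAABBBBCCCCDDDD";
    let in_place = b"AAAAxxxxCCCCDDDD"; // second block overwritten, nothing shifts
    let shifted = b"XXAAAABBBBCCCCDDDD"; // "XX" inserted at the start

    // In-place edit: only the overwritten block changes.
    println!("in-place: {} blocks reused", matching_fixed_blocks(old, in_place, 4));
    // Insertion: every aligned block now holds different bytes.
    println!("shifted:  {} blocks reused", matching_fixed_blocks(old, shifted, 4));
}
```

The in-place edit reuses 3 of 4 blocks, while the 2-byte insertion reuses none of the 5 blocks.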

Solution

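Use FastCDC (Fast Content-Defined Chunking) instead of fixed blocks. CDC places a chunk boundary wherever a rolling hash of the most recent bytes matches a pattern, so boundaries follow the content rather than the byte offset. Insertions/deletions therefore only affect nearby chunks: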

Old: [AAA|BBBB|CCC|DDDD]  (boundaries based on content)
New: [XX|AAA|BBBB|CCC|DDDD]  (only first chunk is new)
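To illustrate why boundaries realign after an insertion, here is a toy content-defined chunker. It is a sketch under stated assumptions, not FastCDC itself: it uses a gear-style rolling hash with a table generated on the fly, has no minimum/maximum chunk size, and skips FastCDC's normalization. `gear_table`, `toy_cdc`, `reused_chunks`, and `sample_data` are hypothetical names.

```rust
use std::collections::HashSet;

/// Deterministic table of 256 pseudo-random u64 values (splitmix64),
/// standing in for the fixed gear table a real chunker ships with.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for slot in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *slot = z ^ (z >> 31);
    }
    table
}

/// Split `data` wherever the rolling hash of the last 64 bytes has its
/// top 6 bits clear. The cut rule depends only on content, never on offset.
fn toy_cdc(data: &[u8]) -> Vec<&[u8]> {
    const WINDOW: usize = 64; // a 64-bit gear hash forgets bytes older than this
    const MASK: u64 = 0xFC00_0000_0000_0000; // roughly one cut per 64 random bytes
    let gear = gear_table();
    let mut chunks = Vec::new();
    let (mut start, mut hash) = (0usize, 0u64);
    for (i, &byte) in data.iter().enumerate() {
        // Shifting left by one per byte pushes old bytes out of the hash.
        hash = (hash << 1).wrapping_add(gear[byte as usize]);
        // Only cut once a full window has been hashed.
        if i + 1 >= WINDOW && hash & MASK == 0 {
            chunks.push(&data[start..=i]);
            start = i + 1;
        }
    }
    if start < data.len() {
        chunks.push(&data[start..]);
    }
    chunks
}

/// Returns (old chunks that reappear unchanged in `new`, total old chunks).
fn reused_chunks(old: &[u8], new: &[u8]) -> (usize, usize) {
    let old_chunks = toy_cdc(old);
    let known: HashSet<&[u8]> = toy_cdc(new).into_iter().collect();
    let reused = old_chunks.iter().filter(|c| known.contains(*c)).count();
    (reused, old_chunks.len())
}

/// Pseudo-random test input (xorshift64), so the example needs no files.
fn sample_data(len: usize) -> Vec<u8> {
    let mut x: u64 = 0x2545_F491_4F6C_DD1D;
    (0..len)
        .map(|_| {
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            (x >> 24) as u8
        })
        .collect()
}

fn main() {
    let old = sample_data(16 * 1024);
    let mut new = b"XX".to_vec(); // insert two bytes at the start
    new.extend_from_slice(&old);
    let (reused, total) = reused_chunks(&old, &new);
    println!("{reused} of {total} old chunks reappear after the insertion");
}
```

Because each cut decision in this sketch looks only at the preceding 64 bytes, every boundary of the old data reappears in the new data shifted by the inserted length, so at most the first old chunk fails to match. A real chunker with minimum and maximum sizes resynchronizes with high probability rather than with this guarantee.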

Implementation

The fastcdc crate provides a production-ready implementation:

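use fastcdc::v2020::FastCDC;

// `data` holds the file contents as bytes.
let chunker = FastCDC::new(&data, 2048, 4096, 16384); // min, avg, max chunk size in bytes
for chunk in chunker {
    // chunk.offset and chunk.length locate the chunk within `data`.
    // chunk.hash is the rolling gear hash at the cut point, not a digest of the chunk's bytes.
}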
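Once chunk boundaries are known, the sender still has to decide which chunks to transmit. The sketch below shows one way that step could look, using the example strings from this issue. Everything in it is an assumption for illustration: `ChunkRef`, `DeltaOp`, `digest`, and `plan_delta` are hypothetical names, the chunk list is written by hand rather than produced by the fastcdc crate, and `DefaultHasher` stands in for a collision-resistant digest such as SHA-256 or BLAKE3.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical chunk descriptor: a byte range within the new file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ChunkRef {
    offset: usize,
    length: usize,
}

/// One step of the transfer plan.
#[derive(Debug, PartialEq, Eq)]
enum DeltaOp {
    /// The receiver already holds a chunk with this digest.
    Reuse { digest: u64 },
    /// The receiver lacks this range of the new file; its bytes must be sent.
    Send { offset: usize, length: usize },
}

/// Stand-in content digest. A real sync tool would use a cryptographic hash.
fn digest(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Decide, chunk by chunk, whether the receiver already has the content.
fn plan_delta(
    new_data: &[u8],
    new_chunks: &[ChunkRef],
    receiver_has: &HashSet<u64>,
) -> Vec<DeltaOp> {
    new_chunks
        .iter()
        .map(|c| {
            let d = digest(&new_data[c.offset..c.offset + c.length]);
            if receiver_has.contains(&d) {
                DeltaOp::Reuse { digest: d }
            } else {
                DeltaOp::Send { offset: c.offset, length: c.length }
            }
        })
        .collect()
}

fn main() {
    // Old file chunks from the example: AAA | BBBB | CCC | DDDD
    let old_chunks: [&[u8]; 4] = [b"AAA", b"BBBB", b"CCC", b"DDDD"];
    let receiver_has: HashSet<u64> = old_chunks.iter().map(|c| digest(c)).collect();

    // New file: XX | AAA | BBBB | CCC | DDDD
    let new_data = b"XXAAABBBBCCCDDDD";
    let new_chunks = [
        ChunkRef { offset: 0, length: 2 },
        ChunkRef { offset: 2, length: 3 },
        ChunkRef { offset: 5, length: 4 },
        ChunkRef { offset: 9, length: 3 },
        ChunkRef { offset: 12, length: 4 },
    ];

    for op in plan_delta(new_data, &new_chunks, &receiver_has) {
        println!("{op:?}");
    }
}
```

With these inputs only the 2-byte "XX" chunk is sent, and the other four chunks are reused. Chunk identity is computed from the chunk's bytes here because the `hash` field yielded by the chunker is a rolling hash value rather than a content digest.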

Benefits

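  • Far fewer bytes retransferred for files with insertions/deletions, since unchanged chunks still match after content shifts
  • Same general approach (content-defined chunking) used by restic, borg, and other dedup tools, though those use Rabin-fingerprint and Buzhash chunkers rather than FastCDC
  • Low CPU overhead: FastCDC's gear hash needs only a shift, an add, and a table lookup per byte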
