diff --git a/src/app/docs/protocols/blobs/page.mdx b/src/app/docs/protocols/blobs/page.mdx deleted file mode 100644 index 9a710863..00000000 --- a/src/app/docs/protocols/blobs/page.mdx +++ /dev/null @@ -1,84 +0,0 @@ -import {ThemeImage} from '@/components/ThemeImage' - -export const metadata = { - title: 'blobs', - description: 'iroh-blobs, a protocol for immutable data. Blobs are content-addressed bytes of any size' -} - -## iroh-blobs - -Blobs is a protocol that works with content-addressed _blobs_ of opaque data, which are often the bytes of a file. {{ className: 'lead' }} - -A blob is a sequence of bytes. Those bytes can be a JPEG image, a text file, a video, really anything that is a definite set of bytes. - -All blobs within iroh-blobs are referred to by the BLAKE3 [1] hash of its content. BLAKE3 is a _tree hashing_ algorithm, that splits its input into uniform chunks and arranges them as the leaves of a binary tree, producing intermediate _chunk hashes_ that accumulate up to a _root hash._ Iroh uses the 32 byte root hash (or just “hash”) as an immutable blob identifier. - -
- -
- -Iroh-blobs leverages the tree hash structure of BLAKE3 as a basis for incremental verification when data is sent over the network, as described by Section 6.4 “Verified Streaming” in [1] and implemented in [2]. Iroh-blobs caches all chunk hashes as external metadata, leaving the unaltered input blob as the canonical source of bytes. Verified streaming also facilitates _range requests:_ fetching a verifiable contiguous subsequence of a blob by streaming only the portions of the BLAKE3 binary tree required to verify the designated subsequence. - -Chunk hashes are distinct from root hashes and only used during data transfer. The chunk group size of BAO is a tunable constant that defaults to 1KiB, which results in a 6% overhead on on the size of the input blob. Increasing the chunk size reduces overhead, at the cost of requiring more data transfer before an incremental verification checkpoint is reached. The chunk group size constant can be modified & recalculated *without affecting the root hash.* This opens the door to experiment with different chunk group size constants, even at runtime. We intend to investigate chunk size optimization in future work. - -Root hashes are expressed as a Content identifier (CID) as defined in [3], making iroh-blobs an *IPFS system* capable of interoperating with other systems that use CIDs. In contrast to other IPFS systems, only root hashes are valid content identifiers, which enforces a strict 1-1 relationship between a Content Identifier and blob. This 1-1 relationship brings iroh-blobs into alignment with common whole-file checksum systems. A naive implementation of iroh-blobs can skip verified streaming entirely and use the the CID as a whole-file checksum. - -## Collections - -A _collection_ is an ordered set of blob hashes. {{ className: 'lead' }} - -Iroh-blobs uses collections as immutable, ordered lists of blobs. Collections themselves are blobs, serialized as a _hash sequence_ of one hash after another, with no separators or headers. Because all hashes in iroh-blobs are 32-byte BLAKE3 hashes, the byte length of a collection will always be a multiple of 32. - -
- -
- -Collection link counts can range from 0-billions. Collections are true lists, and should not be nested to form graphs. While it's totally possible to put the hash of a collection within a collection, the internal garbage collector within iroh-blobs that keeps track of what blobs can be deleted does not check this, and will prune away data that isn't explicitly known to the garbage collector. - - - Collections and [documents](/docs/layers/documents) can both be used to group blobs together. The core difference is collections are _immutable_, while documents are _mutable_. - - -### Collection Metadata - -Formally, the serialized list of hashes stored as a blob is a _hash sequence_, which has no metadata. Iroh-blobs defines a collection as a hash sequence who's first element points to a `CollectionV0` metadata blob. The metadata blob is always starts with the `CollectionV0` UTF-8 string, followed by a list of strings that are the names of the links in the collection. For example, a collection with the links `foo`, `bar`, and `baz` would look like this: - -``` -"CollectionV0" -["foo", "bar", "baz", ...] -``` - -The length of the list must match the length of elements in the hash sequence (minus the metadata element). The metadata blob is always the first element of the hash sequence. Iroh-blobs can issue sparse requests to determine the byte lengths of each blob in the collection, which combines to give a baseline metadata of "file names" and sizes. This provides a few nice advantages: - -1. There is nothing left to remove from the definition of a hash sequence. We consider this specification finished. -2. The "metadata as first element" is a convention. It's completely acceptable to build a custom collection definition that includes different metadata. Iroh-blobs will still understand how to transfer & seek into the hash sequence. This opens the door to building efficient, specialized, & immutable compound data structures on iroh-blobs. -3. BLAKE3 chunks along 1Kib Boundaries, which means there will always be exactly 32 hashes in a BLAKE3 incremental verification block. - -### When to use a collection - -Collections are the right tool to reach for when you need a "snapshot" of a set of blobs. For example, a collection is a good way to represent a directory of files. Collections are not the right tool if the data you're working with is changing. For mutable, named grouping, use [documents](/docs/layers/documents). - -** ** - -## References - -1. **blake3**
-[https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf) -2. **bao**
-[https://github.com/oconnor663/bao](https://github.com/oconnor663/bao) -3. **CID (Content IDentifier): Self-describing content-addressed identifiers for distributed systems**
-[https://github.com/multiformats/cid](https://github.com/multiformats/cid) -4. **multihashes**
-[https://multiformats.io/multihash/](https://multiformats.io/multihash/)