
File System Architecture

Table of Contents

  1. Overview
  2. System Layers
  3. Data Encoding & Serialization
  4. Content Addressing & CIDs
  5. Security Model
  6. Encryption & Access Control
  7. Blockchain Integration
  8. Design Decisions
  9. Performance Considerations
  10. API Documentation Links

Overview

The File System Interface (Layer 1) provides a high-level abstraction over Scalable Web3 Storage (Layer 0), enabling users to work with familiar file system concepts while benefiting from decentralized, content-addressed storage with blockchain accountability.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  User Applications                                                   β”‚
β”‚  (Web apps, CLI tools, FUSE mounts)                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–²
                              β”‚ File System Client SDK
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Layer 1: File System Interface                                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”‚
β”‚  β”‚ Drive        β”‚  β”‚ File System    β”‚  β”‚ File System     β”‚         β”‚
β”‚  β”‚ Registry     β”‚  β”‚ Primitives     β”‚  β”‚ Client SDK      β”‚         β”‚
β”‚  β”‚ (On-Chain)   β”‚  β”‚ (Types)        β”‚  β”‚ (Off-Chain)     β”‚         β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–²
                              β”‚ Bucket/Agreement APIs
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Layer 0: Scalable Web3 Storage                                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”‚
β”‚  β”‚ Storage      β”‚  β”‚ Provider       β”‚  β”‚ Storage         β”‚         β”‚
β”‚  β”‚ Pallet       β”‚  β”‚ Node           β”‚  β”‚ Client          β”‚         β”‚
β”‚  β”‚ (On-Chain)   β”‚  β”‚ (Off-Chain)    β”‚  β”‚ (Off-Chain)     β”‚         β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

System Layers

Layer 0: Scalable Web3 Storage (Foundation)

Purpose: Provides raw blob storage with game-theoretic guarantees.

Components:

  • Storage Pallet: On-chain logic for buckets, agreements, checkpoints, and challenges
  • Provider Node: Off-chain HTTP server storing data chunks and building MMR commitments
  • Storage Client: SDK for bucket operations, uploads, downloads, and verification

Key Concepts:

  • Buckets: Logical containers for data with associated provider agreements
  • Agreements: Contracts between users and providers specifying storage terms
  • Checkpoints: Cryptographic commitments (MMR roots) submitted on-chain
  • Challenges: Mechanism for verifying provider data integrity

Layer 1: File System Interface (Abstraction)

Purpose: Provides familiar file/folder interface over Layer 0's content-addressed blob storage.

Components:

  • Drive Registry Pallet: On-chain drive metadata and root CID tracking
  • File System Primitives: Shared types (DirectoryNode, FileManifest, CommitStrategy)
  • File System Client: High-level SDK for file/directory operations

Key Concepts:

  • Drives: User's logical file systems backed by Layer 0 buckets
  • Root CID: Content identifier of the root directory (stored on-chain)
  • Directory Nodes: Protobuf/SCALE-encoded directory structures
  • File Manifests: Metadata tracking file chunks

Parachain Integration

Both Layer 0 and Layer 1 operate on the same parachain:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Storage Parachain (ID: 4000)                                        β”‚
β”‚                                                                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ pallet-storage-provider     β”‚  β”‚ pallet-drive-registry       β”‚  β”‚
β”‚  β”‚ (Layer 0)                   β”‚  β”‚ (Layer 1)                   β”‚  β”‚
β”‚  β”‚                             β”‚  β”‚                             β”‚  β”‚
β”‚  β”‚ - Buckets                   β”‚  β”‚ - Drives                    β”‚  β”‚
β”‚  β”‚ - Agreements                β”‚  β”‚ - Root CIDs                 β”‚  β”‚
β”‚  β”‚ - Checkpoints               β”‚  β”‚ - User registry             β”‚  β”‚
β”‚  β”‚ - Challenges                β”‚  β”‚                             β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                                      β”‚
β”‚  Cross-Pallet Calls: DriveRegistry β†’ StorageProvider                β”‚
β”‚  (create_bucket, request_agreement, end_agreement)                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚
        β”‚ Cumulus (Parachain Protocol)
        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Relay Chain (Polkadot/Paseo)                                        β”‚
β”‚  - Shared security                                                   β”‚
β”‚  - Finality                                                          β”‚
β”‚  - Cross-chain messaging (future)                                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Why Same Parachain?

  1. Lower Latency: Cross-pallet calls are atomic and synchronous
  2. Simpler Architecture: No XCM messaging complexity
  3. Shared State: Direct access to Layer 0 storage (buckets, agreements)
  4. Cost Efficiency: Single transaction for drive creation + bucket setup

Data Encoding & Serialization

The system uses two encoding formats depending on the context:

SCALE Encoding (On-Chain & Content-Addressed Storage)

Usage:

  • All on-chain storage (pallet state)
  • Content-addressed data stored via providers
  • CID computation base

Why SCALE?

  • Substrate-native encoding (required for pallets)
  • Deterministic: Same data always produces same bytes
  • Efficient: Compact binary representation
  • no_std compatible: Works in runtime WASM

Format Details:

// DirectoryNode SCALE encoding
struct DirectoryNode {
    drive_id: u64,                                    // 8 bytes, little-endian
    children: BoundedVec<DirectoryEntry, Max1024>,    // Length prefix + entries
    metadata: BoundedVec<MetadataEntry, Max64>,       // Length prefix + entries
}

// DirectoryEntry SCALE encoding
struct DirectoryEntry {
    name: BoundedVec<u8, Max256>,    // Length prefix + UTF-8 bytes
    entry_type: EntryType,            // 1 byte (0=File, 1=Directory)
    cid: H256,                        // 32 bytes (blake2-256 hash)
    size: u64,                        // 8 bytes, little-endian
    mtime: u64,                       // 8 bytes, Unix timestamp
}

Example: Empty DirectoryNode for drive_id=2

Bytes:   02 00 00 00 00 00 00 00  00  00
         └───── drive_id ──────┘  │   └── metadata (empty vec)
                                  └── children (empty vec)
Length: 10 bytes
CID: 0xe835d9bb4ac2c42bd8895fcfb159903f4ce6de8de863182f4fb87c06a23d18b7
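
As a cross-check, the encoding above can be reproduced with the parity-scale-codec and sp-core crates. The sketch below uses a simplified stand-in struct (an empty Vec<u8> encodes to the same single 0x00 length prefix as an empty BoundedVec), so it is illustrative rather than the production type:

use parity_scale_codec::Encode;
use sp_core::hashing::blake2_256;

// Simplified stand-in for DirectoryNode: empty vectors encode the same way
// as empty BoundedVecs (a single 0x00 compact length prefix each).
#[derive(Encode)]
struct DirectoryNodeDemo {
    drive_id: u64,
    children: Vec<u8>,
    metadata: Vec<u8>,
}

fn main() {
    let node = DirectoryNodeDemo { drive_id: 2, children: vec![], metadata: vec![] };
    let bytes = node.encode();
    assert_eq!(bytes.len(), 10); // 8-byte drive_id + two 0x00 length prefixes

    let cid = blake2_256(&bytes);
    print!("CID: 0x");
    for b in cid {
        print!("{b:02x}");
    }
    println!();
}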

Protobuf Encoding (Optional Off-Chain)

Usage:

  • Client-side caching (optional)
  • Inter-service communication
  • Human-readable debugging

Why Protobuf?

  • Self-describing schema
  • Language-agnostic
  • Better tooling for inspection

Important: Protobuf is NOT used for CID computation. CIDs are always computed from SCALE-encoded bytes to ensure consistency.

Encoding Workflow

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Client Operations                                                   β”‚
β”‚                                                                      β”‚
β”‚  1. Create DirectoryNode struct                                      β”‚
β”‚  2. Serialize to SCALE: node.to_scale_bytes()                       β”‚
β”‚  3. Compute CID: blake2_256(scale_bytes)                            β”‚
β”‚  4. Upload SCALE bytes to provider (by CID)                         β”‚
β”‚  5. Store CID on-chain (root_cid)                                   β”‚
β”‚                                                                      β”‚
β”‚  Retrieval:                                                          β”‚
β”‚  1. Read root_cid from chain                                         β”‚
β”‚  2. Fetch SCALE bytes from provider (by CID)                        β”‚
β”‚  3. Verify: blake2_256(bytes) == expected_cid                       β”‚
β”‚  4. Deserialize: DirectoryNode::from_scale_bytes(&bytes)            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
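
In client code, the same round-trip looks roughly like the sketch below. The FsClient, provider, and chain handles are illustrative placeholders; to_scale_bytes, from_scale_bytes, and compute_cid are the helpers named in the workflow above.

// Store: serialize, address, upload, anchor on-chain (steps 1-5 above).
async fn store_root(client: &FsClient, drive_id: DriveId, node: &DirectoryNode) -> Result<Cid> {
    let scale_bytes = node.to_scale_bytes();            // 2. SCALE-encode
    let cid = compute_cid(&scale_bytes);                // 3. blake2_256 over the bytes
    client.provider.upload(cid, &scale_bytes).await?;   // 4. upload by CID
    client.chain.set_root_cid(drive_id, cid).await?;    // 5. anchor root CID on-chain
    Ok(cid)
}

// Retrieve: read root CID, fetch by CID, verify, decode (steps 1-4 above).
async fn load_root(client: &FsClient, drive_id: DriveId) -> Result<DirectoryNode> {
    let root_cid = client.chain.root_cid(drive_id).await?;   // 1. read from chain
    let bytes = client.provider.fetch(root_cid).await?;      // 2. fetch by CID
    if compute_cid(&bytes) != root_cid {                      // 3. verify integrity
        return Err(Error::IntegrityCheckFailed);
    }
    DirectoryNode::from_scale_bytes(&bytes)                   // 4. decode
}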

Content Addressing & CIDs

CID Format

Content Identifiers (CIDs) are 32-byte blake2-256 hashes:

pub type Cid = H256;  // sp_core::H256

pub fn compute_cid(data: &[u8]) -> Cid {
    sp_core::hashing::blake2_256(data).into()
}

Why blake2-256?

  1. Substrate Standard: Native hashing function in Substrate
  2. Performance: Faster than SHA-256 while equally secure
  3. Collision Resistance: 256-bit output provides strong guarantees
  4. Hardware Support: Optimized implementations available

Content-Addressed DAG

Files and directories form a Merkle DAG (Directed Acyclic Graph):

                    Root CID (on-chain)
                         β”‚
                    β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”
                    β”‚         β”‚
               documents/   images/
                    β”‚         β”‚
              β”Œβ”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”   β”‚
              β”‚           β”‚   β”‚
          work/     notes.txt photo.jpg
              β”‚
          report.txt

Each node's CID = blake2_256(SCALE_bytes)
Parent nodes contain children's CIDs
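
Because each parent embeds its children's CIDs, a parent's CID is a function of everything beneath it. A simplified sketch (with_child is a hypothetical helper that inserts or replaces one entry) shows how editing report.txt ripples up to a new root CID:

// Editing a leaf changes every ancestor CID up to the root.
let report_cid = compute_cid(&report_manifest.to_scale_bytes());
let work_cid   = compute_cid(&work_dir.with_child("report.txt", report_cid).to_scale_bytes());
let docs_cid   = compute_cid(&documents_dir.with_child("work", work_cid).to_scale_bytes());
let root_cid   = compute_cid(&root_dir.with_child("documents", docs_cid).to_scale_bytes());
// root_cid is the value anchored on-chain; untouched branches (images/) keep their old CIDs.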

Deduplication

Same content always produces same CID, enabling automatic deduplication:

// Two identical files
let file1_data = b"Hello, World!";
let file2_data = b"Hello, World!";

let cid1 = compute_cid(file1_data);  // 0xabc...
let cid2 = compute_cid(file2_data);  // 0xabc... (same!)

// Only stored once on provider

Security Model

Trust Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Trust Levels                                                       β”‚
β”‚                                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ TRUSTLESS: Blockchain                                        β”‚  β”‚
β”‚  β”‚ - Finalized state is immutable                               β”‚  β”‚
β”‚  β”‚ - Consensus guarantees                                       β”‚  β”‚
β”‚  β”‚ - Root CIDs are verifiable                                   β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ VERIFIABLE: Content-Addressed Storage                        β”‚  β”‚
β”‚  β”‚ - Data integrity verified by CID                             β”‚  β”‚
β”‚  β”‚ - Cannot serve tampered data                                 β”‚  β”‚
β”‚  β”‚ - Providers economically incentivized                        β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ ACCOUNTABLE: Provider Network                                β”‚  β”‚
β”‚  β”‚ - Staked providers face slashing                             β”‚  β”‚
β”‚  β”‚ - Challenge mechanism for disputes                           β”‚  β”‚
β”‚  β”‚ - Replication for redundancy                                 β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Data Integrity Verification

Every data retrieval is verified:

async fn fetch_blob(&self, cid: Cid) -> Result<Vec<u8>> {
    // 1. Fetch data from provider; `length` is the expected blob size taken from
    //    the directory entry or manifest (see "Provider API Read Limits" below)
    let data = self.storage_client.read(&cid, 0, length).await?;

    // 2. Provider verifies chunk hashes during read
    // (see storage-client/src/lib.rs lines 221-227)

    // 3. Client verifies entire blob CID
    let actual_cid = compute_cid(&data);
    if actual_cid != cid {
        return Err(Error::IntegrityCheckFailed);
    }

    Ok(data)
}

Provider Accountability

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Game-Theoretic Guarantees                                          β”‚
β”‚                                                                     β”‚
β”‚  Provider Registration:                                             β”‚
β”‚  - Minimum stake: 1000 tokens                                       β”‚
β”‚  - Stake locked during active agreements                            β”‚
β”‚                                                                     β”‚
β”‚  Checkpoint Flow:                                                   β”‚
β”‚  1. Provider builds MMR over stored data                            β”‚
β”‚  2. Provider signs commitment (MMR root)                            β”‚
β”‚  3. Client submits checkpoint on-chain                              β”‚
β”‚  4. Provider is now liable for data availability                    β”‚
β”‚                                                                     β”‚
β”‚  Challenge Mechanism:                                               β”‚
β”‚  1. Challenger requests proof for specific chunk                    β”‚
β”‚  2. Provider must respond within challenge_period                   β”‚
β”‚  3. Failure to respond β†’ slashing (lose stake)                      β”‚
β”‚  4. Successful response β†’ challenger pays challenge fee             β”‚
β”‚                                                                     β”‚
β”‚  Result: Providers economically motivated to preserve data          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Access Control

Current State: Basic owner-based access

// Only drive owner can modify
fn update_root_cid(origin, drive_id, new_root_cid) {
    let caller = ensure_signed(origin)?;
    let drive = Drives::<T>::get(drive_id)?;
    ensure!(drive.owner == caller, Error::NotDriveOwner);
    // ... proceed with update
}

Future Enhancements: See Encryption & Access Control


Encryption & Access Control

Current State

Encryption is NOT implemented by default. Data is stored in plaintext.

The system provides infrastructure for future encryption:

pub struct FileManifest {
    // ... other fields
    /// Encryption parameters (optional, for W3ACL)
    pub encryption_params: BoundedVec<u8, MaxEncryptionParamsLength>,  // 512 bytes max
}

Planned Encryption Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Client-Side Encryption (Planned)                                   β”‚
β”‚                                                                     β”‚
β”‚  Upload:                                                            β”‚
β”‚  1. Generate random encryption key (AES-256-GCM)                    β”‚
β”‚  2. Encrypt file chunks with key                                    β”‚
β”‚  3. Encrypt key with owner's public key                             β”‚
β”‚  4. Store encrypted_key in FileManifest.encryption_params           β”‚
β”‚  5. Upload encrypted chunks                                         β”‚
β”‚                                                                     β”‚
β”‚  Download:                                                          β”‚
β”‚  1. Fetch FileManifest                                              β”‚
β”‚  2. Decrypt key with owner's private key                            β”‚
β”‚  3. Fetch and decrypt chunks                                        β”‚
β”‚                                                                     β”‚
β”‚  Sharing:                                                           β”‚
β”‚  1. Decrypt key with owner's private key                            β”‚
β”‚  2. Re-encrypt key with recipient's public key                      β”‚
β”‚  3. Create access grant (UCAN or W3ACL)                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Access Control Roadmap

Feature                  Status        Description
Owner-only access        Implemented   Drive owner can read/write
Client-side encryption   Planned       AES-256-GCM per file
UCAN delegation          Planned       Capability-based access tokens
W3ACL integration        Planned       Decentralized access control lists
Shared drives            Planned       Multi-user drive access

Security Recommendations

For Sensitive Data (Current Workaround):

// Encrypt before upload (generate_aes_key / generate_nonce / aes_gcm_* are
// placeholders for your AES-256-GCM library of choice)
let key = generate_aes_key();        // random 256-bit key
let nonce = generate_nonce();        // random 96-bit nonce; never reuse with the same key
let encrypted_data = aes_gcm_encrypt(&file_data, &key, &nonce)?;

// Store the key and nonce securely (e.g., in your app's keystore)
fs_client.upload_file(drive_id, "/secret.enc", &encrypted_data, bucket_id).await?;

// Decrypt after download
let encrypted = fs_client.download_file(drive_id, "/secret.enc").await?;
let plaintext = aes_gcm_decrypt(&encrypted, &key, &nonce)?;

Blockchain Integration

Subxt Connection

The File System Client uses subxt for trustless blockchain interaction:

pub struct SubstrateClient {
    api: OnlineClient<SubstrateConfig>,
    signer: Option<Keypair>,
}

impl SubstrateClient {
    pub async fn connect(endpoint: &str) -> Result<Self> {
        // Connect to parachain WebSocket
        let api = OnlineClient::from_url(endpoint).await?;
        Ok(Self { api, signer: None })
    }
}

Transaction Flow

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Drive Creation Transaction Flow                                    β”‚
β”‚                                                                     β”‚
β”‚  1. Client builds extrinsic:                                        β”‚
β”‚     DriveRegistry::create_drive(name, capacity, period, payment)    β”‚
β”‚                                                                     β”‚
β”‚  2. Client signs with SR25519 keypair                               β”‚
β”‚                                                                     β”‚
β”‚  3. Submit to parachain:                                            β”‚
β”‚     POST /transaction                                               β”‚
β”‚                                                                     β”‚
β”‚  4. Transaction included in block                                   β”‚
β”‚                                                                     β”‚
β”‚  5. Client watches for finalization:                                β”‚
β”‚     - Poll transaction status                                       β”‚
β”‚     - Wait for finality (relay chain confirmation)                  β”‚
β”‚                                                                     β”‚
β”‚  6. Extract drive_id from DriveCreated event                        β”‚
β”‚                                                                     β”‚
β”‚  7. Query drive state:                                              β”‚
β”‚     DriveRegistry::Drives(drive_id) -> DriveInfo                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
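
A sketch of this flow using subxt's dynamic API is shown below. The call arguments and the extract_drive_created_id helper are illustrative; a production client would typically use statically generated metadata bindings instead.

use subxt::{dynamic::{tx, Value}, OnlineClient, SubstrateConfig};
use subxt_signer::sr25519::dev;

async fn create_drive(api: &OnlineClient<SubstrateConfig>) -> anyhow::Result<u64> {
    // 1. Build the extrinsic (argument values are placeholders).
    let call = tx("DriveRegistry", "create_drive", vec![
        Value::from_bytes("my-drive"),     // name
        Value::u128(1_073_741_824),        // capacity in bytes (1 GiB)
        Value::u128(100_800),              // period in blocks
        Value::u128(1_000_000_000_000),    // payment
    ]);

    // 2-5. Sign with an SR25519 keypair, submit, and wait for finalization.
    let signer = dev::alice();
    let events = api
        .tx()
        .sign_and_submit_then_watch_default(&call, &signer)
        .await?
        .wait_for_finalized_success()
        .await?;

    // 6. Extract drive_id from the DriveCreated event (see "Event Extraction" below).
    let drive_id = extract_drive_created_id(&events)?;
    Ok(drive_id)
}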

Storage Queries

// Query drive info
async fn query_drive_root_cid(&self, drive_id: DriveId) -> Result<Cid> {
    // Build storage key: twox128("DriveRegistry") + twox128("Drives") + blake2_128(drive_id)
    let storage_key = build_storage_key("DriveRegistry", "Drives", drive_id);

    // Fetch raw bytes from chain state (None means the drive does not exist)
    let bytes = self.api.storage().at_latest().await?
        .fetch_raw(storage_key).await?
        .ok_or(Error::DriveNotFound)?;

    // Decode DriveInfo and extract root_cid
    let drive_info = decode_drive_info(&bytes)?;
    Ok(drive_info.root_cid)
}
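
A sketch of the build_storage_key helper, following the key layout in the comment above (the hasher for the Drives map is assumed from that comment; a Blake2_128Concat map would additionally append the SCALE-encoded key):

use parity_scale_codec::Encode;
use sp_core::hashing::{blake2_128, twox_128};

fn build_drives_storage_key(drive_id: u64) -> Vec<u8> {
    let mut key = Vec::with_capacity(48);
    key.extend_from_slice(&twox_128(b"DriveRegistry"));      // pallet prefix
    key.extend_from_slice(&twox_128(b"Drives"));             // storage item prefix
    key.extend_from_slice(&blake2_128(&drive_id.encode()));  // hashed map key
    key
}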

Event Extraction

// Find the DriveCreated event after the transaction is finalized
for event in events.iter() {
    let event = event?;
    if event.pallet_name() == "DriveRegistry" && event.variant_name() == "DriveCreated" {
        if let Ok(value) = event.field_values() {
            // DriveCreated { drive_id, owner, bucket_id, root_cid }
            if let Some(drive_id) = value.at(0).and_then(|v| v.as_u128()) {
                return Ok(drive_id as DriveId);
            }
        }
    }
}

Design Decisions

Why SCALE over Protobuf for Storage?

Aspect                  SCALE                        Protobuf
Determinism             Guaranteed                   Field order dependent
CID Stability           Always same for same data    Schema changes break CIDs
Substrate Integration   Native                       Requires conversion
no_std Support          Yes                          Requires prost with alloc
Size                    Compact                      Slightly larger

Decision: Use SCALE for all stored data to ensure CID consistency.

Why Same Parachain for L0 and L1?

Alternatives Considered:

  1. Separate Parachains: L0 and L1 on different parachains
     • Pro: Independent scaling
     • Con: XCM complexity, latency, higher costs
  2. L1 on Relay Chain: Drive registry on relay chain
     • Pro: Higher security
     • Con: Limited functionality, high costs
  3. Same Parachain (Chosen):
     • Pro: Simple cross-pallet calls, shared state, low latency
     • Con: Coupled scaling

Rationale: Simplicity wins. File system operations frequently need bucket/agreement data, and cross-pallet calls are atomic, synchronous, and require no additional transactions.

Why blake2-256 for CIDs?

Alternatives:

  • SHA-256: Slower, no substrate optimization
  • Keccak-256: Ethereum-compatible but not Substrate-native
  • BLAKE3: Newer, not yet in Substrate

Decision: blake2-256 is Substrate-native, fast, and battle-tested.

Why Content-Addressed Storage?

Benefits:

  1. Integrity: CID = fingerprint of content
  2. Deduplication: Same content stored once
  3. Immutability: CIDs never change
  4. Verifiability: Anyone can verify data integrity
  5. Caching: Safe to cache forever

Trade-off: Updates create new CIDs, requiring DAG updates.

Why Merkle DAG for Directories?

Benefits:

  1. Efficient Updates: Only changed nodes need re-upload
  2. Versioning: Each root CID is a complete snapshot
  3. Partial Sync: Download only needed branches
  4. Proof of Inclusion: Merkle proofs for any entry

Performance Considerations

Read Path Optimization

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Read Path: download_file("/documents/report.pdf")                  β”‚
β”‚                                                                     β”‚
β”‚  1. Check root_cid cache (in-memory)                               β”‚
β”‚     └─ Hit: Skip chain query                                       β”‚
β”‚     └─ Miss: Query chain, cache result                             β”‚
β”‚                                                                     β”‚
β”‚  2. Traverse path: / β†’ documents β†’ report.pdf                       β”‚
β”‚     └─ Each step: Fetch directory node from provider                β”‚
β”‚     └─ Optimization: Batch fetches, prefetch siblings              β”‚
β”‚                                                                     β”‚
β”‚  3. Fetch file manifest                                             β”‚
β”‚                                                                     β”‚
β”‚  4. Fetch chunks in parallel                                        β”‚
β”‚     └─ Provider supports range requests                             β”‚
β”‚     └─ Client reassembles locally                                   β”‚
β”‚                                                                     β”‚
β”‚  Typical latency:                                                   β”‚
β”‚  - Cache hit: ~50ms (single provider round-trip)                   β”‚
β”‚  - Cache miss: ~200ms (chain query + provider)                     β”‚
β”‚  - Large file: Dominated by chunk download time                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
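
Step 2 (the path traversal) can be sketched as follows; cached_or_query_root_cid and the error variants are illustrative, and fetch_blob is the verified fetch shown in the Security Model section:

// Resolve "/documents/report.pdf" to the CID of its FileManifest by walking
// one directory node per path component.
async fn resolve_path(&self, drive_id: DriveId, path: &str) -> Result<Cid> {
    let mut current_cid = self.cached_or_query_root_cid(drive_id).await?;  // step 1
    for component in path.split('/').filter(|c| !c.is_empty()) {
        let bytes = self.fetch_blob(current_cid).await?;                   // verified fetch
        let dir = DirectoryNode::from_scale_bytes(&bytes)?;
        let entry = dir.children.iter()
            .find(|e| e.name.as_slice() == component.as_bytes())
            .ok_or(Error::NotFound)?;
        current_cid = entry.cid;                                           // descend one level
    }
    Ok(current_cid)  // a FileManifest CID for files, a DirectoryNode CID for directories
}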

Write Path Optimization

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Write Path: upload_file("/documents/report.pdf", data)             β”‚
β”‚                                                                     β”‚
β”‚  1. Split file into 256 KiB chunks                                  β”‚
β”‚                                                                     β”‚
β”‚  2. Upload chunks in parallel                                       β”‚
β”‚     └─ Each chunk: Compute CID, upload to provider                  β”‚
β”‚     └─ Provider stores: CID β†’ data                                  β”‚
β”‚                                                                     β”‚
β”‚  3. Create FileManifest with chunk CIDs                             β”‚
β”‚     └─ Upload manifest, get manifest CID                            β”‚
β”‚                                                                     β”‚
β”‚  4. Update parent directory                                         β”‚
β”‚     └─ Fetch current directory                                      β”‚
β”‚     └─ Add entry: name β†’ manifest CID                               β”‚
β”‚     └─ Upload new directory, get new CID                            β”‚
β”‚                                                                     β”‚
β”‚  5. Update ancestors up to root                                     β”‚
β”‚     └─ Recursive: Each parent gets new CID                          β”‚
β”‚                                                                     β”‚
β”‚  6. Update on-chain root_cid                                        β”‚
β”‚     └─ Based on CommitStrategy:                                     β”‚
β”‚        - Immediate: Submit transaction now                          β”‚
β”‚        - Batched: Queue, submit on interval                         β”‚
β”‚        - Manual: Store pending, wait for commit_changes()           β”‚
β”‚                                                                     β”‚
β”‚  Optimization: Batch multiple writes before chain update            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
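
Steps 1-3 in code, as a simplified sketch (the provider handle and error handling are illustrative, and BoundedVec bounds are elided for brevity):

const CHUNK_SIZE: usize = 256 * 1024; // 256 KiB

// 1-3. Split into chunks, upload each by CID, then upload the FileManifest.
async fn upload_file_blobs(&self, drive_id: DriveId, data: &[u8]) -> Result<Cid> {
    let mut chunks = Vec::new();
    for (sequence, chunk) in data.chunks(CHUNK_SIZE).enumerate() {
        let cid = compute_cid(chunk);
        self.provider.upload(cid, chunk).await?;          // provider stores CID -> data
        chunks.push(FileChunk { cid, sequence: sequence as u32 });
    }

    let manifest = FileManifest {
        drive_id,
        mime_type: b"application/octet-stream".to_vec(),
        total_size: data.len() as u64,
        chunks,
        encryption_params: vec![],                        // plaintext (no encryption yet)
    };
    let manifest_bytes = manifest.to_scale_bytes();
    let manifest_cid = compute_cid(&manifest_bytes);
    self.provider.upload(manifest_cid, &manifest_bytes).await?;
    Ok(manifest_cid)   // steps 4-6: insert into the parent directory and commit the new root
}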

Provider API Read Limits

Important: When reading data from providers, avoid u64::MAX as length parameter:

// BAD: Causes overflow in provider's chunk calculation
let data = storage_client.read(&cid, 0, u64::MAX).await?;

// GOOD: Use reasonable maximum (1 TiB)
const MAX_READ_LENGTH: u64 = 1024 * 1024 * 1024 * 1024;
let data = storage_client.read(&cid, 0, MAX_READ_LENGTH).await?;

Reason: Provider calculates end_chunk = (offset + length + chunk_size - 1) / chunk_size. With u64::MAX, this overflows and returns no chunks.
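
An illustrative reconstruction of that arithmetic (in release builds the unchecked expression wraps instead of panicking, which is what yields an empty result):

const CHUNK_SIZE: u64 = 256 * 1024;

// With length = u64::MAX, `offset + length + CHUNK_SIZE - 1` wraps past zero,
// so the computed end chunk is 0 and no chunks are returned.
fn end_chunk_wrapping(offset: u64, length: u64) -> u64 {
    offset.wrapping_add(length).wrapping_add(CHUNK_SIZE - 1) / CHUNK_SIZE
}

// A saturating version clamps instead of wrapping.
fn end_chunk_saturating(offset: u64, length: u64) -> u64 {
    offset.saturating_add(length).saturating_add(CHUNK_SIZE - 1) / CHUNK_SIZE
}

fn main() {
    assert_eq!(end_chunk_wrapping(0, u64::MAX), 0);
    assert_eq!(end_chunk_saturating(0, u64::MAX), u64::MAX / CHUNK_SIZE);
}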


API Documentation Links

User Documentation

Document              Description
User Guide            Complete guide for end users
Example Walkthrough   Step-by-step basic_usage.rs walkthrough

Administrator Documentation

Document      Description
Admin Guide   System administration and monitoring

Developer Documentation

Document                Description
API Reference           Complete API documentation
File System Interface   Architecture overview

Layer 0 Documentation

Document               Description
Extrinsics Reference   Layer 0 blockchain API
Payment Calculator     Calculate storage costs
Quick Start            Get running in 5 minutes

Design Documents

Document                       Description
Scalable Web3 Storage Design   System design & rationale
Implementation Details         Technical specifications

Appendix: Encoding Examples

DirectoryNode with Children

let dir = DirectoryNode {
    drive_id: 5,
    children: vec![
        DirectoryEntry {
            name: "documents",
            entry_type: Directory,
            cid: 0x9955e72d...,
            size: 0,
            mtime: 1707456000,
        },
        DirectoryEntry {
            name: "README.md",
            entry_type: File,
            cid: 0x0bc42ff7...,
            size: 127,
            mtime: 1707456020,
        },
    ],
    metadata: vec![],
};

// SCALE encoding (128 bytes for this example):
// 05 00 00 00 00 00 00 00     // drive_id: 5
// 08                          // children count: 2 (compact)
// 24                          // name length: 9 (compact)
// 64 6f 63 75 6d 65 6e 74 73  // "documents"
// 01                          // entry_type: Directory
// 99 55 e7 2d ...             // cid: 32 bytes
// 00 00 00 00 00 00 00 00     // size: 0
// 00 b6 c5 65 00 00 00 00     // mtime: 1707456000 (little-endian)
// ...

FileManifest with Chunks

let manifest = FileManifest {
    drive_id: 5,
    mime_type: "application/pdf",
    total_size: 1048576,  // 1 MiB
    chunks: vec![
        FileChunk { cid: 0xabc..., sequence: 0 },
        FileChunk { cid: 0xdef..., sequence: 1 },
        FileChunk { cid: 0x123..., sequence: 2 },
        FileChunk { cid: 0x456..., sequence: 3 },
    ],
    encryption_params: vec![],  // Empty (no encryption)
};

Glossary

Term         Definition
CID          Content Identifier - blake2-256 hash of data
DAG          Directed Acyclic Graph - tree structure of CIDs
Drive        User's logical file system (Layer 1 concept)
Bucket       Storage container (Layer 0 concept)
MMR          Merkle Mountain Range - efficient append-only commitment
SCALE        Simple Concatenated Aggregate Little-Endian encoding
Checkpoint   On-chain commitment to off-chain data state
Root CID     CID of the root directory (stored on-chain)

Last updated: February 2026