- Overview
- System Layers
- Data Encoding & Serialization
- Content Addressing & CIDs
- Security Model
- Encryption & Access Control
- Blockchain Integration
- Design Decisions
- Performance Considerations
- API Documentation Links
The File System Interface (Layer 1) provides a high-level abstraction over Scalable Web3 Storage (Layer 0), enabling users to work with familiar file system concepts while benefiting from decentralized, content-addressed storage with blockchain accountability.
```
┌───────────────────────────────────────────────────────────────┐
│                       User Applications                       │
│               (Web apps, CLI tools, FUSE mounts)              │
└───────────────────────────────────────────────────────────────┘
                          ▲
                          │  File System Client SDK
                          ▼
┌───────────────────────────────────────────────────────────────┐
│               Layer 1: File System Interface                  │
│  ┌──────────────┐  ┌────────────────┐  ┌─────────────────┐    │
│  │    Drive     │  │  File System   │  │   File System   │    │
│  │   Registry   │  │   Primitives   │  │   Client SDK    │    │
│  │  (On-Chain)  │  │    (Types)     │  │   (Off-Chain)   │    │
│  └──────────────┘  └────────────────┘  └─────────────────┘    │
└───────────────────────────────────────────────────────────────┘
                          ▲
                          │  Bucket/Agreement APIs
                          ▼
┌───────────────────────────────────────────────────────────────┐
│               Layer 0: Scalable Web3 Storage                  │
│  ┌──────────────┐  ┌────────────────┐  ┌─────────────────┐    │
│  │   Storage    │  │    Provider    │  │     Storage     │    │
│  │    Pallet    │  │      Node      │  │     Client      │    │
│  │  (On-Chain)  │  │  (Off-Chain)   │  │   (Off-Chain)   │    │
│  └──────────────┘  └────────────────┘  └─────────────────┘    │
└───────────────────────────────────────────────────────────────┘
```
Layer 0: Scalable Web3 Storage
Purpose: Provides raw blob storage with game-theoretic guarantees.
Components:
- Storage Pallet: On-chain logic for buckets, agreements, checkpoints, and challenges
- Provider Node: Off-chain HTTP server storing data chunks and building MMR commitments
- Storage Client: SDK for bucket operations, uploads, downloads, and verification
Key Concepts:
- Buckets: Logical containers for data with associated provider agreements
- Agreements: Contracts between users and providers specifying storage terms
- Checkpoints: Cryptographic commitments (MMR roots) submitted on-chain
- Challenges: Mechanism for verifying provider data integrity
Layer 1: File System Interface
Purpose: Provides a familiar file/folder interface over Layer 0's content-addressed blob storage.
Components:
- Drive Registry Pallet: On-chain drive metadata and root CID tracking
- File System Primitives: Shared types (DirectoryNode, FileManifest, CommitStrategy)
- File System Client: High-level SDK for file/directory operations
Key Concepts:
- Drives: User's logical file systems backed by Layer 0 buckets
- Root CID: Content identifier of the root directory (stored on-chain)
- Directory Nodes: Protobuf/SCALE-encoded directory structures
- File Manifests: Metadata tracking file chunks
Both Layer 0 and Layer 1 operate on the same parachain:
```
┌───────────────────────────────────────────────────────────────┐
│                 Storage Parachain (ID: 4000)                  │
│                                                               │
│  ┌──────────────────────────┐  ┌──────────────────────────┐   │
│  │ pallet-storage-provider  │  │  pallet-drive-registry   │   │
│  │        (Layer 0)         │  │        (Layer 1)         │   │
│  │                          │  │                          │   │
│  │  - Buckets               │  │  - Drives                │   │
│  │  - Agreements            │  │  - Root CIDs             │   │
│  │  - Checkpoints           │  │  - User registry         │   │
│  │  - Challenges            │  │                          │   │
│  └──────────────────────────┘  └──────────────────────────┘   │
│                                                               │
│  Cross-Pallet Calls: DriveRegistry → StorageProvider          │
│  (create_bucket, request_agreement, end_agreement)            │
└───────────────────────────────────────────────────────────────┘
                          │
                          │  Cumulus (Parachain Protocol)
                          ▼
┌───────────────────────────────────────────────────────────────┐
│                 Relay Chain (Polkadot/Paseo)                  │
│  - Shared security                                            │
│  - Finality                                                   │
│  - Cross-chain messaging (future)                             │
└───────────────────────────────────────────────────────────────┘
```
Why Same Parachain?
- Lower Latency: Cross-pallet calls are atomic and synchronous
- Simpler Architecture: No XCM messaging complexity
- Shared State: Direct access to Layer 0 storage (buckets, agreements)
- Cost Efficiency: Single transaction for drive creation + bucket setup
The system uses two encoding formats depending on the context: SCALE (the primary, canonical format) and Protobuf (a secondary convenience format).

SCALE Usage:
- All on-chain storage (pallet state)
- Content-addressed data stored via providers
- CID computation base
Why SCALE?
- Substrate-native encoding (required for pallets)
- Deterministic: Same data always produces same bytes
- Efficient: Compact binary representation
- `no_std` compatible: Works in runtime WASM
Format Details:

```rust
// DirectoryNode SCALE encoding
struct DirectoryNode {
    drive_id: u64,                                 // 8 bytes, little-endian
    children: BoundedVec<DirectoryEntry, Max1024>, // Length prefix + entries
    metadata: BoundedVec<MetadataEntry, Max64>,    // Length prefix + entries
}

// DirectoryEntry SCALE encoding
struct DirectoryEntry {
    name: BoundedVec<u8, Max256>, // Length prefix + UTF-8 bytes
    entry_type: EntryType,        // 1 byte (0=File, 1=Directory)
    cid: H256,                    // 32 bytes (blake2-256 hash)
    size: u64,                    // 8 bytes, little-endian
    mtime: u64,                   // 8 bytes, Unix timestamp
}
```

Example: Empty DirectoryNode for drive_id=2

```
Bytes:  02 00 00 00 00 00 00 00  00  00
        └────── drive_id ─────┘  │   └─ metadata (empty vec)
                                 └─ children (empty vec)

Length: 10 bytes
CID:    0xe835d9bb4ac2c42bd8895fcfb159903f4ce6de8de863182f4fb87c06a23d18b7
```
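The length prefixes above use SCALE's compact integer encoding. The sketch below illustrates only the single-byte mode (values below 64, encoded as the value shifted left two bits); the helper names are hypothetical, not part of the real codec API:

```rust
// Sketch of SCALE's compact length prefix for small values (< 64):
// the value is shifted left two bits; the low bits 00 mark single-byte
// mode. Larger values use 2-byte, 4-byte, and big-integer modes,
// omitted here.
fn compact_len(n: u8) -> Vec<u8> {
    assert!(n < 64, "single-byte mode only covers 0..=63");
    vec![n << 2]
}

// An empty DirectoryNode is then: 8-byte little-endian drive_id,
// plus one 0x00 compact prefix each for the empty children and
// metadata vecs -- the 10 bytes shown in the example above.
fn encode_empty_directory_node(drive_id: u64) -> Vec<u8> {
    let mut bytes = drive_id.to_le_bytes().to_vec();
    bytes.extend(compact_len(0)); // children: empty
    bytes.extend(compact_len(0)); // metadata: empty
    bytes
}
```

With this, a name of length 9 ("documents") gets the prefix `0x24`, and the empty node for drive 2 reproduces the 10-byte layout shown above.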
Protobuf Usage:
- Client-side caching (optional)
- Inter-service communication
- Human-readable debugging
Why Protobuf?
- Self-describing schema
- Language-agnostic
- Better tooling for inspection
Important: Protobuf is NOT used for CID computation. CIDs are always computed from SCALE-encoded bytes to ensure consistency.
```
┌──────────────────────────────────────────────────────────────┐
│                      Client Operations                       │
│                                                              │
│  1. Create DirectoryNode struct                              │
│  2. Serialize to SCALE: node.to_scale_bytes()                │
│  3. Compute CID: blake2_256(scale_bytes)                     │
│  4. Upload SCALE bytes to provider (by CID)                  │
│  5. Store CID on-chain (root_cid)                            │
│                                                              │
│  Retrieval:                                                  │
│  1. Read root_cid from chain                                 │
│  2. Fetch SCALE bytes from provider (by CID)                 │
│  3. Verify: blake2_256(bytes) == expected_cid                │
│  4. Deserialize: DirectoryNode::from_scale_bytes(&bytes)     │
└──────────────────────────────────────────────────────────────┘
```
Content Identifiers (CIDs) are 32-byte blake2-256 hashes:

```rust
pub type Cid = H256; // sp_core::H256

pub fn compute_cid(data: &[u8]) -> Cid {
    sp_core::hashing::blake2_256(data).into()
}
```

Why blake2-256?
- Substrate Standard: Native hashing function in Substrate
- Performance: Faster than SHA-256 at a comparable security level
- Collision Resistance: 256-bit output provides strong guarantees
- Hardware Support: Optimized implementations available
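The verify-on-read pattern built on these hashes can be sketched without dependencies by swapping in Rust's standard-library hasher for blake2-256; `toy_cid` and `verify_fetch` are illustrative stand-ins, not the real client API:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for blake2_256 so the sketch stays dependency-free; a real
// client calls sp_core::hashing::blake2_256 and compares H256 values.
fn toy_cid(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

// Verify-on-read: recompute the CID over the fetched bytes and reject
// anything that does not match the CID that was requested.
fn verify_fetch(expected: u64, fetched: &[u8]) -> Result<(), &'static str> {
    if toy_cid(fetched) == expected {
        Ok(())
    } else {
        Err("integrity check failed")
    }
}
```

Because the identifier is derived from the content itself, a provider cannot substitute different bytes without the mismatch being detected client-side.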
Files and directories form a Merkle DAG (Directed Acyclic Graph):
```
            Root CID (on-chain)
                    │
              ┌─────┴─────┐
              │           │
         documents/    images/
              │           │
       ┌──────┴─────┐     │
       │            │     │
     work/     notes.txt  photo.jpg
       │
  report.txt
```

Each node's CID = blake2_256(SCALE_bytes).
Parent nodes contain their children's CIDs.
Same content always produces same CID, enabling automatic deduplication:
```rust
// Two identical files
let file1_data = b"Hello, World!";
let file2_data = b"Hello, World!";

let cid1 = compute_cid(file1_data); // 0xabc...
let cid2 = compute_cid(file2_data); // 0xabc... (same!)
// Only stored once on provider
```

```
┌──────────────────────────────────────────────────────┐
│                     Trust Levels                     │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │ TRUSTLESS: Blockchain                          │  │
│  │ - Finalized state is immutable                 │  │
│  │ - Consensus guarantees                         │  │
│  │ - Root CIDs are verifiable                     │  │
│  └────────────────────────────────────────────────┘  │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │ VERIFIABLE: Content-Addressed Storage          │  │
│  │ - Data integrity verified by CID               │  │
│  │ - Cannot serve tampered data                   │  │
│  │ - Providers economically incentivized          │  │
│  └────────────────────────────────────────────────┘  │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │ ACCOUNTABLE: Provider Network                  │  │
│  │ - Staked providers face slashing               │  │
│  │ - Challenge mechanism for disputes             │  │
│  │ - Replication for redundancy                   │  │
│  └────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
```
Every data retrieval is verified:
```rust
async fn fetch_blob(&self, cid: Cid) -> Result<Vec<u8>> {
    // 1. Fetch data from provider (bounded length; see
    //    Performance Considerations for why not u64::MAX)
    let data = self.storage_client.read(&cid, 0, MAX_READ_LENGTH).await?;

    // 2. Provider verifies chunk hashes during read
    //    (see storage-client/src/lib.rs lines 221-227)

    // 3. Client verifies entire blob CID
    let actual_cid = compute_cid(&data);
    if actual_cid != cid {
        return Err(Error::IntegrityCheckFailed);
    }
    Ok(data)
}
```

```
┌──────────────────────────────────────────────────────────────┐
│                  Game-Theoretic Guarantees                   │
│                                                              │
│  Provider Registration:                                      │
│  - Minimum stake: 1000 tokens                                │
│  - Stake locked during active agreements                     │
│                                                              │
│  Checkpoint Flow:                                            │
│  1. Provider builds MMR over stored data                     │
│  2. Provider signs commitment (MMR root)                     │
│  3. Client submits checkpoint on-chain                       │
│  4. Provider is now liable for data availability             │
│                                                              │
│  Challenge Mechanism:                                        │
│  1. Challenger requests proof for specific chunk             │
│  2. Provider must respond within challenge_period            │
│  3. Failure to respond → slashing (lose stake)               │
│  4. Successful response → challenger pays challenge fee      │
│                                                              │
│  Result: Providers economically motivated to preserve data   │
└──────────────────────────────────────────────────────────────┘
```
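The challenge economics described above can be sketched in plain Rust. The types, names, and numbers here are illustrative only; the real stake, fee, and period values live in the storage pallet's configuration:

```rust
// Illustrative provider state; the pallet tracks much more than stake.
struct Provider {
    stake: u64,
}

enum ChallengeOutcome {
    Slashed { lost: u64 },
    Paid { fee: u64 },
}

// Resolve a challenge: a proof delivered on or before the deadline earns
// the challenge fee; silence past the deadline forfeits the whole stake.
fn resolve_challenge(
    provider: &mut Provider,
    responded_at: Option<u64>, // block number of the proof, if any
    deadline: u64,             // last acceptable block
    fee: u64,
) -> ChallengeOutcome {
    match responded_at {
        Some(block) if block <= deadline => ChallengeOutcome::Paid { fee },
        _ => {
            let lost = provider.stake;
            provider.stake = 0; // slashed
            ChallengeOutcome::Slashed { lost }
        }
    }
}
```

The asymmetry is the point: answering a challenge is cheap for an honest provider, while ignoring one costs the entire stake, so preserving data is the profitable strategy.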
Current State: Basic owner-based access
```rust
// Only drive owner can modify
fn update_root_cid(origin, drive_id, new_root_cid) {
    let caller = ensure_signed(origin)?;
    let drive = Drives::<T>::get(drive_id).ok_or(Error::<T>::DriveNotFound)?;
    ensure!(drive.owner == caller, Error::<T>::NotDriveOwner);
    // ... proceed with update
}
```

Future Enhancements: See Encryption & Access Control.
Encryption is NOT implemented by default. Data is stored in plaintext.
The system provides infrastructure for future encryption:
```rust
pub struct FileManifest {
    // ... other fields
    /// Encryption parameters (optional, for W3ACL)
    pub encryption_params: BoundedVec<u8, MaxEncryptionParamsLength>, // 512 bytes max
}
```

```
┌──────────────────────────────────────────────────────────────┐
│               Client-Side Encryption (Planned)               │
│                                                              │
│  Upload:                                                     │
│  1. Generate random encryption key (AES-256-GCM)             │
│  2. Encrypt file chunks with key                             │
│  3. Encrypt key with owner's public key                      │
│  4. Store encrypted_key in FileManifest.encryption_params    │
│  5. Upload encrypted chunks                                  │
│                                                              │
│  Download:                                                   │
│  1. Fetch FileManifest                                       │
│  2. Decrypt key with owner's private key                     │
│  3. Fetch and decrypt chunks                                 │
│                                                              │
│  Sharing:                                                    │
│  1. Decrypt key with owner's private key                     │
│  2. Re-encrypt key with recipient's public key               │
│  3. Create access grant (UCAN or W3ACL)                      │
└──────────────────────────────────────────────────────────────┘
```
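The pattern behind this plan is envelope encryption: the file is encrypted once with a content key, and only the small key is re-wrapped per reader. The sketch below uses a toy XOR keystream purely so the flow is runnable without crates; every function here is a hypothetical stand-in for AES-256-GCM and real public-key cryptography, and must never be used for actual secrecy:

```rust
// Toy "cipher": XOR with a repeating key. NOT secure -- a placeholder
// standing in for AES-256-GCM / public-key encryption in this sketch.
fn xor_with(data: &[u8], key: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

// Envelope pattern: wrap the content key for a particular reader.
// Sharing re-wraps the key for the recipient; the file bytes on the
// provider never change.
fn wrap_for(content_key: &[u8], reader_key: &[u8]) -> Vec<u8> {
    xor_with(content_key, reader_key)
}
```

The design consequence is that revoking or granting access touches only `encryption_params` in the FileManifest, never the (potentially huge) encrypted chunks.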
| Feature | Status | Description |
|---|---|---|
| Owner-only access | Implemented | Drive owner can read/write |
| Client-side encryption | Planned | AES-256-GCM per file |
| UCAN delegation | Planned | Capability-based access tokens |
| W3ACL integration | Planned | Decentralized access control lists |
| Shared drives | Planned | Multi-user drive access |
For Sensitive Data (Current Workaround):
```rust
// Generate key and nonce, then encrypt before upload
let key = generate_aes_key();
let nonce = generate_nonce();
let encrypted_data = aes_gcm_encrypt(&file_data, &key, &nonce);
// Store key and nonce securely (e.g., in your app's keystore)
fs_client.upload_file(drive_id, "/secret.enc", &encrypted_data, bucket_id).await?;

// Decrypt after download
let encrypted = fs_client.download_file(drive_id, "/secret.enc").await?;
let plaintext = aes_gcm_decrypt(&encrypted, &key, &nonce);
```

The File System Client uses subxt for trustless blockchain interaction:
```rust
pub struct SubstrateClient {
    api: OnlineClient<SubstrateConfig>,
    signer: Option<Keypair>,
}

impl SubstrateClient {
    pub async fn connect(endpoint: &str) -> Result<Self> {
        // Connect to parachain WebSocket
        let api = OnlineClient::from_url(endpoint).await?;
        Ok(Self { api, signer: None })
    }
}
```

```
┌──────────────────────────────────────────────────────────────────┐
│                 Drive Creation Transaction Flow                  │
│                                                                  │
│  1. Client builds extrinsic:                                     │
│     DriveRegistry::create_drive(name, capacity, period, payment) │
│                                                                  │
│  2. Client signs with SR25519 keypair                            │
│                                                                  │
│  3. Submit to parachain:                                         │
│     POST /transaction                                            │
│                                                                  │
│  4. Transaction included in block                                │
│                                                                  │
│  5. Client watches for finalization:                             │
│     - Poll transaction status                                    │
│     - Wait for finality (relay chain confirmation)               │
│                                                                  │
│  6. Extract drive_id from DriveCreated event                     │
│                                                                  │
│  7. Query drive state:                                           │
│     DriveRegistry::Drives(drive_id) -> DriveInfo                 │
└──────────────────────────────────────────────────────────────────┘
```
```rust
// Query drive info
async fn query_drive_root_cid(&self, drive_id: DriveId) -> Result<Cid> {
    // Build storage key:
    // twox128("DriveRegistry") ++ twox128("Drives") ++ blake2_128(drive_id)
    let storage_key = build_storage_key("DriveRegistry", "Drives", drive_id);

    // Fetch raw bytes from chain state
    let bytes = self.api.storage().at_latest().await?.fetch_raw(storage_key).await?;

    // Decode DriveInfo and extract root_cid
    let drive_info = decode_drive_info(&bytes)?;
    Ok(drive_info.root_cid)
}
```

```rust
// Find DriveCreated event after transaction
for event in events.iter() {
    if event.pallet_name() == "DriveRegistry" {
        if let Ok(value) = event.field_values() {
            // DriveCreated { drive_id, owner, bucket_id, root_cid }
            if let Some(drive_id) = value.at(0).and_then(|v| v.as_u128()) {
                return Ok(drive_id as DriveId);
            }
        }
    }
}
```

| Aspect | SCALE | Protobuf |
|---|---|---|
| Determinism | Guaranteed | Field order dependent |
| CID Stability | Always same for same data | Schema changes break CIDs |
| Substrate Integration | Native | Requires conversion |
| `no_std` Support | Yes | Requires `prost` with `alloc` |
| Size | Compact | Slightly larger |
Decision: Use SCALE for all stored data to ensure CID consistency.
Alternatives Considered:

1. Separate Parachains: L0 and L1 on different parachains
   - Pro: Independent scaling
   - Con: XCM complexity, latency, higher costs

2. L1 on Relay Chain: Drive registry on relay chain
   - Pro: Higher security
   - Con: Limited functionality, high costs

3. Same Parachain (Chosen):
   - Pro: Simple cross-pallet calls, shared state, low latency
   - Con: Coupled scaling

Rationale: Simplicity wins. File system operations frequently need bucket/agreement data, and cross-pallet calls are atomic and incur no cross-chain messaging fees.
Alternatives:
- SHA-256: Slower, no substrate optimization
- Keccak-256: Ethereum-compatible but not Substrate-native
- BLAKE3: Newer, not yet in Substrate
Decision: blake2-256 is Substrate-native, fast, and battle-tested.
Benefits:
- Integrity: CID = fingerprint of content
- Deduplication: Same content stored once
- Immutability: CIDs never change
- Verifiability: Anyone can verify data integrity
- Caching: Safe to cache forever
Trade-off: Updates create new CIDs, requiring DAG updates.
Benefits:
- Efficient Updates: Only changed nodes need re-upload
- Versioning: Each root CID is a complete snapshot
- Partial Sync: Download only needed branches
- Proof of Inclusion: Merkle proofs for any entry
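The "efficient updates" property comes from path copying: replacing a leaf re-uploads only that leaf and its ancestors, and every resulting root CID is a complete snapshot. A minimal sketch, using FNV-1a as a stand-in for blake2-256 over SCALE bytes and `update_chain` as a hypothetical helper:

```rust
use std::collections::HashMap;

// Stand-in CID: FNV-1a over the node bytes. The real system hashes
// SCALE-encoded nodes with blake2-256.
fn cid_of(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(0xcbf29ce484222325u64, |h, b| (h ^ *b as u64).wrapping_mul(0x100000001b3))
}

// Path copying along root -> ... -> leaf: each parent embeds its child's
// CID, so editing the leaf gives every ancestor (and the root) a new CID,
// while identical content always reproduces identical CIDs.
fn update_chain(store: &mut HashMap<u64, Vec<u8>>, ancestors_to_leaf: &[&[u8]]) -> u64 {
    let mut child_cid = 0u64;
    for node in ancestors_to_leaf.iter().rev() {
        let mut bytes = node.to_vec();
        bytes.extend(child_cid.to_le_bytes()); // parent references child CID
        child_cid = cid_of(&bytes);
        store.insert(child_cid, bytes); // content-addressed: dedupes for free
    }
    child_cid // new root CID, to be committed on-chain
}
```

Both versions of the tree remain retrievable as long as their nodes stay in the store, which is what makes each committed root CID a usable snapshot.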
```
┌──────────────────────────────────────────────────────────────┐
│      Read Path: download_file("/documents/report.pdf")       │
│                                                              │
│  1. Check root_cid cache (in-memory)                         │
│     ├─ Hit: Skip chain query                                 │
│     └─ Miss: Query chain, cache result                       │
│                                                              │
│  2. Traverse path: / → documents → report.pdf                │
│     ├─ Each step: Fetch directory node from provider         │
│     └─ Optimization: Batch fetches, prefetch siblings        │
│                                                              │
│  3. Fetch file manifest                                      │
│                                                              │
│  4. Fetch chunks in parallel                                 │
│     ├─ Provider supports range requests                      │
│     └─ Client reassembles locally                            │
│                                                              │
│  Typical latency:                                            │
│  - Cache hit: ~50ms (single provider round-trip)             │
│  - Cache miss: ~200ms (chain query + provider)               │
│  - Large file: Dominated by chunk download time              │
└──────────────────────────────────────────────────────────────┘
```
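Step 2's traversal can be sketched with an in-memory map of directory nodes standing in for CID-verified fetches from a provider; `Dir`, `resolve`, and the numeric CIDs are illustrative, not the SDK's actual types:

```rust
use std::collections::HashMap;

// Toy directory node: entry name -> child CID. The real DirectoryNode is
// SCALE-decoded from provider bytes after its CID has been verified.
type Dir = HashMap<String, u64>;

// Walk the path one component at a time, fetching each directory node
// by CID, and return the CID of the final entry (file manifest or dir).
fn resolve(nodes: &HashMap<u64, Dir>, root_cid: u64, path: &str) -> Option<u64> {
    let mut cid = root_cid;
    let mut parts = path.trim_matches('/').split('/').peekable();
    while let Some(name) = parts.next() {
        let entry = *nodes.get(&cid)?.get(name)?;
        if parts.peek().is_none() {
            return Some(entry); // last component: done
        }
        cid = entry; // descend into subdirectory
    }
    None
}
```

Each hop is one provider round-trip in the real client, which is why the read path batches fetches and prefetches siblings.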
```
┌──────────────────────────────────────────────────────────────┐
│    Write Path: upload_file("/documents/report.pdf", data)    │
│                                                              │
│  1. Split file into 256 KiB chunks                           │
│                                                              │
│  2. Upload chunks in parallel                                │
│     ├─ Each chunk: Compute CID, upload to provider           │
│     └─ Provider stores: CID → data                           │
│                                                              │
│  3. Create FileManifest with chunk CIDs                      │
│     └─ Upload manifest, get manifest CID                     │
│                                                              │
│  4. Update parent directory                                  │
│     ├─ Fetch current directory                               │
│     ├─ Add entry: name → manifest CID                        │
│     └─ Upload new directory, get new CID                     │
│                                                              │
│  5. Update ancestors up to root                              │
│     └─ Recursive: Each parent gets new CID                   │
│                                                              │
│  6. Update on-chain root_cid                                 │
│     └─ Based on CommitStrategy:                              │
│        - Immediate: Submit transaction now                   │
│        - Batched: Queue, submit on interval                  │
│        - Manual: Store pending, wait for commit_changes()    │
│                                                              │
│  Optimization: Batch multiple writes before chain update     │
└──────────────────────────────────────────────────────────────┘
```
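Steps 1-3 amount to chunking plus manifest assembly. A sketch under stated assumptions: FNV-1a stands in for blake2-256, and `FileChunkRef`/`chunk_file` are hypothetical names mirroring the FileManifest's chunk list:

```rust
const CHUNK_SIZE: usize = 256 * 1024; // 256 KiB, as used by the write path

// Stand-in CID; the real client hashes each chunk with blake2-256.
fn cid_of(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(0xcbf29ce484222325u64, |h, b| (h ^ *b as u64).wrapping_mul(0x100000001b3))
}

// Illustrative mirror of a FileManifest chunk entry.
struct FileChunkRef {
    cid: u64,
    sequence: u32,
}

// Split a file into fixed-size chunks and record (CID, sequence) pairs,
// the content of the manifest uploaded after the chunks themselves.
fn chunk_file(data: &[u8]) -> Vec<FileChunkRef> {
    data.chunks(CHUNK_SIZE)
        .enumerate()
        .map(|(i, chunk)| FileChunkRef { cid: cid_of(chunk), sequence: i as u32 })
        .collect()
}
```

Because chunk CIDs are content-derived, a file of repeated content produces repeated CIDs, and the provider stores that chunk only once.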
Important: When reading data from providers, avoid `u64::MAX` as the length parameter:

```rust
// BAD: Causes overflow in provider's chunk calculation
let data = storage_client.read(&cid, 0, u64::MAX).await?;

// GOOD: Use a reasonable maximum (1 TiB)
const MAX_READ_LENGTH: u64 = 1024 * 1024 * 1024 * 1024;
let data = storage_client.read(&cid, 0, MAX_READ_LENGTH).await?;
```

Reason: The provider calculates `end_chunk = (offset + length + chunk_size - 1) / chunk_size`. With `u64::MAX`, the addition wraps around and the request returns no chunks.
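The failure mode is easy to reproduce by writing the provider's formula with checked arithmetic; this sketch assumes the 256 KiB chunk size used elsewhere in this document:

```rust
const CHUNK_SIZE: u64 = 256 * 1024; // 256 KiB, assumed provider chunk size

// The provider's end-chunk formula, written with checked arithmetic so
// the u64::MAX failure mode surfaces as None instead of silently
// wrapping to (almost) zero chunks.
fn end_chunk(offset: u64, length: u64) -> Option<u64> {
    offset
        .checked_add(length)?
        .checked_add(CHUNK_SIZE - 1) // overflows when length is u64::MAX
        .map(|n| n / CHUNK_SIZE)
}
```

A wrapping implementation of the same formula computes `u64::MAX + CHUNK_SIZE - 1` as roughly `CHUNK_SIZE - 2`, so the division yields zero and the read returns nothing.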
| Document | Description |
|---|---|
| User Guide | Complete guide for end users |
| Example Walkthrough | Step-by-step basic_usage.rs walkthrough |
| Admin Guide | System administration and monitoring |
| API Reference | Complete API documentation |
| File System Interface | Architecture overview |
| Extrinsics Reference | Layer 0 blockchain API |
| Payment Calculator | Calculate storage costs |
| Quick Start | Get running in 5 minutes |
| Scalable Web3 Storage Design | System design & rationale |
| Implementation Details | Technical specifications |
```rust
let dir = DirectoryNode {
    drive_id: 5,
    children: vec![
        DirectoryEntry {
            name: "documents",
            entry_type: Directory,
            cid: 0x9955e72d...,
            size: 0,
            mtime: 1707456000,
        },
        DirectoryEntry {
            name: "README.md",
            entry_type: File,
            cid: 0x0bc42ff7...,
            size: 127,
            mtime: 1707456020,
        },
    ],
    metadata: vec![],
};

// SCALE encoding (128 bytes for this example):
// 05 00 00 00 00 00 00 00    // drive_id: 5
// 08                         // children count: 2 (compact)
// 24                         // name length: 9 (compact)
// 64 6f 63 75 6d 65 6e 74 73 // "documents"
// 01                         // entry_type: Directory
// 99 55 e7 2d ...            // cid: 32 bytes
// 00 00 00 00 00 00 00 00    // size: 0
// 00 b6 c5 65 00 00 00 00    // mtime: 1707456000, little-endian
// ...
```

```rust
let manifest = FileManifest {
    drive_id: 5,
    mime_type: "application/pdf",
    total_size: 1048576, // 1 MiB
    chunks: vec![
        FileChunk { cid: 0xabc..., sequence: 0 },
        FileChunk { cid: 0xdef..., sequence: 1 },
        FileChunk { cid: 0x123..., sequence: 2 },
        FileChunk { cid: 0x456..., sequence: 3 },
    ],
    encryption_params: vec![], // Empty (no encryption)
};
```

| Term | Definition |
|---|---|
| CID | Content Identifier - blake2-256 hash of data |
| DAG | Directed Acyclic Graph - tree structure of CIDs |
| Drive | User's logical file system (Layer 1 concept) |
| Bucket | Storage container (Layer 0 concept) |
| MMR | Merkle Mountain Range - efficient append-only commitment |
| SCALE | Simple Concatenated Aggregate Little-Endian encoding |
| Checkpoint | On-chain commitment to off-chain data state |
| Root CID | CID of the root directory (stored on-chain) |
Last updated: February 2026