# D-LOCKSS: Distributed Lots of Copies Keep Stuff Safe
Build from source:

```bash
go build -o dlockss ./cmd/dlockss
```

then run `./dlockss` (see Building from Source).
D-LOCKSS is a decentralized storage network for long-term preservation and authenticity of research data.
- Core Philosophy: "Networked RAID." Just as RAID protects data across multiple hard drives, D-LOCKSS protects data across a distributed network of peers.
- Authenticity: Relies on Content Addressing (CIDs) to guarantee data integrity.
- Scope: Focuses purely on replication, redundancy, and availability.
- Speed & Safety: Combines the speed of IPFS Cluster with the safety of LOCKSS.
- Automation: Fast enough for millions of files, smart enough to maintain replication levels without human intervention.
- OS: Linux, macOS, WSL, or Windows 10+.
- IPFS: A running IPFS daemon is required.
- Install the IPFS CLI
- Run the daemon:

```bash
ipfs daemon
```
- Start the Node: Run the binary (see Building from Source to build it):

  ```bash
  ./dlockss
  ```

  (Windows: `dlockss.exe`)

- Add Files: Copy any file (e.g., a PDF) into the data directory (default `./data`, or `DLOCKSS_DATA_DIR`). The node will automatically detect, ingest, pin, and replicate the file.
Configure via environment variables:

```bash
# Data Directory
export DLOCKSS_DATA_DIR="$HOME/my-data"

# Node Identity
export DLOCKSS_NODE_NAME="my-node"                # Human-readable name (shown in monitor)
export DLOCKSS_IDENTITY_PATH="/data/dlockss.key"  # Persistent identity key location (fallback if IPFS node config cannot be read)
export DLOCKSS_IPFS_CONFIG="/path/to/ipfs/config" # Kubo config JSON (derives identity from IPFS repo)

# Replication Targets
export DLOCKSS_MIN_REPLICATION=5
export DLOCKSS_MAX_REPLICATION=10

# Network
export DLOCKSS_IPFS_NODE="/ip4/127.0.0.1/tcp/5001"

# DHT tuning
export DLOCKSS_MAX_CONCURRENT_DHT_PROVIDES=8      # Limit concurrent DHT provide operations

# Logging
export DLOCKSS_VERBOSE_LOGGING=true               # Enable detailed metrics and status logs
```

Nodes can have a human-readable name displayed in the monitor dashboard. The name is resolved in order:
- `DLOCKSS_NODE_NAME` environment variable (highest priority)
- Persisted name file (`node_name` alongside the data directory)
- Interactive prompt on first startup (when running outside Docker/testnet)
Testnet nodes are automatically named `testnet_1`, `testnet_2`, etc.
The node's libp2p identity (private key) determines its Peer ID. The identity is resolved in order:
- IPFS config (`DLOCKSS_IPFS_CONFIG` set): Reads `Identity.PrivKey` from the Kubo config JSON so D-LOCKSS and IPFS share one Peer ID. For Docker, mount the single config file read-only.
- Persistent key file (`DLOCKSS_IDENTITY_PATH`, or the default `{data_dir_parent}/dlockss.key`): Used when connecting to a remote/Docker Kubo node where the repo is not accessible.
- Auto-generated: If no key exists, a new Ed25519 key is generated and saved to the identity path.
For Docker deployments: either mount the Kubo config file and set `DLOCKSS_IPFS_CONFIG`, or mount a persistent volume and set `DLOCKSS_DATA_DIR` to a subdirectory on it. The identity key, node name, and cluster state are stored alongside the data directory and will survive container rebuilds.
Path safety: The node refuses to start if the identity key, node name, or cluster store would be placed inside the ingest directory (`DLOCKSS_DATA_DIR`), since the file watcher would try to ingest them. Always set `DLOCKSS_DATA_DIR` to a dedicated subdirectory (e.g. `./data`, not `.`).
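A minimal version of this safety check can be written with `filepath.Rel`. The helper name `insideDir` is hypothetical; the real node's check may differ in details such as symlink handling.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// insideDir reports whether path lies inside (or equals) dir. A node
// can refuse to start when its identity key or cluster store would
// land inside the watched ingest directory.
func insideDir(dir, path string) bool {
	rel, err := filepath.Rel(dir, path)
	if err != nil {
		return false
	}
	return rel == "." ||
		(rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)))
}

func main() {
	fmt.Println(insideDir("/srv/data", "/srv/data/dlockss.key")) // true: would be ingested, refuse to start
	fmt.Println(insideDir("/srv/data", "/srv/dlockss.key"))      // false: safe location beside the data dir
}
```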
```yaml
services:
  dlockss-node:
    image: ghcr.io/gipplab/dlockss-single-node:latest
    restart: unless-stopped
    environment:
      DLOCKSS_IPFS_NODE: "/dns4/ipfs/tcp/5001" # "ipfs" resolves to the Kubo service below
      DLOCKSS_DATA_DIR: "/data/ingest"         # location that D-LOCKSS monitors for ingesting files
      DLOCKSS_IPFS_CONFIG: "/ipfs-repo/config" # derive identity from IPFS node (shared peer ID)
      # DLOCKSS_NODE_NAME: my-node             # human-readable name shown in the monitor;
      #                                        # if empty the peer ID is displayed instead
    volumes:
      - ./dlockss-files:/data   # persistent D-LOCKSS data (identity, cluster state, ingested files)
      - ipfs-data:/ipfs-repo:ro # read-only access to Kubo config for identity
    depends_on:
      - ipfs
    labels:
      - com.centurylinklabs.watchtower.enable=true

  ipfs:
    image: ipfs/kubo
    restart: unless-stopped
    # Uncomment if you can forward port 4001 (TCP+UDP) for better peering:
    # ports:
    #   - 4001:4001/tcp
    #   - 4001:4001/udp
    volumes:
      - ipfs-staging:/export
      - ipfs-data:/data/ipfs
    environment:
      - IPFS_PROFILE=server
    healthcheck:
      test: ["CMD-SHELL", "ipfs id || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5

  ### Optional services
  # Watchtower keeps the D-LOCKSS image up to date automatically.
  # Recommended until a stable release is published.
  watchtower:
    image: containrrr/watchtower
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    command: --label-enable --include-stopped --revive-stopped

volumes:
  ipfs-staging: # IPFS staging area on /export
  ipfs-data:    # IPFS repo on /data/ipfs (shared read-only with D-LOCKSS for identity)
```

See docs/DLOCKSS_PROTOCOL.md for protocol details.
D-LOCKSS acts as a self-healing, sharded storage cluster using the IPFS/Libp2p stack.
- Shard Manager: Dynamically splits responsibilities based on peer count to maintain scalability. Delegates lifecycle decisions (split/merge/discovery) to a `lifecycleManager` and replication to a `replicationManager`.
- Cluster Manager: Manages embedded IPFS Cluster instances (one per shard) using CRDTs for state consensus; nodes in a shard sync and pin content assigned to that shard.
- File Watcher: Monitors the data directory to automatically ingest content (via `handleWatcherEvent`/`handleNewDirectory`).
- Storage Monitor: Protects nodes from disk exhaustion by rejecting custodial requests when full.
- BadBits Manager: Enforces content blocking (e.g., DMCA) based on configured country codes.
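The Storage Monitor's admission rule can be sketched as a pure capacity check. The function name `shouldAcceptCustody` and the 90% cutoff are illustrative assumptions, not the implementation's actual API or threshold.

```go
package main

import "fmt"

// shouldAcceptCustody decides whether to take temporary custody of a
// payload, given current usage and capacity in bytes: accept only if
// storing it keeps the disk below an illustrative 90% fill level.
func shouldAcceptCustody(usedBytes, capacityBytes, payloadBytes uint64) bool {
	const maxFill = 0.90
	return float64(usedBytes+payloadBytes) <= maxFill*float64(capacityBytes)
}

func main() {
	fmt.Println(shouldAcceptCustody(80, 100, 5)) // true: 85% after accepting
	fmt.Println(shouldAcceptCustody(88, 100, 5)) // false: 93% would exceed the cutoff
}
```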
- Striping -> Sharding: Responsibility for files is determined by a stable hash of the PayloadCID (`TargetShardForPayload`); each file lives in exactly one cluster (shard).
- Redundancy -> Cluster Consensus: Each shard runs an embedded IPFS Cluster CRDT. When a file is ingested, it is "pinned" to the shard's cluster state. All peers in that shard sync this state and automatically pin the content locally.
- Write Cache -> Custodial Mode: Nodes temporarily hold files they don't own until they can hand them off to the responsible shard.
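The "stable hash" idea behind shard assignment can be sketched as follows. This is not the protocol's actual `TargetShardForPayload`: the hash function (SHA-256) and the modulo layout are assumptions, shown only to illustrate why every node deterministically maps a given CID to the same single shard.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// shardForPayload maps a payload CID string to one of numShards shards
// via a stable hash: every node computes the same shard for the same
// CID without any coordination.
func shardForPayload(cid string, numShards uint64) uint64 {
	sum := sha256.Sum256([]byte(cid))
	return binary.BigEndian.Uint64(sum[:8]) % numShards
}

func main() {
	cid := "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
	// Deterministic: the same CID yields the same shard on every node.
	fmt.Println(shardForPayload(cid, 8) == shardForPayload(cid, 8)) // true
}
```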
Documentation:
- Protocol specification
- Replication performance
- Architecture diagrams (PlantUML) in `docs/`
Requires Go 1.21+.
```bash
git clone https://github.com/gipplab/D-LOCKSS
cd D-LOCKSS
go build -ldflags="-s -w" -o dlockss ./cmd/dlockss
./dlockss
```

Optional monitor (dashboard):

```bash
go build -o dlockss-monitor ./cmd/dlockss-monitor
./dlockss-monitor                                          # uses default topic (creative-commons)
./dlockss-monitor -topic my-archive                        # monitor a specific topic
./dlockss-monitor -topic my-archive -prefix dlockss-v0.0.4 # custom topic + prefix
```

Open http://localhost:8080. The `-topic` and `-prefix` flags override the `DLOCKSS_TOPIC_NAME` and `DLOCKSS_PUBSUB_TOPIC_PREFIX` environment variables respectively. The topic is fixed at startup (the dashboard displays it read-only).
The monitor displays each node's name (if configured via `DLOCKSS_NODE_NAME`), falling back to the Peer ID. Names propagate via HEARTBEAT/JOIN messages and appear in the node table, charts, and shard modals. Client-side aliases (EDIT button) override server-side names. Each node has one peer ID: when `DLOCKSS_IPFS_CONFIG` is set (e.g. in the testnet), D-LOCKSS uses the IPFS repo identity, so the same ID appears in the monitor and in `node_x.ipfs.log`.
The monitor bootstrap-subscribes to all shards up to depth 6 (127 shards) so it can see nodes even when started late. Set `DLOCKSS_MONITOR_BOOTSTRAP_SHARD_DEPTH` (0–12) to tune.
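The 127-shards-at-depth-6 figure implies a binary shard tree counted from the root: a complete binary tree of depth d contains 2^(d+1) − 1 nodes. Under that assumption (the tree shape is inferred from the numbers, not stated in the source), the count per depth is:

```go
package main

import "fmt"

// shardsUpToDepth counts the shards in a complete binary shard tree up
// to the given depth, with the root at depth 0: 2^(depth+1) - 1.
func shardsUpToDepth(depth uint) int {
	return (1 << (depth + 1)) - 1
}

func main() {
	fmt.Println(shardsUpToDepth(6))  // 127, the monitor's default subscription
	fmt.Println(shardsUpToDepth(12)) // 8191, the maximum configurable depth
}
```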
Alternatively use: https://dlockss-monitor.wmcloud.org.
From `testnet/`, `./run_testnet.sh` starts multiple D-LOCKSS nodes and IPFS daemons. Each node is automatically named `testnet_1`, `testnet_2`, etc. (visible in the monitor) and has one peer ID (D-LOCKSS loads the identity from the node's IPFS repo via `DLOCKSS_IPFS_CONFIG`). Press Enter in the script to shut down.
```bash
go test ./... -v
```

- Current Phase: Production; structural refactoring complete (see Code Elegance Plan). Config uses nested sub-structs (`Sharding`, `Replication`, `Files`, `Security`, `Orphan`). ShardManager delegates to `replicationManager` and `lifecycleManager`.
- Signed Messages: All protocol messages are signed by the sender's Libp2p key.
- Manifest Verification: ResearchObjects include signatures from the ingester.
- Trust Modes: Supports `open` (default) or `allowlist` trust models.
Dual licensed under the MIT License or Apache License 2.0, at your option.