# D-LOCKSS: Distributed Lots of Copies Keep Stuff Safe
Build from source:

```bash
go build -o dlockss ./cmd/dlockss
```

then run `./dlockss` (see Building from Source).
D-LOCKSS is a decentralized storage network for long-term preservation and authenticity of research data.
- Core Philosophy: "Networked RAID." Just as RAID protects data across multiple hard drives, D-LOCKSS protects data across a distributed network of peers.
- Authenticity: Relies on Content Addressing (CIDs) to guarantee data integrity.
- Scope: Focuses purely on replication, redundancy, and availability.
- Speed & Safety: Combines the speed of IPFS Cluster with the safety of LOCKSS.
- Automation: Fast enough for millions of files, smart enough to maintain replication levels without human intervention.
- OS: Linux, macOS, WSL, or Windows 10+.
- IPFS: A running IPFS daemon is required.
- Install the IPFS CLI
- Run the daemon:

```bash
ipfs daemon
```
- Start the Node: Run the binary (see Building from Source to build it):

  ```bash
  ./dlockss
  ```

  (Windows: `dlockss.exe`)

- Add Files: Copy any file (e.g., a PDF) into the data directory (default `./data`, or `DLOCKSS_DATA_DIR`). The node will automatically detect, ingest, pin, and replicate the file.
Configure via environment variables:

```bash
# Data Directory
export DLOCKSS_DATA_DIR="$HOME/my-data"

# Node Identity
export DLOCKSS_NODE_NAME="my-node"                # Human-readable name (shown in monitor)
export DLOCKSS_IDENTITY_PATH="/data/dlockss.key"  # Persistent identity key location (fallback if IPFS node config cannot be read)
export DLOCKSS_IPFS_CONFIG="/path/to/ipfs/config" # Kubo config JSON (derives identity from IPFS repo)

# Replication Targets
export DLOCKSS_MIN_REPLICATION=5
export DLOCKSS_MAX_REPLICATION=10

# Network
export DLOCKSS_IPFS_NODE="/ip4/127.0.0.1/tcp/5001"

# DHT tuning
export DLOCKSS_MAX_CONCURRENT_DHT_PROVIDES=8      # Limit concurrent DHT provide operations

# Logging
export DLOCKSS_VERBOSE_LOGGING=true               # Enable detailed metrics and status logs
```

Nodes can have a human-readable name displayed in the monitor dashboard. The name is resolved in order:
- `DLOCKSS_NODE_NAME` environment variable (highest priority)
- Persisted name file (`node_name` alongside the data directory)
- Interactive prompt on first startup (when running outside Docker/testnet)
Testnet nodes are automatically named `testnet_1`, `testnet_2`, etc.
The node's libp2p identity (private key) determines its Peer ID. The identity is resolved in order:
- IPFS config (`DLOCKSS_IPFS_CONFIG` set): Reads `Identity.PrivKey` from the Kubo config JSON so D-LOCKSS and IPFS share one Peer ID. For Docker, mount the single config file read-only.
- Persistent key file (`DLOCKSS_IDENTITY_PATH`, or the default `{data_dir_parent}/dlockss.key`): Used when connecting to a remote/Docker Kubo node where the repo is not accessible.
- Auto-generated: If no key exists, a new Ed25519 key is generated and saved to the identity path.
For Docker deployments: either mount the Kubo config file and set `DLOCKSS_IPFS_CONFIG`, or mount a persistent volume and set `DLOCKSS_DATA_DIR` to a subdirectory on it. The identity key, node name, and cluster state are stored alongside the data directory and will survive container rebuilds.
Path safety: The node refuses to start if the identity key, node name, or cluster store would be placed inside the ingest directory (`DLOCKSS_DATA_DIR`), since the file watcher would try to ingest them. Always set `DLOCKSS_DATA_DIR` to a dedicated subdirectory (e.g. `./data`, not `.`).
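A minimal version of this safety check can be written with `filepath.Rel`. The helper name `insideDir` is hypothetical; the real node's check may differ in details such as symlink handling.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// insideDir reports whether path lies inside (or equals) dir. A node
// can refuse to start when its identity key or cluster store would
// land inside the watched ingest directory.
func insideDir(dir, path string) bool {
	rel, err := filepath.Rel(dir, path)
	if err != nil {
		return false
	}
	return rel == "." ||
		(rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)))
}

func main() {
	fmt.Println(insideDir("/srv/data", "/srv/data/dlockss.key")) // true: would be ingested, refuse to start
	fmt.Println(insideDir("/srv/data", "/srv/dlockss.key"))      // false: safe location beside the data dir
}
```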
```yaml
services:
  dlockss-node:
    image: ghcr.io/gipplab/dlockss-single-node:latest
    restart: unless-stopped
    environment:
      DLOCKSS_IPFS_NODE: "/dns4/ipfs/tcp/5001" # "ipfs" resolves to the Kubo service below
      DLOCKSS_DATA_DIR: "/data/ingest"         # location that D-LOCKSS monitors for ingesting files
      DLOCKSS_IPFS_CONFIG: "/ipfs-repo/config" # derive identity from IPFS node (shared peer ID)
      # DLOCKSS_NODE_NAME: my-node             # human-readable name shown in the monitor;
      #                                        # if empty the peer ID is displayed instead
    volumes:
      - ./dlockss-files:/data   # persistent D-LOCKSS data (identity, cluster state, ingested files)
      - ipfs-data:/ipfs-repo:ro # read-only access to Kubo config for identity
    depends_on:
      - ipfs
    labels:
      - com.centurylinklabs.watchtower.enable=true

  ipfs:
    image: ipfs/kubo
    restart: unless-stopped
    # Uncomment if you can forward port 4001 (TCP+UDP) for better peering:
    # ports:
    #   - 4001:4001/tcp
    #   - 4001:4001/udp
    volumes:
      - ipfs-staging:/export
      - ipfs-data:/data/ipfs
    environment:
      - IPFS_PROFILE=server
    healthcheck:
      test: ["CMD-SHELL", "ipfs id || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5

  ### Optional services
  # Watchtower keeps the D-LOCKSS image up to date automatically.
  # Recommended until a stable release is published.
  watchtower:
    image: containrrr/watchtower
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    command: --label-enable --include-stopped --revive-stopped

volumes:
  ipfs-staging: # IPFS staging area on /export
  ipfs-data:    # IPFS repo on /data/ipfs (shared read-only with D-LOCKSS for identity)
```

See docs/DLOCKSS_PROTOCOL.md for protocol details.
D-LOCKSS acts as a self-healing, sharded storage cluster using the IPFS/Libp2p stack.
- Shard Manager: Dynamically splits responsibilities based on peer count to maintain scalability. Delegates lifecycle decisions (split/merge/discovery) to a `lifecycleManager` and replication to a `replicationManager`.
- Cluster Manager: Manages embedded IPFS Cluster instances (one per shard) using CRDTs for state consensus; nodes in a shard sync and pin content assigned to that shard.
- File Watcher: Monitors the data directory to automatically ingest content (via `handleWatcherEvent`/`handleNewDirectory`).
- Storage Monitor: Protects nodes from disk exhaustion by rejecting custodial requests when full.
- BadBits Manager: Enforces content blocking (e.g., DMCA) based on configured country codes.
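The Storage Monitor's admission rule can be sketched as a pure capacity check. The function name `shouldAcceptCustody` and the 90% cutoff are illustrative assumptions, not the implementation's actual API or threshold.

```go
package main

import "fmt"

// shouldAcceptCustody decides whether to take temporary custody of a
// payload, given current usage and capacity in bytes: accept only if
// storing it keeps the disk below an illustrative 90% fill level.
func shouldAcceptCustody(usedBytes, capacityBytes, payloadBytes uint64) bool {
	const maxFill = 0.90
	return float64(usedBytes+payloadBytes) <= maxFill*float64(capacityBytes)
}

func main() {
	fmt.Println(shouldAcceptCustody(80, 100, 5)) // true: 85% after accepting
	fmt.Println(shouldAcceptCustody(88, 100, 5)) // false: 93% would exceed the cutoff
}
```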
- Striping -> Sharding: Responsibility for files is determined by a stable hash of the PayloadCID (`TargetShardForPayload`); each file lives in exactly one cluster (shard).
- Redundancy -> Cluster Consensus: Each shard runs an embedded IPFS Cluster CRDT. When a file is ingested, it is "pinned" to the shard's cluster state. All peers in that shard sync this state and automatically pin the content locally.
- Write Cache -> Custodial Mode: Nodes temporarily hold files they don't own until they can hand them off to the responsible shard.
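The "stable hash" idea behind shard assignment can be sketched as follows. This is not the protocol's actual `TargetShardForPayload`: the hash function (SHA-256) and the modulo layout are assumptions, shown only to illustrate why every node deterministically maps a given CID to the same single shard.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// shardForPayload maps a payload CID string to one of numShards shards
// via a stable hash: every node computes the same shard for the same
// CID without any coordination.
func shardForPayload(cid string, numShards uint64) uint64 {
	sum := sha256.Sum256([]byte(cid))
	return binary.BigEndian.Uint64(sum[:8]) % numShards
}

func main() {
	cid := "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
	// Deterministic: the same CID yields the same shard on every node.
	fmt.Println(shardForPayload(cid, 8) == shardForPayload(cid, 8)) // true
}
```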
Documentation:
- Protocol specification
- Replication performance
- Architecture diagrams (PlantUML) in `docs/`
Requires Go 1.21+.
```bash
git clone https://github.com/gipplab/D-LOCKSS
cd D-LOCKSS
go build -ldflags="-s -w" -o dlockss ./cmd/dlockss
./dlockss
```

Optional monitor (dashboard):

```bash
go build -o dlockss-monitor ./cmd/dlockss-monitor
./dlockss-monitor                                          # uses default topic (creative-commons)
./dlockss-monitor -topic my-archive                        # monitor a specific topic
./dlockss-monitor -topic my-archive -prefix dlockss-v0.0.4 # custom topic + prefix
```

Open http://localhost:8080. The `-topic` and `-prefix` flags override the `DLOCKSS_TOPIC_NAME` and `DLOCKSS_PUBSUB_TOPIC_PREFIX` environment variables respectively. The topic is fixed at startup (the dashboard displays it read-only).
The monitor displays each node's name (if configured via `DLOCKSS_NODE_NAME`), falling back to the Peer ID. Names propagate via HEARTBEAT/JOIN messages and appear in the node table, charts, and shard modals. Client-side aliases (EDIT button) override server-side names. Each node has one peer ID: when `DLOCKSS_IPFS_CONFIG` is set (e.g. in the testnet), D-LOCKSS uses the IPFS repo identity, so the same ID appears in the monitor and in `node_x.ipfs.log`.
The monitor bootstrap-subscribes to all shards up to depth 6 (127 shards) so it can see nodes even when started late. Set `DLOCKSS_MONITOR_BOOTSTRAP_SHARD_DEPTH` (0–12) to tune.
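The 127-shards-at-depth-6 figure implies a binary shard tree counted from the root: a complete binary tree of depth d contains 2^(d+1) − 1 nodes. Under that assumption (the tree shape is inferred from the numbers, not stated in the source), the count per depth is:

```go
package main

import "fmt"

// shardsUpToDepth counts the shards in a complete binary shard tree up
// to the given depth, with the root at depth 0: 2^(depth+1) - 1.
func shardsUpToDepth(depth uint) int {
	return (1 << (depth + 1)) - 1
}

func main() {
	fmt.Println(shardsUpToDepth(6))  // 127, the monitor's default subscription
	fmt.Println(shardsUpToDepth(12)) // 8191, the maximum configurable depth
}
```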
Alternatively use: https://dlockss-monitor.wmcloud.org.
From `testnet/`, `./run_testnet.sh` starts multiple D-LOCKSS nodes and IPFS daemons. Each node is automatically named `testnet_1`, `testnet_2`, etc. (visible in the monitor) and has one peer ID (D-LOCKSS loads the identity from the node's IPFS repo via `DLOCKSS_IPFS_CONFIG`). Press Enter in the script to shut down.
```bash
go test ./... -v
```

- Current Phase: Production; structural refactoring complete (see Code Elegance Plan). Config uses nested sub-structs (`Sharding`, `Replication`, `Files`, `Security`, `Orphan`). ShardManager delegates to `replicationManager` and `lifecycleManager`.
- Signed Messages: All protocol messages are signed by the sender's Libp2p key.
- Manifest Verification: ResearchObjects include signatures from the ingester.
- Trust Modes: Supports `open` (default) or `allowlist` trust models.
Dual licensed under the MIT License or Apache License 2.0, at your option.