A high-availability, partition-tolerant key-value store inspired by Amazon's Dynamo paper. This system uses Consistent Hashing, Replication, and Vector Clocks to manage distributed state across a cluster of nodes.
The system is built in Python using gRPC for node-to-node communication. It implements the following core distributed systems concepts:
- Partitioning: Data is distributed across nodes using Consistent Hashing with Virtual Nodes to ensure uniform load distribution.
- High Availability: Implements Tunable Quorums (N, R, W). The system remains available for reads/writes even if nodes fail, using a "Sloppy Quorum" approach.
- Conflict Resolution: Uses Vector Clocks to track causal history and detect concurrent updates (e.g., the "shopping cart" problem), allowing for eventual consistency.
- Gossip Protocol: (Planned) Nodes periodically exchange state to detect failures and maintain membership lists.
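The partitioning scheme above can be sketched in a few lines. This is a minimal illustration rather than the repo's actual code; the node names, the vnode count of 8, and the use of MD5 as the ring hash are assumptions:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring where each physical node owns several virtual points."""

    def __init__(self, nodes, vnodes=8):
        points = []
        for node in nodes:
            for i in range(vnodes):
                points.append((self._hash(f"{node}#vn{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    @staticmethod
    def _hash(key):
        # MD5 gives a stable, well-spread 128-bit position on the ring.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def preference_list(self, key, n=3):
        """First n distinct physical nodes walking clockwise from the key."""
        start = bisect.bisect_right(self._hashes, self._hash(key))
        prefs = []
        for i in range(len(self._hashes)):
            node = self._nodes[(start + i) % len(self._hashes)]
            if node not in prefs:
                prefs.append(node)
                if len(prefs) == n:
                    break
        return prefs

ring = ConsistentHashRing(["node-1", "node-2", "node-3"])
print(ring.preference_list("cart:alice"))  # e.g. ['node-2', 'node-3', 'node-1']
```

Because each node contributes many virtual points, adding or removing a node only moves a small, evenly spread slice of keys, which is what keeps the load uniform.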
The easiest way to run the cluster is via Docker Compose, which spins up 3 storage nodes and sets up the network.
```bash
docker-compose up --build
```

You will see logs from node-1, node-2, and node-3 starting up on ports 50051-50053.
- Run the Client (Coordinator): Open a new terminal. This client acts as the application layer, interacting with the ring.

```bash
python vector_clock_client.py
```

The client performs a "Shopping Cart" simulation to demonstrate Vector Clocks:
- Write 1: User adds "Milk" -> Saved to Node A (Version 1).
- Write 2: User adds "Eggs" using Version 1 context -> Saved to Node B (Version 2).
- Result: The server returns an updated Vector Clock (e.g., {"NodeA": 1, "NodeB": 1}), proving causal tracking.
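The causal bookkeeping in this walkthrough can be sketched with plain dicts. A minimal illustration under assumed helper names (these are not the repo's functions):

```python
def vc_increment(clock, node):
    """Return a copy of the clock with this node's counter bumped."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def vc_descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def vc_concurrent(a, b):
    """Neither clock descends from the other: a genuine conflict."""
    return not vc_descends(a, b) and not vc_descends(b, a)

v1 = vc_increment({}, "NodeA")      # Write 1: "Milk"  -> {"NodeA": 1}
v2 = vc_increment(v1, "NodeB")      # Write 2: "Eggs" with v1 as context
print(v2)                           # {'NodeA': 1, 'NodeB': 1}
print(vc_descends(v2, v1))          # True: v2 causally follows v1

fork = vc_increment(v1, "NodeC")    # a second write off the same v1 context
print(vc_concurrent(v2, fork))      # True: siblings the client must reconcile
```

When two clocks are concurrent, the store cannot pick a winner on its own; like the shopping-cart case in the Dynamo paper, both siblings are returned and the application merges them.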
Planned improvements:

- Gossip Protocol: Replace static node configuration with the SWIM protocol for decentralized membership discovery.
- Hinted Handoff: Implement temporary storage for writes destined for downed nodes.
- Persistence: Swap the in-memory dictionary for RocksDB or LevelDB for disk durability.
Key design decisions:

- Availability vs. Consistency: Designed to prioritize availability (AP in the CAP theorem). Writes always succeed if W nodes are reachable.
- Handling Skew: Implemented Virtual Nodes to prevent hot spots where one server holds too much data.
- Fault Tolerance: If a node in the preference list is down, the system automatically replicates to the next healthy node in the ring.
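The fault-tolerance rule above amounts to walking the ring past dead nodes until W healthy replicas accept the write (the "sloppy quorum"). A hedged sketch; the node names, the alive-set, and the function name are illustrative, not the repo's API:

```python
def choose_write_targets(ring_order, alive, w=2):
    """Sloppy quorum: walk clockwise, skip dead nodes, stop at w healthy
    replicas. A node taken in place of a dead one would also store a
    'hint' so the data can be handed back when the dead node recovers."""
    targets = []
    for node in ring_order:
        if node in alive:
            targets.append(node)
            if len(targets) == w:
                return targets
    raise RuntimeError("fewer than w healthy nodes reachable: write fails")

# node-2 is down, so its replica spills over to node-3:
print(choose_write_targets(
    ["node-1", "node-2", "node-3"],
    alive={"node-1", "node-3"},
))  # ['node-1', 'node-3']
```

This is why the system stays writable through single-node failures: availability is traded for the temporary divergence that the vector clocks later reconcile.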
Dynamo: Amazon’s Highly Available Key-value Store (DeCandia et al., 2007)