Skip to content

Gossip-based discovery #6

@gordonbrander

Description

@gordonbrander

Possible enhancement: discovery via gossip.

Goals:

  • (Very) eventual consistency
    • Between groups of trusted peers (e.g. multiple scientific organizations)
    • Between any peers
  • Censorship-resistance via redundancy
    • Easy creation of mirrors
  • Low-cost to run (i.e. should not balloon the cost or resources needed to run vs a static file server)
    • Implies sub-linear scaling properties, or else a limit on cluster size.
  • Runs on commodity hardware
  • Try to keep the protocol very simple to implement
    • Vanilla HTTP REST
    • Avoid requiring fancy HAMT/bloom filter/merkel tree stuff if we can!

Scope:

  • Discovery
    • Peers
    • Liveness status
    • CID locations (URL for CID)
  • Simple moderation
    • Peer allow/deny lists (by origin)
    • CID block list
  • Peer bootstrapping API
    • Request all / as much data as possible from a peer in the form of CIDs
    • Kick off bootstrapping background task

Research notes below.


Gossip protocol

Gossip protocols are a decentralized peer-to-peer communication technique to transmit messages in an enormous distributed system.

Every node periodically sends out a message to a subset of other random nodes. The entire system will receive the particular message eventually with a high probability.

Gossip protocols are typically used to maintain the node membership list, achieve consensus, and fault detection in a distributed system. In addition, additional information such as application-level data can be piggybacked on gossip messages.

Node failure can be overcome by the retransmission of a message by another node.

First-in-first-out (FIFO) broadcast, causality broadcast, and total order broadcast can be implemented with gossip protocol

How many peers do you have to notify?

The gossip protocol parameters such as cycle and fanout can be tuned to improve the probabilistic guarantees of the gossip protocol

The number of nodes that will receive the message from a particular node is known as the fanout. The count of gossip rounds required to spread a message across the entire cluster is known as the cycle [8], [5].

cycles necessary to spread a message across the cluster = O(log n) to the base of fanout, where n = total number of nodes.

E.g.

// Formula for conversion of log bases
// See https://stackoverflow.com/questions/13831150/logarithm-algorithm

log(n) / log(b)

// Cycles
log(1000) / log(4) ~= 4.9
log(50k) / log(4) ~= 7.8
log(50k) / log(6) ~= 6

The propagation of a message in the gossip protocol should automatically age out to reduce the unnecessary load.

Rumor-monger protocol adjusts cycles dynamically by counting up a "stifling" counter every time a peer responds with "saw this already" (see below). As soon as counter reaches 2 or 3, it stops cycling.

Terms

  • Fanout: nodes you randomly notify

Rumor-monger protocol

A flavor of gossip introduced in the Xerox PARC paper “Epidemic Algorithms for Replicated Data‐Base Maintenance” by Demers et al. (1987).

A node that receives an update becomes infectious and repeatedly executes the same simple step:

  • Choose fanout k random neighbors from the membership list (usually with replacement).
  • Send the update to each.
  • When the node contacts a neighbor that is already infected it increments a local stifling counter s.
  • As soon as the counter reaches a small constant (typically s = 2 or 3) the node considers the rumor “old news” and stops sending it.

Fan‑out: Large k reduces the number of rounds but increases traffic quadratically. In practice systems such as Cassandra, Redis Cluster and the SWIM failure detector pick k = 3–4; this covers millions of nodes well below a second on commodity networks.

Resources

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions