Databus 2.0 relay chaining

Relay Chaining

Requirements

  • Ability for a relay to consume from another relay.
  • “Look and feel” should be that of a relay. This includes logging, monitoring, and operability.
  • In a multi-tenant relay deployment, some sources may be “chained” while others remain “master”.

Design

Chaining

  • Use a consumer that writes to any given event buffer (see the sketch after this list).
    • (pro) Buffer management and integration with a multi-tenant relay is straightforward, including getting the “look and feel” of a relay.
    • (con) Less efficient than ideal, since each DbusEvent is de-serialized one at a time; the Avro payload, however, is passed through as-is.
    • (pro) Extensible for custom filtering on a source, i.e., filtering on a non-key field is possible.
  • The config parameter that decides whether a relay is chained is the uri field of physicalSource. If it is a host:port address such as abc.foo.com:9993, the relay is assumed to be chained to abc.foo.com.
  • All per-physical-source parameters are honored. The chained relay always tries to read whatever is in the source relay.
    • The chained relay reads the entire buffer only when no checkpoint is present; otherwise it honors the checkpoint. For the startup case, there is currently no functionality to mimic restartScnOffset from the latest SCN.
  • Bootstrapping is always off.
  • On startup, the first SCN (lower watermark) has to be explicitly set, either by the consumer (start without SCN) or by the RelayEventProducer (start with SCN).
  • Retries are always infinite, with backoff capped at 1 minute.
    • SCN is saved in the relay’s MaxSCN file. The consumer always writes the SCN in this location. The RelayEventProducer reads the SCN from Relay’s MaxSCN file and directly sets the client’s checkpoint.
    • Only whole SCNs are written, i.e., only on endDataSequence() calls, not on every onCheckpoint() call (no partial windows).
  • consumerParallelism set to 1.
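
Below is a minimal sketch of the consumer-based chaining approach described in the list above, assuming the open-source Databus client callback interface (AbstractDatabusCombinedConsumer and related classes; exact package and callback names may differ slightly from the names used on this page). The appendToLocalBuffer and persistMaxScn helpers are hypothetical stand-ins for appending to the relay’s event buffer and writing the relay’s MaxSCN file.

```java
import com.linkedin.databus.client.consumer.AbstractDatabusCombinedConsumer;
import com.linkedin.databus.client.pub.ConsumerCallbackResult;
import com.linkedin.databus.client.pub.DbusEventDecoder;
import com.linkedin.databus.client.pub.SCN;
import com.linkedin.databus.core.DbusEvent;

/**
 * Hypothetical chaining consumer: copies events from the source relay into the
 * local relay's event buffer and records the SCN only on window boundaries,
 * mirroring the "no partial windows" rule above.
 */
public class ChainedRelayConsumer extends AbstractDatabusCombinedConsumer
{
  @Override
  public ConsumerCallbackResult onDataEvent(DbusEvent e, DbusEventDecoder eventDecoder)
  {
    // Events are handled one at a time, but the Avro payload is passed through as-is;
    // a custom filter on non-key fields could be applied here.
    appendToLocalBuffer(e);
    return ConsumerCallbackResult.SUCCESS;
  }

  @Override
  public ConsumerCallbackResult onCheckpoint(SCN checkpointScn)
  {
    // Intentionally a no-op: partial windows are never persisted.
    return ConsumerCallbackResult.SUCCESS;
  }

  @Override
  public ConsumerCallbackResult onEndDataEventSequence(SCN endScn)
  {
    // A whole window has been received: record it in the relay's MaxSCN file so the
    // RelayEventProducer can seed the client checkpoint from it on restart.
    persistMaxScn(endScn);
    return ConsumerCallbackResult.SUCCESS;
  }

  private void appendToLocalBuffer(DbusEvent e) { /* write into the relay's event buffer (elided) */ }

  private void persistMaxScn(SCN endScn) { /* write the MaxSCN file (elided) */ }
}
```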

Monitoring

  • All metrics are honored.
    • It’s unclear how to translate lastTimeDBAccessed, so it is not yet populated. (One possibility: set it equal to the last time the relay was pinged?)

Logging

  • The logs look like normal relay logs.

Leader Election in a Relay Cluster

This part is not yet implemented; currently the leader is statically configured.

Requirements

  • High-level Functionality
    • If n relays subscribe to a physical DB (“master source”), only k of them should listen to the master source; the other n-k should listen to the k relays. The k relays are “leaders”; the n-k relays are “followers” and are said to be chained.
    • On failure of one of the k leader relays, one of the n-k chained relays will start listening to the master source (i.e., it will convert itself to a leader).
    • The leader/follower concept should work at the level of a physical source, which means that in a multi-tenant relay it should work at the level of the buffer.
  • Functionality
    • Each relay should have the capability to read from either the source DB or another relay.
    • Each relay should be able to dynamically switch from one mode to another.
      • Retain data from master/slave.
      • No disruption of queries in flight.
  • Not in scope
    • Obtain config of buffers via cluster manager
  • Operability
    • The relays all subscribe to the “master source.” On startup they join the master source’s cluster.
      • No additional “Cluster Manager” tool configuration/management is required.
    • At any given time, tools must be able to determine the k leaders and the n-k followers.
    • Failure of any of the k leaders should be handled without human intervention.
    • Configuration
      • The master source is specified in config; static configuration is acceptable.
      • The host addresses are discovered dynamically.
  • Monitoring
    • The timeSinceDBLastAccess metric becomes overloaded: it could mean a read from either a relay or the DB. A separate metric to indicate DB reads would be useful.
  • Scalability
    • Cluster size on the order of tens of nodes.
    • Availability: how quickly is a new leader picked? Tolerances of up to 30 seconds (worst case, where no new updates are being read) are permissible. This should be lower than the thresholds of the alerts set on sensors that indicate data staleness.

Design and Implementation

  • Leader/follower model implemented in Helix (cluster manager).
    • The cluster is created if not already present; adding/removing a cluster should be possible programmatically.
    • Each node joins the cluster (with physical source name as ID) and waits for input from Helix. Implement a Helix client state model (in relay).
  • Implement state-transition actions (see the state-model sketch after this list).
    • LEADER -> FOLLOWER
      • Shut down/pause the thread reading from the database. Retain the buffer and pass it to a newly created ‘RelayChained’ thread, which resumes from the SCN saved locally on disk. Ensure the buffer is in a consistent state (no partial windows).
      • Obtain leader information from the cluster; point the source URI at the leader; stop the leader thread; start the chained thread with the same buffer.
    • FOLLOWER -> LEADER
      • Shut down/pause the thread reading from the leader; retain the buffer; activate the leader thread with the DB URI (from config). Ensure the buffer is in a consistent state (no partial windows); also ensure that the internal buffers are consistent. Would it be easier to simply shut down?
    • Startup: start up leader and follower threads in paused mode?
    • Additional: Implement pause functionality? How to initiate it for the leader (without getting a transition)?
  • Implement metric and API to determine leader for each buffer.
  • Implement fail-safe logic for when the relay is disconnected from ZooKeeper. (What should happen? Remember the prior state?)
  • Implement config to work without requiring ZooKeeper.
  • Implement stand-alone tool/script to obtain state of cluster.
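
A sketch of the Helix state-model piece outlined in the list above, assuming Helix’s annotation-based StateModel API and its built-in LeADER/STANDBY-style LeaderStandby states (STANDBY plays the “follower” role here). The thread-handoff helpers are hypothetical placeholders for the leader (DB-reading) and chained (relay-reading) producer threads that share one event buffer; the factory that registers this model with the Helix participant is omitted.

```java
import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelInfo;
import org.apache.helix.participant.statemachine.Transition;

/**
 * Hypothetical relay state model: hands the shared event buffer between the
 * DB-reading (leader) thread and the relay-chaining (follower) thread on
 * Helix-driven transitions.
 */
@StateModelInfo(initialState = "OFFLINE", states = {"LEADER", "STANDBY"})
public class RelayLeaderStandbyModel extends StateModel
{
  @Transition(to = "STANDBY", from = "OFFLINE")
  public void onBecomeStandbyFromOffline(Message message, NotificationContext context)
  {
    // Startup as a follower: read from the current leader relay.
    startChainedThread();
  }

  @Transition(to = "LEADER", from = "STANDBY")
  public void onBecomeLeaderFromStandby(Message message, NotificationContext context)
  {
    // FOLLOWER -> LEADER: stop reading from the upstream relay, keep the buffer,
    // and start the DB-reading thread at the last whole-window SCN saved on disk.
    stopChainedThread();
    startLeaderThread();
  }

  @Transition(to = "STANDBY", from = "LEADER")
  public void onBecomeStandbyFromLeader(Message message, NotificationContext context)
  {
    // LEADER -> FOLLOWER: stop the DB-reading thread, discover the new leader,
    // and resume from it via the chaining consumer, reusing the same buffer.
    stopLeaderThread();
    startChainedThread();
  }

  // Hypothetical thread-handoff helpers; the real work (pausing pullers, ensuring
  // no partial windows, re-pointing the source URI) is elided.
  private void startChainedThread() { }
  private void stopChainedThread()  { }
  private void startLeaderThread()  { }
  private void stopLeaderThread()   { }
}
```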

Dependencies

External

  • Cluster manager: ability to create/delete cluster programmatically.
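
For the programmatic create/delete dependency above, Helix’s ZKHelixAdmin is one way to do it; a hedged sketch, with a made-up ZooKeeper address and cluster name:

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;

public class RelayClusterSetup
{
  public static void main(String[] args)
  {
    // Hypothetical ZooKeeper address and cluster name, for illustration only.
    HelixAdmin admin = new ZKHelixAdmin("zk.example.com:2181");

    // Create the relay cluster for a physical source...
    admin.addCluster("databus-relay-examplePhysicalSource");

    // ...and drop it when the deployment is retired.
    admin.dropCluster("databus-relay-examplePhysicalSource");
  }
}
```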

Internal

  • No bootstrap dependencies (not applicable).
  • No client library dependencies or client changes (no planned protocol changes; clients continue to see the set of relays as a VIP).

Testing

  • Functional Testing
    • Cluster of 1.
    • Cluster size >= 2: Leader/Follower, Follower/Leader transition; correctness (data continuity for the client, with random termination of relays).
    • Loss of ZooKeeper connection: test fail-safe mode.
  • Operability Testing
    • Rolling restart testing: how many transitions; effect on client.
    • Metric continuity. Metrics should work seamlessly, regardless of whether a relay is a leader or follower.
  • Performance Testing
    • Time during which the cluster is without a leader (should not exceed 5 seconds?).