
Databus 2.0 Relay Design

Introduction

Databus Relays are responsible for

  • Reading changed rows from the Databus sources in the source database (or from other, chained relays) and serializing them as Databus data change events in an in-memory buffer
  • Listening for requests from Databus clients (bootstrap producers, chained relays, or event consumers) and streaming new Databus data change events to those clients.

Architecture

Here is a high-level overview of the relay.

The relay has one or more circular event buffers that store the Databus events in order of increasing system change numbers (SCNs). The memory for these buffers may be either direct memory or memory-mapped files backed by the file system. Each buffer also has a corresponding in-memory sparse index, called the SCN index, and a MaxSCN reader/writer. The MaxSCN reader/writer periodically persists the value of the highest SCN seen in the events pulled into the relay. The relay fields requests from Databus clients via a request processor, which listens on a Netty channel. The relay exposes a RESTful interface both to pull events and to manage the relay itself.
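
Conceptually, the SCN index maps an SCN to an offset in the event buffer so that a client request can be answered without scanning the buffer from the start. A minimal sketch of that idea, assuming a plain sorted map (illustrative only, not the actual Databus index structure):

import java.util.Map;
import java.util.TreeMap;

// Illustrative sparse SCN index: maps an SCN to the buffer offset of the
// first event with that SCN. Not the actual Databus implementation.
public class SparseScnIndex {
    private final TreeMap<Long, Long> scnToOffset = new TreeMap<Long, Long>();

    public void put(long scn, long offset) {
        scnToOffset.put(scn, offset);
    }

    // Return an offset at or before the requested SCN to start scanning from.
    public long findStartOffset(long scn) {
        Map.Entry<Long, Long> entry = scnToOffset.floorEntry(scn);
        if (entry == null) {
            throw new IllegalArgumentException("SCN " + scn + " is older than the buffer");
        }
        return entry.getValue();
    }
}

Because the index is sparse, the returned offset is only a starting point; the reader still scans forward from there to the first event with an SCN greater than the one requested.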

If the contents of the event buffer are configured to be memory-mapped, then the relay also saves the SCN index and some metadata upon shutdown, so that the events are preserved upon restart.

The circular event buffers are discussed in the page on Event Buffer Design.

Database Event Producer

The Database Event Producer periodically polls the source primary Oracle database for changes in Databus sources. If it detects such changes, it reads all changed rows in those sources and converts them to Avro records.

The conversion from JDBC row sets to Avro records is done automatically, based on the Avro schemas stored in the Schema Registry. The schemas themselves are generated automatically from the Oracle schema for the Databus sources.
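
A rough illustration of this mapping, driven by the dbFieldName metadata embedded in the generated schemas (the helper below is hypothetical, not the actual Databus code):

import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

// Hypothetical helper: copy one JDBC row into an Avro record, using the
// "meta" property (dbFieldName=...;dbFieldPosition=...;) of each field to
// find the source column. Type coercion (e.g. Oracle NUMBER to long) is omitted.
public static GenericRecord rowToRecord(ResultSet row, Schema schema) throws SQLException {
    GenericRecord record = new GenericData.Record(schema);
    for (Schema.Field field : schema.getFields()) {
        String meta = field.getProp("meta");                 // e.g. "dbFieldName=TXN;dbFieldPosition=0;"
        String dbColumn = meta.split(";")[0].split("=")[1];  // e.g. "TXN"
        record.put(field.name(), row.getObject(dbColumn));
    }
    return record;
}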

The Avro records are serialized into Databus events (see the Javadocs for the DbusEvent class), which contain Databus-specific metadata plus the data payload, i.e. the binary serialization of the Avro record.
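
The payload bytes can be produced with Avro's standard binary encoding. A minimal sketch using the stock Avro API (the DbusEvent framing around the payload is omitted):

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

// Binary-encode an Avro record into the byte[] used as the event payload.
public static byte[] serializePayload(GenericRecord record, Schema schema) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
    encoder.flush();
    return out.toByteArray();
}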

The AbstractEventProducer class provides a framework for developing event producers. OracleEventProducer generates DbusEvents while reading from an Oracle database that has been modified to contain a txlog table, plus triggers that update that table on every change to a Databus source.
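
The polling shape, in a minimal sketch that assumes a simplified txlog layout with a single scn column (the table and column names here are illustrative; the real OracleEventProducer query is considerably more involved):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical single polling cycle over a simplified txlog table.
public static long pollOnce(Connection conn, long lastScn) throws SQLException {
    PreparedStatement stmt =
        conn.prepareStatement("SELECT scn FROM txlog WHERE scn > ? ORDER BY scn");
    stmt.setLong(1, lastScn);
    ResultSet rs = stmt.executeQuery();
    while (rs.next()) {
        long scn = rs.getLong("scn");
        // ...fetch the changed rows for this transaction, convert them to Avro
        // records, serialize them into DbusEvents, and append to the buffer...
        lastScn = scn;
    }
    return lastScn; // the caller persists this via the MaxSCN writer (next section)
}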

MaxSCN Reader/Writer

The MaxSCN Reader/Writer is used to keep track of the progress of the Database Event Producer (DBEP). The reader is used when the DBEP starts: if no starting SCN has been specified, one is read via the reader so that the DBEP can continue from where it left off.

After each batch of updates has been read and stored in the relay event buffer, the DBEP uses the MaxSCN writer to persist the last SCN that has been processed.

Currently, the MaxSCN reader/writer stores the SCN locally in a file. An example of the max SCN file contents:

5984592768094:Thu Dec 13 00:54:07 UTC 2012

Other implementations could use an RDBMS or ZooKeeper instead.
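
A minimal sketch of such a file-based reader/writer, assuming the scn:timestamp format shown above (a hypothetical class, not the actual Databus implementation; the timestamp after the colon is informational only):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Date;

// Hypothetical file-backed MaxSCN store using the "scn:timestamp" format.
public class FileMaxScnStore {
    private final Path file;

    public FileMaxScnStore(Path file) {
        this.file = file;
    }

    // Read the last persisted SCN; everything after the first ':' is ignored.
    public long readMaxScn() throws IOException {
        String line = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return Long.parseLong(line.substring(0, line.indexOf(':')));
    }

    // Persist the highest SCN processed so far,
    // e.g. "5984592768094:Thu Dec 13 00:54:07 UTC 2012".
    public void writeMaxScn(long scn) throws IOException {
        Files.write(file, (scn + ":" + new Date()).getBytes(StandardCharsets.UTF_8));
    }
}

A production implementation would also want to write atomically (e.g. write to a temporary file, then rename) so that a crash mid-write cannot corrupt the stored SCN.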

REST Interface

See Databus V2 Protocol.

JMX

JMX support comes standard with the Jetty container. We expose a variety of MBeans for monitoring the relay; please refer to the Javadocs of the following classes (a small probe sketch follows the list):

  • ContainerStatsMBean
  • DbusEventsTotalStatsMBean
  • DbusEventsStatisticsCollectorMBean
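
As a rough illustration, the MBeans can be probed from within the relay's JVM through the platform MBean server. The ObjectName and attribute below are hypothetical; discover the real names with jconsole or MBeanServer.queryNames():

import java.lang.management.ManagementFactory;

import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical JMX probe; the ObjectName and attribute are illustrative only.
public class RelayStatsProbe {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.linkedin.databus2:type=DbusEventsTotalStats");
        Object numEvents = server.getAttribute(name, "NumDataEvents");
        System.out.println("events seen: " + numEvents);
    }
}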

Schema Registry

The schema registry is a package that contains the Avro schemas for all database tables known to Databus.

Creating Avro records for the table schema

After running gradle assemble, unpack the generated tarball from build/databus2-cmdline-tools-pkg/distributions/ and cd into the created directory. Then:

$ ./bin/dbus2-avro-schema-gen.sh -namespace com.linkedin.events.example.person -recordName Person \
    -viewName "sy\$person" -avroOutDir /path/to/work/trunk/schemas_registry -avroOutVersion 1 \
    -javaOutDir /path/to/work/trunk/databus2-example-events/databus2-example-person/src/main/java \
    -userName person -password person
Processed command line arguments:
recordName=Person
avroOutVersion=1
viewName=sy$person
javaOutDir=/path/to/work/trunk/databus2-example-events/databus2-example-person/src/main/java
avroOutDir=/path/to/work/trunk/schemas_registry
userName=person
password=person
namespace=com.linkedin.events.example.person
Generating schema for sy$person
Processing column sy$person.TXN:NUMBER
Processing column sy$person.KEY:NUMBER
Processing column sy$person.FIRST_NAME:VARCHAR2
Processing column sy$person.LAST_NAME:VARCHAR2
Processing column sy$person.BIRTH_DATE:DATE
Processing column sy$person.DELETED:VARCHAR2
Generated Schema:
{
  "name" : "Person_V1",
  "doc" : "Auto-generated Avro schema for sy$person. Generated at Dec 04, 2012 05:07:05 PM PST",
  "type" : "record",
  "meta" : "dbFieldName=sy$person;",
  "namespace" : "com.linkedin.events.example.person",
  "fields" : [ {
    "name" : "txn",
    "type" : [ "long", "null" ],
    "meta" : "dbFieldName=TXN;dbFieldPosition=0;"
  }, {
    "name" : "key",
    "type" : [ "long", "null" ],
    "meta" : "dbFieldName=KEY;dbFieldPosition=1;"
  }, {
    "name" : "firstName",
    "type" : [ "string", "null" ],
    "meta" : "dbFieldName=FIRST_NAME;dbFieldPosition=2;"
  }, {
    "name" : "lastName",
    "type" : [ "string", "null" ],
    "meta" : "dbFieldName=LAST_NAME;dbFieldPosition=3;"
  }, {
    "name" : "birthDate",
    "type" : [ "long", "null" ],
    "meta" : "dbFieldName=BIRTH_DATE;dbFieldPosition=4;"
  }, {
    "name" : "deleted",
    "type" : [ "string", "null" ],
    "meta" : "dbFieldName=DELETED;dbFieldPosition=5;"
  } ]
}
Avro schema will be saved in the file: /path/to/work/trunk/schemas_registry/com.linkedin.events.example.person.Person.1.avsc
Generating Java files in the directory: /path/to/work/trunk/databus2-example-events/databus2-example-person/src/main/java
Done.

(Note that /path/to/work will be whatever path prefix you used in the -avroOutDir and -javaOutDir options.)