Kafka-filtering solves the problem of filtering messages out of a Kafka (or any similar) stream very efficiently. It is, in effect, a grep for a Kafka message stream.
Kafka itself offers no filtering ability for consumers. If a consumer needs only a subset of the messages published to a Kafka topic, it has to read everything and filter out what it does not need. This is inefficient, since every message must be deserialized just to make that decision. The other option is to create different topics; in that case a consumer may need to consume from more than one topic, and ordering is lost as well (Kafka guarantees ordering only within a single partition of a topic)!
Kafka-filtering solves this by encoding headers (Map<String,String>) on the producer side along with the actual data (byte[]). Based on these header tags, a consumer can express what it wants to consume and filter out unwanted messages very efficiently. Encoding and decoding of the {headers, data} pair is done using FlatBuffers, so it is fast and adds little overhead.
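To make the {headers, data} envelope concrete, here is a minimal, self-contained sketch of the producer/consumer contract. It is a toy stand-in only: the real project encodes with FlatBuffers, and the class and method names below (EnvelopeCodec, decodeHeaders) are hypothetical, not the repo's actual API.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Toy codec illustrating the {headers, data} envelope. The real project uses
// FlatBuffers; plain DataStreams are used here only to keep the sketch
// self-contained and runnable.
public class EnvelopeCodec {

    // Producer side: pack headers and payload into a single byte[] for Kafka.
    public static byte[] encode(Map<String, String> headers, byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(headers.size());
        for (Map.Entry<String, String> e : headers.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
        out.writeInt(data.length);
        out.write(data);
        return bos.toByteArray();
    }

    // Consumer side: decode only the headers. The payload bytes are left
    // untouched, so a non-matching message never pays for payload
    // deserialization.
    public static Map<String, String> decodeHeaders(byte[] message) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
        int n = in.readInt();
        Map<String, String> headers = new HashMap<>(n);
        for (int i = 0; i < n; i++) {
            headers.put(in.readUTF(), in.readUTF());
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> headers = new HashMap<>();
        headers.put("eventType", "ORDER_CREATED");
        byte[] wire = encode(headers, "payload".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeHeaders(wire)); // {eventType=ORDER_CREATED}
    }
}
```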
The overhead was benchmarked for the following case: a 1 KB serialized payload with a header map of 2-3 small key-value entries, where keys and values are strings around 8-15 characters long. Results (a naive measurement sketch follows this list):
- Codec.encode() => 4 microseconds overhead
- Codec.decode() & applying the compiled filter => 2 microseconds overhead
Benchmarks across a wider range of serialized data sizes and header sizes are still to be done.
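A naive way to reproduce such a measurement is a warmed-up timing loop like the one below. This is a hypothetical harness, not the project's benchmark setup, and it times the toy EnvelopeCodec from the sketch above rather than the real FlatBuffers codec, so its numbers will not match the figures quoted.

```java
import java.util.HashMap;
import java.util.Map;

// Naive single-threaded timing loop; a real measurement should use a harness
// such as JMH. EnvelopeCodec stands in for the project's actual Codec.
public class EncodeBench {
    public static void main(String[] args) throws Exception {
        Map<String, String> headers = new HashMap<>();
        headers.put("eventType", "ORDER_CREATED"); // keys/values ~8-15 chars,
        headers.put("tenant", "in-hyderabad");     // as in the case above
        byte[] payload = new byte[1024];           // 1 KB payload

        // Warm up so the JIT compiles the hot path before timing; accumulate
        // into a sink so the calls cannot be dead-code eliminated.
        long sink = 0;
        for (int i = 0; i < 100_000; i++) {
            sink += EnvelopeCodec.encode(headers, payload).length;
        }

        int iterations = 1_000_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += EnvelopeCodec.encode(headers, payload).length;
        }
        long elapsedNanos = System.nanoTime() - start;

        System.out.printf("encode: %.2f us/op (sink=%d)%n",
                elapsedNanos / 1000.0 / iterations, sink);
    }
}
```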
A stream producer passes headers (Map<String,String>) along with the data; at the consumer level, the stream can then be filtered by providing an MVEL expression (the grep-like filter expression describing what the consumer wants), as sketched below.
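Here is a minimal sketch of consumer-side filtering with MVEL. The expression and the variable name "headers" are illustrative assumptions; see the linked test below for the expressions the project actually uses.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

import org.mvel2.MVEL;

// Sketch: compile an MVEL filter once, then apply it per message to the
// decoded headers. Only matching messages need their payload deserialized.
public class FilterSketch {
    public static void main(String[] args) {
        // Compile once, reuse per message; running a precompiled expression
        // is what keeps the per-message filtering cost low.
        Serializable filter =
                MVEL.compileExpression("headers['eventType'] == 'ORDER_CREATED'");

        // Headers as decoded from an incoming message.
        Map<String, String> headers = new HashMap<>();
        headers.put("eventType", "ORDER_CREATED");

        Map<String, Object> vars = new HashMap<>();
        vars.put("headers", headers);

        boolean wanted = (Boolean) MVEL.executeExpression(filter, vars);
        System.out.println(wanted); // true -> deserialize payload; false -> skip
    }
}
```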
Example code: https://github.com/flipkart-incubator/kafka-filtering/blob/master/exp-filtering-mvel/src/test/java/FilterTest.java