
## Resources

Download (Kafka 2.0.0 for Scala 2.11):
https://www.apache.org/dyn/closer.cgi?path=/kafka/2.0.0/kafka_2.11-2.0.0.tgz

Kafka Quick-start Guide:
https://kafka.apache.org/quickstart

Alpakka Kafka Connector Documentation:
https://doc.akka.io/docs/akka-stream-kafka/current/home.html

## Installation

The quick-start guide above is short and comprehensive, but to condense things further:

  1. Download Kafka 2.11-2.0.0 using the link above.
  2. Extract it and place it somewhere sensible; remember the path.
  3. Navigate to "WorkflowFM-PEW/resources/kafka".
  4. In a bash shell, run `bash ./start-servers.sh path/to/kafka/root`. This starts a ZooKeeper server and a Kafka broker using our configuration files.
  5. In another shell, run `bash ./clean-topics.sh path/to/kafka/root`. This reinitialises the relevant topics (and deletes any data in them).

The same steps should work on Windows by using the batch scripts instead, and crossing your fingers really hard.
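
One way to check that the broker came up is to list its topics with Kafka's AdminClient. Here is a minimal sketch in Scala; the broker address is an assumption (adjust it to match the configuration files):

```scala
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.admin.AdminClient

object ListTopicsExample extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092") // assumed broker address

  val admin = AdminClient.create(props)
  // listTopics() returns whatever topics the freshly initialised broker knows about.
  println(admin.listTopics().names().get().asScala.mkString(", "))
  admin.close()
}
```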

## Kafka Cliff Notes

Kafka is a messaging system, best thought of as a collection of persistent, immutable message streams called "Topics". A single Kafka cluster can host multiple topics, each of which can be written to (by Producers) and subscribed to (by Consumers) individually. Topics are further subdivided into "Partitions" using the key of each message: all messages that share a key are sent to the same partition, and all messages within a partition have a guaranteed order that is consistent for all Consumers.
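
In practice, a producer routes related messages to the same partition simply by giving them the same key. A minimal sketch using the plain Kafka client from Scala; the topic, key, values, and broker address are illustrative assumptions:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object KeyedProducerExample extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092") // assumed local broker
  props.put("key.serializer", classOf[StringSerializer].getName)
  props.put("value.serializer", classOf[StringSerializer].getName)

  val producer = new KafkaProducer[String, String](props)

  // Both records share the (hypothetical) key "order-42", so Kafka hashes
  // them to the same partition and every consumer sees them in this order.
  producer.send(new ProducerRecord("example-topic", "order-42", "created"))
  producer.send(new ProducerRecord("example-topic", "order-42", "shipped"))

  producer.close()
}
```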

Consumers are organised into Consumer Groups, and each Consumer Group will consume each message only once. To do this, each Consumer Group stores an "offset" for each partition it subscribes to; the offset points to the next unconsumed message, meaning that newer messages within a partition cannot be consumed before older ones. Unconsumed messages remain available in the event of a failure, so it is necessary that offsets are only "committed" when the messages they would consume are no longer required - this is usually linked to the production of new messages.
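
Concretely, this means disabling auto-commit and committing only after processing. A minimal sketch with the plain Kafka consumer (the group id and topic are illustrative; `poll(Duration)` assumes the Kafka 2.0 client):

```scala
import java.time.Duration
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

object CommittingConsumerExample extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")   // assumed local broker
  props.put("group.id", "example-group")             // hypothetical consumer group
  props.put("enable.auto.commit", "false")           // commit manually, after processing
  props.put("auto.offset.reset", "earliest")         // start from the beginning if no offset yet
  props.put("key.deserializer", classOf[StringDeserializer].getName)
  props.put("value.deserializer", classOf[StringDeserializer].getName)

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(List("example-topic").asJava)

  while (true) {
    val records = consumer.poll(Duration.ofMillis(500))
    for (record <- records.asScala) {
      println(s"${record.key}: ${record.value}") // "process" the message
    }
    // Committing only after processing gives at-least-once delivery:
    // if we crash before this line, the messages above are redelivered.
    consumer.commitSync()
  }
}
```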

"Exactly-once-semantics" message delivery is achievable in Kafka through use of "Transactions" which directly tie message production to the commit of offset so it happens atomicly. Otherwise Kafka is "at-least-once" ie. the message is produced before any input messages are consumed, so in event of a failure messages might be revisited.

## Alpakka Kafka Connector

Alpakka is the Kafka driver we use to connect to the Kafka server. It is built on Akka Streams to provide a simplified API: Kafka Producers are represented as Akka Sinks, and Kafka Consumers as Akka Sources.
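
A minimal sketch of both directions, assuming an Alpakka Kafka 1.0-era API on Akka 2.5 (topic, group id, and broker address are illustrative):

```scala
import akka.actor.ActorSystem
import akka.kafka.{ConsumerSettings, ProducerSettings, Subscriptions}
import akka.kafka.scaladsl.{Consumer, Producer}
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}

object AlpakkaExample extends App {
  implicit val system: ActorSystem = ActorSystem("alpakka-example")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  val producerSettings =
    ProducerSettings(system, new StringSerializer, new StringSerializer)
      .withBootstrapServers("localhost:9092")

  val consumerSettings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("example-group")

  // A Kafka Producer as an Akka Sink: stream elements become Kafka records.
  Source(1 to 5)
    .map(n => new ProducerRecord[String, String]("example-topic", s"key-$n", s"message $n"))
    .runWith(Producer.plainSink(producerSettings))

  // A Kafka Consumer as an Akka Source: Kafka records become stream elements.
  Consumer.plainSource(consumerSettings, Subscriptions.topics("example-topic"))
    .map(record => s"${record.key}: ${record.value}")
    .runWith(Sink.foreach(println))
}
```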
