-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat(streaming): add redis pubsub in NotifBrokerActor, supports data …
…AES256 encryption through the channel
- Loading branch information
Showing
11 changed files
with
2,364 additions
and
66 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
|
||
### **1. Kafka Topics** | ||
A **topic** in Kafka is a logical channel to which producers send messages and from which consumers read messages. Think of a topic as a named category or feed, and each message published to Kafka is categorized under a topic. It's like an exchange it RMQ. Topic is where all the messages get collected in there. | ||
|
||
- **Example**: If you are working with a log aggregation system, you might have topics like `application_logs`, `error_logs`, and `event_logs`. | ||
|
||
- **Durability**: Kafka topics store data for a specified retention period, even after messages are consumed. This makes it possible to replay or reprocess the data. Unlike the RMQ which removes the messages from the queue once the cosumer receives them this enforces us to use a new queue per each consumer. | ||
|
||
### **2. Partitions** | ||
A **topic** is divided into multiple **partitions** to enable parallelism and scalability. | ||
|
||
- **Partitions** allow Kafka to scale horizontally, meaning that more partitions allow more consumers to consume in parallel, leading to higher throughput. | ||
|
||
- **Message Order**: Within a single partition, Kafka guarantees the order of messages (i.e., messages are read in the order they are written). However, across multiple partitions, Kafka doesn't guarantee message ordering. | ||
|
||
- **Partition Key**: When producing messages, you can specify a **partition key** to control which partition the message is routed to. If no key is specified, Kafka will use a round-robin strategy to distribute messages across partitions. It's like routing key in RMQ. | ||
|
||
- **Example**: A topic `sensor_readings` could have 10 partitions. If you send messages with the same sensor ID as a partition key, all readings from that sensor would go to the same partition, maintaining their order. | ||
|
||
### **3. Messages** | ||
A **message** in Kafka is the basic unit of data. It consists of: | ||
- **Key** (optional): Helps to determine which partition the message is routed to. | ||
- **Value**: The actual data (the payload) that is being sent (e.g., a JSON object, string, etc.). | ||
- **Timestamp**: When the message was created. | ||
|
||
Kafka **messages** are stored in **topics** and are read by consumers. | ||
|
||
- **Message Structure**: Messages can be serialized in different formats (e.g., JSON, Avro, Protobuf), depending on how the data is intended to be consumed. | ||
|
||
### **4. Batch Sending** | ||
To optimize performance, Kafka producers can send messages in **batches** rather than individually. | ||
|
||
- **Batching** reduces the number of network requests, as multiple messages are sent in a single request. This leads to higher throughput and better utilization of Kafka brokers. | ||
|
||
- Producers buffer messages in memory and send them as a batch when either: | ||
- A certain batch size limit is reached. | ||
- A certain time limit is exceeded. | ||
|
||
- **Trade-off**: Sending in batches reduces network overhead but can increase latency, as messages are delayed in memory until the batch is full or the time limit is reached. | ||
|
||
### **5. Offset** | ||
An **offset** is a unique identifier that Kafka assigns to each message within a partition. It indicates the position of a message within that partition. Allows consumers resume consuming where they've left. | ||
|
||
- **Message Offset**: Kafka keeps track of each message using its offset within the partition. Each partition has its own sequence of offsets starting from `0` and increasing as more messages are produced. | ||
|
||
- **Consumer Offset**: Consumers use offsets to track which messages have been read. When a consumer reads a message from a partition, it can store the offset to know where to resume if it needs to continue later (e.g., after a crash or restart). | ||
|
||
- **Offset Example**: In a partition, the first message might have offset `0`, the next one `1`, and so on. | ||
|
||
### **6. Single Consumers vs. Consumer Groups** | ||
Kafka's flexibility comes from how it handles consumers, and it supports two main models: **single consumers** and **consumer groups**. | ||
|
||
#### **Single Consumer** | ||
A **single consumer** consumes data from one or more partitions of a topic. Each partition is assigned to one consumer, and only one consumer processes messages from each partition. | ||
|
||
- **Example**: If a topic has 3 partitions and you have 1 consumer, that consumer will read from all 3 partitions sequentially. | ||
|
||
#### **Consumer Groups** | ||
A **consumer group** is a group of consumers that work together to consume messages from a topic. Kafka ensures that each partition is consumed by **only one consumer within the group**. | ||
|
||
- **Parallel Processing**: Kafka distributes partitions among consumers in the group, allowing messages to be processed in parallel. | ||
|
||
- **Example**: If you have a topic with 4 partitions and 2 consumers in the same group, Kafka will assign 2 partitions to each consumer. If a consumer crashes, Kafka will rebalance and reassign the partitions to the remaining consumers. | ||
|
||
- **Multiple Consumer Groups**: Multiple consumer groups can consume the same topic independently. Each group maintains its own offsets, so they don't interfere with one another. | ||
- **Example**: You could have two different consumer groups, one for real-time processing (group A) and another for batch processing (group B), both consuming from the same topic but handling the data in different ways. | ||
|
||
- Consumer groups are like multiple queues bounded to multiple exchanges each of which receives related messages from the exchanges. like consumer1 bind its queue to exchange1 and exchange2. | ||
|
||
### **7. Committing Messages** | ||
Committing a message in Kafka means recording that the message has been successfully processed by a consumer like ack in RMQ, allowing Kafka to manage offsets properly. | ||
|
||
- **Auto-Commit**: By default, Kafka can automatically commit offsets periodically, meaning that the consumer keeps track of the last message it read. However, this can lead to issues if the consumer crashes after reading a message but before processing it. | ||
|
||
- **Manual Commit**: With **manual commit**, consumers can explicitly commit offsets after processing each message (or batch of messages). This gives consumers control over when to mark a message as processed, ensuring better fault tolerance. | ||
|
||
- **Example**: Suppose a consumer reads a message, processes it, and then manually commits the offset for that message. If the consumer crashes before committing the offset, it will re-read the message upon recovery. | ||
|
||
- **Offset Committing in Consumer Groups**: Each consumer in a group commits offsets independently. If a consumer fails, the new consumer taking over the partition will resume from the last committed offset. | ||
|
||
--- | ||
|
||
### **How These Components Work Together in Kafka**: | ||
|
||
1. **Topic Creation**: | ||
- Let's say you have a Kafka topic `user_activity_logs` with 6 partitions. | ||
|
||
2. **Producer Sends Messages**: | ||
- A producer sends log messages (user interactions) to Kafka. If the producer specifies a **key** (e.g., `userID`), Kafka uses the key to route messages to specific partitions. Messages are sent in **batches** to reduce network overhead. | ||
|
||
3. **Consumers in a Group**: | ||
- You have 3 consumers (in **consumer group A**) reading from `user_activity_logs`. Kafka assigns each consumer 2 partitions to read from (since the topic has 6 partitions and there are 3 consumers). | ||
|
||
4. **Message Offset Tracking**: | ||
- Each message in the partition is assigned an **offset** (e.g., 0, 1, 2...). Consumers track these offsets so they know where to resume in case of failure. | ||
- The consumers can **commit offsets** manually or let Kafka handle it automatically, ensuring that they process each message only once. | ||
|
||
5. **Rebalancing and Scaling**: | ||
- If a new consumer is added to the group, Kafka automatically **rebalances** the partitions among all consumers, ensuring that no two consumers in the group read from the same partition. Conversely, if a consumer crashes, Kafka redistributes its partitions to the remaining consumers. | ||
|
||
--- | ||
|
||
### **Summary**: | ||
- **Kafka topics** are high-level channels for organizing data. | ||
- **Partitions** allow parallelism by splitting a topic into substreams. | ||
- **Messages** are the individual data units that are published and consumed. | ||
- **Batch sending** improves producer performance by sending multiple messages in a single network call. | ||
- **Offsets** track message positions within partitions, both for Kafka's storage and consumers' progress. | ||
- **Single consumers** read data from specific partitions, while **consumer groups** allow partitioned, parallel processing. | ||
- **Committing messages** is how Kafka ensures data processing reliability by keeping track of the last successfully processed message per consumer. | ||
|
||
This design enables Kafka to scale, handle large volumes of data, and maintain reliability in data processing systems. |
File renamed without changes.
Oops, something went wrong.