Kafka

Contents

Kafka

What is Kafka?

Kafka is a distributed streaming platform that is designed for high-throughput, fault-tolerant, and real-time data processing. It also can be used at message delivery. RabbitMQ has only one broker while Kafka contains many message brokers.

Kafka Stream

Kafka Streams is a client library for processing and analyzing data stored in Kafka.

It builds upon important stream processing concepts such as properly distinguishing between event time and processing time, windowing support, and simple yet efficient management of application state.

Components of Kafka

Broker, Topic, Partition, Producer, Consumer, Consumer group, zookeeper

Broker

A Kafka cluster is composed of multiple brokers. Each broker has its ID and contains certain topic partitions. The broker is responsible for the replication of the data across the cluster.

Topic

topic is a category to store the published records. Topics in Kafka are always multi-subscriber (multi consumer);

Topic have several copies stored in different partitions of different brokers. The number of copies is called replication factor.

Partition

Each topic may have one or more partitions. Each partition is ordered, immutable sequence of records that can be consumed.

Each partition has leader and multiple followers. The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader.

Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.

Segment

Each partition consists of multiple segments. Segments are the physical files stored in Kafka. The segment size is configurable. Once the segment is full, it will create a new segment. The segment is immutable.

Producer

Producers are the publisher of messages to one or more Kafka topics. Producers send data to Kafka brokers. Producers automatically know to which broker and partition to send the data to through something called topic metadata.

Consumer

Consumers read data from brokers. Consumers subscribes to one or more topics and consume published messages by pulling data from the brokers.

Consumer group

consumer group is a mechanism that allows multiple consumer instances to work together to consume data from one or more Kafka topics. A consumer group enables parallel processing of messages, load balancing.

Zookeeper

Zookeeper is a distributed coordination service that is used to manage and coordinate Kafka brokers. Zookeeper manages brokers and maintains the state of the Kafka cluster. Zookeeper is used to elect a controller, which is responsible for administrative operations.

Kafka vs RabbitMQ

Kafka RabbitMQ
Kafka is a distributed streaming platform that is designed for high-throughput, fault-tolerant, and real-time data processing. RabbitMQ is a message broker that is used to send and receive messages.
Kafka is a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol. RabbitMQ is a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol.
Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. RabbitMQ is designed to support a wide range of messaging applications, where a single server may support hundreds or thousands of queues.
Kafka is designed to be highly scalable. RabbitMQ is designed to be highly available and highly consistent.
Kafka is designed to be fast. RabbitMQ is designed to be easy to use.

Contents