Kafka configuration and ZooKeeper overview (Kafka, ZooKeeper, AKHQ, quorum controller)

Hello, this is codeshow.
This time, we will practice Kafka.

For practice, please clone the devcontainers repository from the codeshow GitHub.
Open the kafka folder in VS Code.
Start the devcontainer.
Wait until the containers are running.
Once the containers have started, open Docker Desktop.
You can see three Kafka containers, ZooKeeper, and AKHQ running.

Let's describe these containers.
ZooKeeper is the coordinator of the distributed system:
it manages multiple Kafka servers in a distributed environment.

AKHQ provides a web UI for managing Kafka conveniently.
It uses port 8080.

ZooKeeper periodically exchanges heartbeat signals with all Kafka nodes: ZooKeeper sends a ping request and each Kafka node returns a pong response. If a Kafka node fails to respond, ZooKeeper considers that node failed.

If Kafka node 3 fails,
the ping and pong exchange fails,
and ZooKeeper removes node 3 from its set of managed nodes.
The remaining nodes 1 and 2 are then notified of node 3's failure and stop communicating with it.
For reference, when node 3 is recreated, it registers with ZooKeeper again, as shown in the figure, and nodes 1, 2, and 3 form a cluster once more.
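The failure-detection loop described above can be sketched as a timeout-based liveness check. This is a simplified illustration with made-up node names and a made-up timeout value, not ZooKeeper's actual implementation:

```python
import time

# A node is considered failed if no heartbeat (pong) has been
# seen within SESSION_TIMEOUT seconds. (Illustrative value.)
SESSION_TIMEOUT = 6.0

class HeartbeatMonitor:
    def __init__(self, nodes):
        now = time.monotonic()
        self.last_pong = {node: now for node in nodes}

    def record_pong(self, node):
        # Called whenever a node answers a ping.
        self.last_pong[node] = time.monotonic()

    def failed_nodes(self):
        # Nodes whose last pong is older than the session timeout.
        now = time.monotonic()
        return [n for n, t in self.last_pong.items()
                if now - t > SESSION_TIMEOUT]

monitor = HeartbeatMonitor(["kafka1", "kafka2", "kafka3"])
monitor.record_pong("kafka1")
monitor.record_pong("kafka2")
# kafka3 never responds; pretend its last pong was long ago.
monitor.last_pong["kafka3"] = time.monotonic() - 10
print(monitor.failed_nodes())  # -> ['kafka3']
```

Once a node lands in the failed list, the coordinator notifies the surviving nodes, as described above.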

For reference, in this practice there is only one ZooKeeper node.
In a production environment, as shown in the figure, multiple ZooKeeper nodes are installed
so that the service keeps working even if one ZooKeeper node goes down.
In class we will use a single ZooKeeper node, because we do not need production-level high availability.
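For reference, a ZooKeeper ensemble stays available as long as a strict majority (a quorum) of its nodes is alive, which is why production deployments typically use three or five nodes. A quick illustration:

```python
def quorum_size(ensemble_size: int) -> int:
    # A ZooKeeper ensemble needs a strict majority of nodes to operate.
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size: int) -> int:
    # How many nodes can fail while a quorum still survives.
    return ensemble_size - quorum_size(ensemble_size)

for n in (1, 3, 5):
    print(f"ensemble={n}: quorum={quorum_size(n)}, "
          f"tolerates={tolerated_failures(n)} failure(s)")
```

With a single node, as in our classroom setup, any failure stops coordination; a three-node ensemble survives one failure, a five-node ensemble two.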

Let's open the Nodes menu provided by AKHQ in the browser.
The screen shows a total of three Kafka nodes running.

Stop kafka2 in Docker Desktop.
If you go back to AKHQ's Nodes menu and refresh,
you can see that kafka2 is gone.
Now press the start button in Docker Desktop again.
Wait for Kafka to finish loading, then refresh the Nodes screen in AKHQ.
kafka2 appears on the screen again.
Next, let's stop the controller node among these Kafka nodes.
Once the controller node fails its health check, you can see that another node becomes the controller.

In Kafka, ZooKeeper is responsible for coordination between nodes in a distributed environment.
Since ZooKeeper's admin server was enabled in the devcontainer settings,
we can check ZooKeeper information through port 8081 in the browser.

Type localhost:8081/commands into the browser's address bar.

On the commands page, you can see the commands provided by ZooKeeper as links.
Select the connections link.

http://localhost:8081/commands/connections

Through the JSON data returned by connections, you can check the information of the Kafka nodes connected to ZooKeeper.

Let's verify that these IPs belong to the Kafka nodes by opening a shell in a Kafka container and running:

hostname -I

You can see that the ZooKeeper connection information matches the IP of the Kafka node.
The connections array contains the connection information of all three Kafka nodes.
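The connections response can also be inspected programmatically. In the sketch below, the field names and addresses are illustrative samples modeled on the response shape, so check them against your own ZooKeeper's actual output:

```python
import json

# Sample shaped like a /commands/connections response.
# (Field names and addresses are illustrative, not real output.)
sample = """
{
  "connections": [
    {"remote_socket_address": "172.18.0.3:49156", "session_id": "0x1"},
    {"remote_socket_address": "172.18.0.4:49160", "session_id": "0x2"},
    {"remote_socket_address": "172.18.0.5:49164", "session_id": "0x3"}
  ]
}
"""

data = json.loads(sample)
# Strip the port to get just the client IPs, which you can
# compare with `hostname -I` inside each Kafka container.
ips = [c["remote_socket_address"].split(":")[0]
       for c in data["connections"]]
print(ips)
```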
So you can see that a total of three Kafka nodes are being ping-checked periodically.
As explained in the earlier figure,
when a ping or pong fails, ZooKeeper sends information about the failed node to the other Kafka nodes,
and the two remaining nodes stop communicating with the failed node.
When the failed Kafka node comes back up in ZooKeeper and its ping and pong succeed again,
this information is propagated to the remaining two nodes, and they resume communicating with the restored node.

Additionally, every Kafka cluster needs exactly one controller.
The controller node is very important: it manages and controls the other Kafka nodes.
When using ZooKeeper, only one Kafka node can be the controller.
If that single controller node fails, the whole Kafka cluster cannot operate.
So when the controller node fails, one of the other Kafka nodes must be elected as the new controller.
The nodes are elected through voting, and ZooKeeper is involved in this process.
Therefore ZooKeeper plays an important role in Kafka's distributed system.

For reference, the nodes other than the controller are called broker nodes. Broker nodes store partitions, which you will learn about in the next lesson. The controller node manages the broker nodes and stores various metadata for the broker nodes to use.
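The election can be pictured as brokers racing to claim a single /controller entry, similar in spirit to racing to create an ephemeral znode in ZooKeeper. This is an illustrative sketch, not real Kafka or ZooKeeper client code:

```python
# Simplified model of controller election: the first broker to
# "create" the /controller entry wins; everyone else stays a broker.
znodes = {}

def try_become_controller(broker_id):
    """Return True if this broker won the election."""
    if "/controller" in znodes:
        return False  # another broker already holds the entry
    znodes["/controller"] = {"brokerid": broker_id}
    return True

for broker_id in (1, 0, 2):  # brokers attempt in arbitrary order
    role = "controller" if try_become_controller(broker_id) else "broker"
    print(f"broker {broker_id}: {role}")

# If the controller fails, its entry disappears and the
# remaining brokers hold a new election.
del znodes["/controller"]
print(try_become_controller(2))  # broker 2 wins the re-election
```

This mirrors what we saw earlier in AKHQ: stopping the controller node triggers a new election among the survivors.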

For reference, starting from version 2.8.0, Kafka additionally provides a quorum controller.
Using the KRaft protocol, it manages Kafka metadata directly on the Kafka nodes, without coordinating through ZooKeeper.
Compared with the ZooKeeper setup, which has only one controller node,
a quorum controller can have multiple controller nodes.
That is, if one controller node fails, another controller node can take over immediately.
The controller nodes keep their metadata synchronized at all times,
so the cost of copying metadata from ZooKeeper is eliminated and controller nodes can start very quickly.
Right now you have two choices: ZooKeeper and the quorum controller.
In the future, Kafka may deprecate ZooKeeper.
However, since we are currently in a transition period, I think it is good to learn ZooKeeper first and then the quorum controller.
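For reference, KRaft mode is configured with server properties along these lines. This is a hedged sketch; the host names, ports, and node IDs are example values, not our devcontainer's actual settings:

```properties
# KRaft mode: this node acts as both broker and controller
process.roles=broker,controller
node.id=1
# Voters in the controller quorum, in id@host:port form
controller.quorum.voters=1@kafka1:9093,2@kafka2:9093,3@kafka3:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
```

Note that the controller quorum itself lists multiple voters, which is what lets another controller take over immediately when one fails.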

We will practice querying ZooKeeper with a simple shell command.
Open a shell in a container where Kafka is installed.

Query all Kafka brokers via ZooKeeper:

zookeeper-shell zookeeper:2181 ls /brokers/ids

You can confirm that the array contains a total of three broker IDs: 0, 1, and 2.

Let's look at each broker's information with the get command:

zookeeper-shell zookeeper:2181 get /brokers/ids/0
zookeeper-shell zookeeper:2181 get /brokers/ids/1
zookeeper-shell zookeeper:2181 get /brokers/ids/2

You can check the Kafka broker information stored in ZooKeeper.
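The broker JSON returned by get can likewise be parsed programmatically. In the sketch below, the payload is an illustrative sample modeled on the znode contents, not real output from our cluster:

```python
import json

# Sample shaped like the JSON stored at /brokers/ids/0.
# (Values are illustrative; compare with your own `get` output.)
payload = """
{
  "listener_security_protocol_map": {"PLAINTEXT": "PLAINTEXT"},
  "endpoints": ["PLAINTEXT://kafka1:9092"],
  "host": "kafka1",
  "port": 9092,
  "version": 5
}
"""

broker = json.loads(payload)
print(broker["host"], broker["port"])  # where clients connect
print(broker["endpoints"])             # advertised listeners
```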
Through the above, we learned how ZooKeeper stores and coordinates various Kafka information in a distributed environment.

So far, we have looked at Kafka's environment setup and internal operation.
Next time, we will learn about topics and partitions.

Subscribing, liking, and turning on notifications helps content creators a lot.

Thank you.