20210120

Kafka, partitions and consumers

I have been interested in Kafka lately. I was curious about the statement that, for a given topic, a consumer group can have as many active consumers as the topic has partitions, but no more: any extra consumer will not be assigned a partition to read from.

What would happen to the consumers in the group if you add a new one when all partitions are already assigned? Let's give it a go.

The setup

Before you start, you'll need a Kafka broker up and running. I already have one running locally from https://github.com/confluentinc/cp-all-in-one. Just spin it up with docker-compose and it will be available on the usual port, 9092.

I am using Python, as it is quick and easy for this test. You can check out the git repo from https://github.com/davidfrigola/kafka-consumers-test

Once you have all this, just install dependencies (kafka-python) using

pip install -r requirements.txt

And you are ready to go.

The first step is to create the topic. Use

python kafka-setup.py

This will create a topic called poc-topic with 2 partitions. Now we can start using it.
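
For reference, a setup script of this kind could look roughly like the sketch below. This is not the code from the repo, just my assumption of the minimal steps using kafka-python's admin client. The function name create_topic, the replication factor of 1 and the localhost:9092 address are assumptions that fit a single local broker.

```python
TOPIC = "poc-topic"
NUM_PARTITIONS = 2


def create_topic(bootstrap_servers="localhost:9092"):
    """Create the test topic; do nothing if it already exists."""
    # Imported here so the file can be loaded even without kafka-python installed.
    from kafka.admin import KafkaAdminClient, NewTopic
    from kafka.errors import TopicAlreadyExistsError

    admin = KafkaAdminClient(bootstrap_servers=bootstrap_servers)
    try:
        # replication_factor=1 is an assumption: enough for a one-broker setup.
        admin.create_topics(
            [NewTopic(name=TOPIC, num_partitions=NUM_PARTITIONS, replication_factor=1)]
        )
    except TopicAlreadyExistsError:
        pass  # the topic is already there, nothing to do
    finally:
        admin.close()
```

Calling create_topic() once against the running broker should be enough. The number of partitions is the value that matters for the rest of the test.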

The producer

Run producer.py to get some simple messages in the topic. It will generate one each second, with a timestamp-ish value in it.

python producer.py
Message sent %s some data in bytes 1611170322.1077127
Message sent %s some data in bytes 1611170323.1272047
Message sent %s some data in bytes 1611170324.1288335
Message sent %s some data in bytes 1611170325.1305933

...
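
If you do not want to clone the repo, a producer along these lines should behave similarly. It is a sketch under my own assumptions, not the actual producer.py: the helper build_payload and the function produce are hypothetical names, and the broker address is assumed to be localhost:9092.

```python
import time


def build_payload(now=None):
    """Return the message body as bytes, with a timestamp-ish suffix."""
    now = time.time() if now is None else now
    return ("some data in bytes %s" % now).encode("utf-8")


def produce(topic="poc-topic", bootstrap_servers="localhost:9092"):
    """Send one message per second until interrupted."""
    # Imported here so the file can be loaded even without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    try:
        while True:
            payload = build_payload()
            # No key is given, so the partitioner is free to pick any partition.
            producer.send(topic, payload)
            print("Message sent", payload.decode("utf-8"))
            time.sleep(1)
    finally:
        producer.close()
```

Since the messages carry no key, they end up spread over both partitions, which is what lets two consumers share the work.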

The consumer

Let's start a couple of consumers. Each will be assigned a partition, as they are in the same group (poc-group).

This will look like:

python consumer.py
poc-topic:0:523: key=None value=b'some data in bytes 1611170357.2019012'
poc-topic:0:524: key=None value=b'some data in bytes 1611170358.2052937'
poc-topic:0:525: key=None value=b'some data in bytes 1611170359.2070754'

....
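
A consumer that prints lines in this format could be as small as the sketch below. Again, this is my assumption of a minimal version, not the repo's consumer.py; format_record and consume are hypothetical names. The important part is group_id: every process started with the same group id joins the same consumer group and shares the topic's partitions.

```python
def format_record(topic, partition, offset, key, value):
    """Format a record as topic:partition:offset: key=... value=..."""
    return "%s:%d:%d: key=%s value=%s" % (topic, partition, offset, key, value)


def consume(topic="poc-topic", group_id="poc-group",
            bootstrap_servers="localhost:9092"):
    """Join the consumer group and print every record received."""
    # Imported here so the file can be loaded even without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        group_id=group_id,  # same group id for all consumers in this test
        bootstrap_servers=bootstrap_servers,
    )
    # Iterating blocks and yields records from whatever partitions
    # this group member currently has assigned.
    for message in consumer:
        print(format_record(message.topic, message.partition,
                            message.offset, message.key, message.value))
```

The second field of each printed line is the partition number, so it is easy to see which partition each consumer process ended up with.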

The test

So now we have a producer, a topic with 2 partitions, and 2 consumers getting the messages from that topic, assigned to the same group.

Let's add a third consumer:

python consumer.py
poc-topic:1:484: key=None value=b'some data in bytes 1611170278.447001'
poc-topic:1:485: key=None value=b'some data in bytes 1611170279.4486403'
poc-topic:1:486: key=None value=b'some data in bytes 1611170285.457752'
poc-topic:1:487: key=None value=b'some data in bytes 1611170286.4592147'
poc-topic:1:488: key=None value=b'some data in bytes 1611170289.4642537'


After a couple of seconds... it starts consuming messages! What is going on? Well, let's check the logs of the other two consumers...

One of them will have stalled. Apparently, the most recently started of the two stops consuming in favor of the new one.
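
To get a feeling for why an existing consumer can lose its partition, here is a small simulation with no broker involved. It assumes a range-style assignment, which I believe is what kafka-python uses by default: member ids are sorted, and partitions are handed out in that order. The function range_assign and the consumer names are made up for illustration. Real member ids are generated when a consumer joins, so which consumer ends up idle may differ from what you see here.

```python
def range_assign(member_ids, num_partitions):
    """Assign partitions 0..num_partitions-1 to members, range style."""
    members = sorted(member_ids)  # assumption: assignment follows sorted member ids
    per_member, extra = divmod(num_partitions, len(members))
    assignment = {}
    for i, member in enumerate(members):
        # The first `extra` members get one partition more than the rest.
        start = per_member * i + min(i, extra)
        length = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + length))
    return assignment


# Two consumers, two partitions: each gets one.
before = range_assign(["consumer-b", "consumer-c"], 2)
# A third consumer joins and triggers a rebalance of the whole group.
after = range_assign(["consumer-b", "consumer-c", "consumer-a"], 2)

print(before)  # {'consumer-b': [0], 'consumer-c': [1]}
print(after)   # {'consumer-a': [0], 'consumer-b': [1], 'consumer-c': []}
```

In this made-up run the newcomer sorts first, takes partition 0, pushes consumer-b to partition 1, and leaves consumer-c with nothing. The point is that a rebalance recomputes the assignment for the whole group, so the new member is not simply put on a waiting list.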

Conclusion

The number of consumers actively doing work in a Kafka consumer group is limited by the number of partitions in the topic, but adding a new consumer is not a no-op: the new consumer will get a partition assigned, and one of the consumers that already had one will lose it and stall.

Bear it in mind!