Reliable Data Delivery in Kafka

Manik Khandelwal
2 min read · May 10, 2020

Reliability is an important criterion to consider when designing an application. What's more, reliability is a property of a system, not of a single component, so even when talking about Kafka's reliability guarantees, we need to keep the entire system and its use cases in mind.

Kafka is very flexible about reliable data delivery. Some use cases require the utmost reliability (e.g., bank transactions), while others prioritize speed and simplicity over reliability (e.g., tracking clicks on a website). All of this is achieved by configuring Kafka's client APIs, which are flexible enough to allow all kinds of reliability trade-offs.

In this article, I am going to discuss the reliability guarantees that Kafka offers and what a developer or administrator should consider while designing an application around them.

We usually talk about the reliability of a system in terms of guarantees: the behavior a system promises to preserve under different circumstances. Understanding the guarantees Kafka provides is critical for building reliable applications, because it lets developers reason about how the system will behave under different failure scenarios.

What Kafka Guarantees:

  • Order guarantee of messages within a partition. If message B is written after message A by the same producer to the same partition, Kafka guarantees that the offset of message B will be higher than that of message A, and that consumers will read message A before message B.
  • Produced messages are considered "committed" when they are written to the partition on all of its in-sync replicas (though not necessarily flushed to disk). Producers can choose to receive acknowledgments of sent messages when the message is fully committed, when it is written to the leader, or when it is sent out on the network (see the producer sketch after this list).
  • Committed messages will not be lost as long as at least one in-sync replica remains alive.
  • Consumers can only read messages that are committed.
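To make the acknowledgment trade-off concrete, here is a minimal producer sketch in Java. The broker address (localhost:9092) and topic name (transactions) are illustrative assumptions, not part of any particular setup:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReliableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        // acks picks the point at which a send is acknowledged:
        //   "0"   - as soon as the message is handed to the network (fastest, least safe)
        //   "1"   - once the partition leader has written it
        //   "all" - once all in-sync replicas have written it, i.e. fully committed
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() returns a Future; get() blocks until the broker acknowledges,
            // so a delivery failure surfaces here instead of being silently dropped.
            producer.send(new ProducerRecord<>("transactions", "key", "value")).get();
        } catch (Exception e) {
            e.printStackTrace(); // in a real application: retry or alert, don't swallow
        }
    }
}
```

A click-tracking pipeline might set acks to "1" (or even "0") and gain throughput, while the bank-transaction case from the introduction would stick with "all".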

These basic guarantees are building blocks of a reliable system, but by themselves they don't make the system fully reliable. Building a reliable system involves trade-offs, and Kafka was designed to let administrators and developers decide how much reliability they need by exposing configuration parameters that control those trade-offs.
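As a hedged illustration of such parameters, the sketch below creates a topic with settings tilted toward reliability. The topic name, replica counts, and broker address are assumptions for the example:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReliableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 3: each message lives on 3 brokers.
            NewTopic topic = new NewTopic("transactions", 3, (short) 3)
                .configs(Map.of(
                    // Writes need at least 2 in-sync replicas before they count
                    // as committed (when the producer uses acks=all).
                    "min.insync.replicas", "2",
                    // Never elect an out-of-sync replica as leader, preferring
                    // unavailability over losing committed messages.
                    "unclean.leader.election.enable", "false"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```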

That's pretty much it from me for today. In my next article, I will describe Kafka's replication mechanism and how it contributes to reliability.
