Kafka In-Sync Replica (ISR)


Replication in Kafka

Data replication is how Kafka ensures fault tolerance. Topics are split into non-overlapping partitions, and each partition can have multiple replicas. These replicas live on different brokers, at most one replica of a partition per broker, so if one broker crashes the data is not lost and can still be served from the remaining brokers. This redundancy improves durability, but the trade-off is increased latency. However, the level of redundancy can be controlled through several knobs Kafka provides: producer acks, the replication factor, and minimum in-sync replicas.

Minimum in-sync replicas

Before that, here are a few rules about replication factor to remember:

replicationFactor > partitions (fine)
replicationFactor <= number of brokers (fine)
replicationFactor > number of brokers (NOPE, not possible! Two replicas of the same partition cannot exist on the same broker)
partitions > number of brokers (fine; the partition count is mostly about consumer parallelism)

The replication factor is defined per topic at creation time with the --replication-factor option. Similarly, the number of partitions is defined at creation time with the --partitions option. If not provided, both default to 1.
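
The same thing can be done programmatically. Here is a minimal sketch using the Java AdminClient; the broker address and the topic name "orders" are assumptions:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // 3 partitions, replication factor 3: each partition gets
            // a leader and 2 followers, each on a different broker
            NewTopic topic = new NewTopic("orders", 3, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```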

A replication factor of 3 means the data will be stored on the leader and two follower replicas.

While this replication happens, the min.insync.replicas property ensures that at least that many replicas (including the leader) must have written the data to their logs before a write with acks=all on the producer is acknowledged. The default value of min.insync.replicas is 1. Note that a message only becomes visible to consumers once it has been replicated to all replicas currently in the ISR, not just min.insync.replicas of them.

With min.insync.replicas and acks=all together, there is a strong guarantee of data durability.
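
Here is a minimal sketch of such a producer in Java; the broker address, topic name, and key/value are assumptions:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class DurableProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Broker acknowledges only after min.insync.replicas replicas
        // (including the leader) have written the record
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // get() blocks until the broker acknowledges the write
            producer.send(new ProducerRecord<>("orders", "key", "value")).get();
        }
    }
}
```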

Keep in mind that min.insync.replicas must be less than or equal to replication.factor; otherwise no write with acks=all can ever succeed.

min.insync.replicas can be configured at the broker level (as a default for all topics) as well as per topic when we create it.
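
At the topic level it can be passed as a config entry at creation time. A minimal sketch building on the AdminClient example above; the values are assumptions, and the broker-level default would go in server.properties instead:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateTopicWithMinIsr {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // replication factor 3 with min.insync.replicas=2: one broker
            // can be down and acks=all writes still succeed
            NewTopic topic = new NewTopic("orders", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```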

Each partition leader tracks the lag of its follower replicas. The followers send fetch requests to the leader to get the latest records. If the leader does not receive a fetch request (or the follower has not caught up to the end of the leader's log) within replica.lag.time.max.ms, the follower is considered out of sync and is removed from the ISR.

If the leader crashes, the new leader is elected from the ISR, provided unclean.leader.election.enable is disabled (which it is by default). This too can be set at the broker level or at topic creation time.
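
For an already existing topic, this can be changed with an incremental config update. A minimal sketch via the AdminClient, again with the assumed "orders" topic:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class DisableUncleanElection {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            // Only replicas still in the ISR may become leader for this topic
            AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("unclean.leader.election.enable", "false"),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(op))).all().get();
        }
    }
}
```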

A very nice sequence diagram of Kafka replication I found:

[Sequence diagram: kafkaReplication.png] I don't know what the high watermark is yet, though!

What are the benefits?

Tradeoffs

