Introduction: The Foundation of Kafka's Messaging Model

In the previous article of this series, we posed an important question: What exactly is LEO (Log End Offset)? This fundamental concept lies at the heart of Kafka's architecture, and understanding it is crucial for mastering Kafka's internal workings. In this guide, we will explore the complete ecosystem of offsets in Kafka, from basic concepts to advanced replica synchronization mechanisms.

Offsets in Kafka serve as the backbone of message ordering, delivery guarantees, and consumer progress tracking. Just as a bank issues sequential ticket numbers to customers waiting in line, Kafka assigns monotonically increasing identifiers to every message, creating an immutable audit trail of all data flowing through the system.

Understanding Offset: The Basic Building Block

What is an Offset?

An Offset is a monotonically increasing 64-bit integer that uniquely identifies each message within a Kafka partition. When a Producer writes data to a Partition, Kafka performs the following steps:

  1. Assignment: Kafka assigns a sequential number to the incoming message
  2. Persistence: The message, along with its offset, is written to the log file
  3. Indexing: The offset serves as a key for efficient message retrieval
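The three steps above can be sketched with a toy in-memory log. This is a minimal illustration, not Kafka's actual implementation, which persists messages in segment files on disk:

```python
# Toy partition log: each appended message receives the next sequential offset.
class PartitionLog:
    def __init__(self):
        self.messages = []        # payloads; list index == offset
        self.next_offset = 0      # offset the next message will receive

    def append(self, payload):
        offset = self.next_offset      # 1. Assignment: sequential number
        self.messages.append(payload)  # 2. Persistence (in memory here)
        self.next_offset += 1
        return offset

    def read(self, offset):
        return self.messages[offset]   # 3. Indexing: offset is the lookup key

log = PartitionLog()
print(log.append(b"first"))   # -> 0
print(log.append(b"second"))  # -> 1
print(log.read(1))            # -> b'second'
```

Because offsets are assigned in a single place, in order, the uniqueness and total-ordering properties follow directly from the append path.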

The Bank Queue Analogy

To understand offsets intuitively, consider a bank's customer queuing system:

  • Each customer receives a numbered ticket upon arrival
  • Each subsequent number is exactly one greater than the previous
  • No two customers ever receive the same number
  • The numbers provide a clear ordering of service

In Kafka, messages are like bank customers, and offsets are their ticket numbers. This simple yet powerful mechanism ensures:

  • Total Ordering: Within a partition, messages have a definitive sequence
  • Uniqueness: No two messages share the same offset
  • Immutability: Once assigned, an offset never changes
  • Efficient Lookup: Offsets serve as keys for fast message retrieval via Kafka's sparse index files

The Dynamic Duo: LEO and High Watermark

LEO (Log End Offset): The Frontier of Writing

LEO stands for Log End Offset, representing the offset value of the next message to be written to a replica's log. Think of LEO as the "write cursor" position.

Example: If a replica contains messages with offsets 0 through 11 (12 messages total), the LEO value is 12, indicating where the next message will be placed.

Key characteristics of LEO:

  • Per-Replica: Each replica maintains its own LEO value
  • Monotonically Increasing: LEO only moves forward as new messages arrive
  • Write Position: Indicates the boundary between written and unwritten space
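The example above (offsets 0 through 11 giving LEO 12) reduces to a one-line rule, sketched here for illustration:

```python
# LEO is the offset of the *next* message: one past the last written offset.
# An empty replica has LEO 0.
def log_end_offset(offsets):
    """offsets: the offsets already written to this replica, in order."""
    return offsets[-1] + 1 if offsets else 0

print(log_end_offset([]))               # empty replica -> 0
print(log_end_offset(list(range(12))))  # offsets 0..11 -> LEO 12
```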

HW (High Watermark): The Boundary of Visibility

HW stands for High Watermark, and it defines message visibility for consumers. This is perhaps the most critical concept for understanding Kafka's delivery semantics.

The Visibility Rule:

  • Messages with offsets less than HW are committed and visible to consumers
  • Messages with offsets greater than or equal to HW are uncommitted and invisible to consumers
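The visibility rule partitions the log into two ranges. A minimal sketch, assuming a replica with LEO 5 and HW 3:

```python
# Consumers may only read offsets strictly below the High Watermark.
def visible_offsets(leo, hw):
    committed = [o for o in range(leo) if o < hw]     # visible to consumers
    uncommitted = [o for o in range(leo) if o >= hw]  # written, not yet committed
    return committed, uncommitted

committed, uncommitted = visible_offsets(leo=5, hw=3)
print(committed)    # [0, 1, 2]
print(uncommitted)  # [3, 4]
```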

This mechanism underpins Kafka's durability guarantees: consumers only see messages that have been successfully replicated to all in-sync replicas, so a leader failover cannot expose data that might subsequently be lost.

The Relationship Between Leader and Follower HW

A crucial insight: The partition's High Watermark is determined by the Leader replica's HW value. This raises an important architectural question: How do Follower replicas synchronize their HW and LEO values with the Leader?

Storage Architecture: Where HW and LEO Live

Leader Replica Storage (Broker0)

The broker hosting the Leader replica maintains:

  • Leader HW: The partition's authoritative high watermark
  • Leader LEO: The leader's current log end offset
  • Remote Replica LEOs: The LEO values of all Follower replicas (Remote Replicas)

The Remote Replica LEO tracking is essential because the Leader uses these values to calculate the partition's HW.

Follower Replica Storage (Broker1, Broker2, etc.)

Each Follower broker maintains:

  • Follower HW: Its local high watermark (synced from Leader)
  • Follower LEO: Its current log end offset

This distributed storage pattern enables Kafka's fault-tolerant architecture while maintaining consistency across replicas.

Synchronization Protocol: How Replicas Stay in Sync

Leader Replica Update Flow

The Leader handles two distinct request types, each triggering HW/LEO updates:

Scenario A: Handling Producer Requests

When the Leader receives data from a Producer:

  1. Write: Append the message(s) to the local log
  2. Update LEO: Increment the Leader's LEO value
  3. Recalculate HW: Update the partition's high watermark based on Follower progress

Scenario B: Handling Follower Fetch Requests

When a Follower requests data:

  1. Read: Retrieve data from disk or cache
  2. Update Remote LEO: Use the offset value in the Follower's request to update that Follower's recorded LEO
  3. Recalculate HW: Recompute the partition's high watermark

The HW Calculation Algorithm

The core HW calculation follows this formula:

currentHW = max(currentHW, min(LEO_leader, LEO_follower1, LEO_follower2, ..., LEO_followerN))

Interpretation: The High Watermark cannot exceed the LEO of any in-sync replica. This ensures that only messages replicated to all ISR (In-Sync Replicas) members become visible to consumers.
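The formula above can be expressed directly as a small function. This is a simplified sketch of the calculation, not the broker's actual code:

```python
# The new HW is the minimum LEO across the leader and all ISR followers,
# and the max() ensures the High Watermark never moves backwards.
def recalculate_hw(current_hw, leader_leo, follower_leos):
    return max(current_hw, min([leader_leo, *follower_leos]))

print(recalculate_hw(current_hw=0, leader_leo=1, follower_leos=[0]))  # 0: follower lags
print(recalculate_hw(current_hw=0, leader_leo=1, follower_leos=[1]))  # 1: fully replicated
```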

Follower Replica Update Flow

From the Follower's perspective, the process is simpler:

  1. Fetch: Request and receive messages from the Leader
  2. Write Locally: Persist the messages to the local log
  3. Update LEO: Increment the local LEO value
  4. Update HW: Set local HW to min(Leader's HW, local LEO)

The HW update rule ensures Follower HW never exceeds either the Leader's authoritative HW or the Follower's own write position.
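The four follower-side steps can be condensed into a sketch. The function below is illustrative only; it models local state as plain values:

```python
# After writing fetched messages, the follower's HW is capped by both the
# leader's HW (from the fetch response) and its own write position.
def follower_update(local_leo, fetched_messages, leader_hw):
    local_leo += len(fetched_messages)    # steps 2-3: write locally, advance LEO
    local_hw = min(leader_hw, local_leo)  # step 4: HW update rule
    return local_leo, local_hw

leo, hw = follower_update(local_leo=0, fetched_messages=["m0"], leader_hw=0)
print(leo, hw)  # 1 0  (message written, but leader HW has not advanced yet)
```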

Step-by-Step Example: A Complete Synchronization Cycle

Let's walk through a concrete example with one Leader and one Follower replica:

T0: Initial State

Replica  | HW | LEO | Remote LEO
Leader   | 0  | 0   | 0
Follower | 0  | 0   | -

Both replicas start empty with all values at zero.

T1: Producer Writes One Message

The Producer sends a message to the Leader:

Replica  | HW | LEO | Remote LEO
Leader   | 0  | 1   | 0
Follower | 0  | 0   | -

The Leader's LEO advances to 1, but HW remains 0 (message not yet replicated).

T2: Follower Fetches Message

The Follower sends a Fetch request and receives the message:

Replica  | HW | LEO | Remote LEO
Leader   | 0  | 1   | 0
Follower | 0  | 1   | -

The Follower writes the message and updates its LEO to 1.

T3: Follower Acknowledges Progress

The Follower sends another Fetch request with fetchOffset = 1:

  1. Leader receives request: Updates Remote LEO to 1
  2. Leader recalculates HW: min(LEO_leader=1, LEO_follower=1) = 1, so HW becomes 1
  3. Leader responds: Includes HW = 1 in the response
  4. Follower receives response: Updates its local HW to 1

Final state:

Replica  | HW | LEO | Remote LEO
Leader   | 1  | 1   | 1
Follower | 1  | 1   | -

The synchronization cycle is complete! The message is now visible to consumers on both replicas.
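The entire T0-T3 walkthrough can be replayed as a short simulation. All state lives in plain dicts; none of this is Kafka's real code, but the transitions match the tables above:

```python
# One leader, one follower, starting from the all-zero T0 state.
leader = {"hw": 0, "leo": 0, "remote_leo": 0}
follower = {"hw": 0, "leo": 0}

def leader_recalc_hw():
    leader["hw"] = max(leader["hw"], min(leader["leo"], leader["remote_leo"]))

# T1: producer writes one message; leader LEO advances, HW stays at 0.
leader["leo"] += 1
leader_recalc_hw()

# T2: follower fetches and writes the message locally.
follower["leo"] = 1
follower["hw"] = min(leader["hw"], follower["leo"])  # still 0

# T3: follower fetches again with fetchOffset=1; leader learns its progress.
leader["remote_leo"] = 1
leader_recalc_hw()                                   # HW advances to 1
follower["hw"] = min(leader["hw"], follower["leo"])  # follower HW follows

print(leader)    # {'hw': 1, 'leo': 1, 'remote_leo': 1}
print(follower)  # {'hw': 1, 'leo': 1}
```

Note that the follower's HW lags one fetch round behind the leader's: it only learns the new HW from the next fetch response, which is why T3 requires a second request.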

Consumer Offset Management: The __consumer_offsets Topic

Evolution from ZooKeeper to Internal Topic

In early Kafka versions, consumer offsets were stored in ZooKeeper. However, this design had significant limitations:

  • Write Performance: Every ZooKeeper write is a quorum operation, which couldn't handle high-frequency offset commits
  • Scalability: As consumer groups grew, ZooKeeper became a bottleneck
  • Operational Complexity: Managing offset data in a separate system added operational overhead

The Modern Solution: __consumer_offsets

Kafka introduced an internal topic called __consumer_offsets to store consumer progress:

Topic Characteristics:

  • Default Partitions: 50 partitions (configurable via offsets.topic.num.partitions)
  • Internal Topic: Managed by Kafka itself and excluded from consumer wildcard subscriptions by default
  • Compacted Log: Uses log compaction (cleanup.policy=compact) to prevent unbounded growth

Data Structure: Key-Value Pairs

Each offset commit creates a key-value record:

Key Structure:

Key = GroupID + TopicName + PartitionNumber

This triple ensures that each consumer group's progress on each partition has a unique identifier.

Value Structure:

Value = {
    Offset: long,
    Metadata: string,
    Timestamp: long,
    ...additional metadata
}
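Putting the key and value together, one commit record might look like the sketch below. The real record format is binary; the field names and the "billing-service" group are illustrative only:

```python
import time

# One offset-commit record: the (group, topic, partition) triple is the key,
# the consumer's progress is the value.
def offset_commit_record(group_id, topic, partition, offset, metadata=""):
    key = (group_id, topic, partition)
    value = {
        "offset": offset,
        "metadata": metadata,
        "timestamp": int(time.time() * 1000),
    }
    return key, value

key, value = offset_commit_record("billing-service", "orders", 0, offset=42)
print(key)              # ('billing-service', 'orders', 0)
print(value["offset"])  # 42
```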

Log Compaction: Preventing Storage Explosion

Without intervention, continuous offset commits would cause the __consumer_offsets topic to grow indefinitely. Kafka solves this with log compaction:

How It Works:

  • For each unique key, only the latest value is retained
  • Older values for the same key are eventually removed
  • This ensures the topic size remains proportional to the number of active consumer groups, not the number of commits
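The compaction effect is easy to see in miniature. A sketch, using a dict as a stand-in for the compacted log (the group and topic names are made up):

```python
# For each key, only the latest value survives compaction, so the result
# size tracks distinct group/partition pairs, not the number of commits.
def compact(records):
    latest = {}
    for key, value in records:  # records in log (append) order
        latest[key] = value     # later commits overwrite earlier ones
    return latest

commits = [
    (("group-a", "orders", 0), {"offset": 10}),
    (("group-a", "orders", 0), {"offset": 20}),
    (("group-a", "orders", 0), {"offset": 30}),
]
print(compact(commits))       # {('group-a', 'orders', 0): {'offset': 30}}
print(len(compact(commits)))  # 1: three commits compact to one record
```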

Benefits:

  • Bounded Storage: Topic size grows with active consumers, not commit frequency
  • Fast Recovery: Consumers can quickly recover their last committed position
  • Efficient Lookup: Latest offset for any group-partition pair is readily available

Advanced Concepts and Best Practices

Understanding ISR (In-Sync Replicas)

The High Watermark calculation only considers In-Sync Replicas—followers that are current enough with the Leader. Replicas that fall too far behind are removed from ISR, preventing them from blocking HW advancement.

Consumer Group Coordination

Consumer groups use the offset management system to:

  • Track progress across restarts
  • Enable failover between group members
  • Support exactly-once semantics (with transactions)

Monitoring and Operations

Key metrics to monitor:

  • Consumer Lag: Difference between the partition's log end offset and the group's committed offset
  • HW Progression: Rate at which HW advances
  • ISR Shrink/Expand: Frequency of replica synchronization issues
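Consumer lag, the first metric above, is simple to compute once you have both offsets. A sketch with made-up per-partition numbers:

```python
# Lag per partition = log end offset - the group's committed offset.
# A missing committed offset is treated as 0 (consumer hasn't started).
def consumer_lag(log_end_offsets, committed_offsets):
    return {p: log_end_offsets[p] - committed_offsets.get(p, 0)
            for p in log_end_offsets}

print(consumer_lag({0: 100, 1: 250}, {0: 100, 1: 180}))  # {0: 0, 1: 70}
```

Growing lag on one partition while others stay at zero often points at a hot key or a slow consumer in the group rather than a cluster-wide problem.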

Summary and Key Takeaways

This comprehensive exploration of Kafka offsets has covered:

  1. Offset Fundamentals: The monotonically increasing identifier that orders all messages within a partition
  2. LEO (Log End Offset): The write cursor position, indicating where the next message will be placed
  3. HW (High Watermark): The visibility boundary, ensuring consumers only see committed messages
  4. Replica Synchronization: The protocol by which Leaders and Followers coordinate HW and LEO values
  5. Consumer Offset Management: The evolution from ZooKeeper to the __consumer_offsets internal topic
  6. Log Compaction: The mechanism that prevents unbounded growth of offset data

Understanding these concepts is essential for:

  • Troubleshooting: Diagnosing replication lag and consumer issues
  • Performance Tuning: Optimizing producer and consumer configurations
  • Architecture Design: Making informed decisions about replication factors and acknowledgment settings

Kafka's offset management system exemplifies elegant distributed systems design: simple primitives (monotonically increasing integers) combined with careful protocols (HW calculation, log compaction) to achieve robust, scalable message delivery guarantees.


This article is part of the "Learning Kafka from Zero" series. Future articles will explore producer internals, consumer group rebalancing, and advanced stream processing patterns.