DevOps

Kafka Idempotence Performance Analysis

Apache Kafka is designed for high-throughput, fault-tolerant, distributed messaging. In real-world systems, producers may retry sending messages due to network failures, broker restarts, or transient errors. These retries can result in duplicate messages, which can be problematic for consumers and downstream systems. Kafka addresses this problem using idempotent producers. Let us delve into understanding Java Kafka idempotence and its impact on performance.

1. What is Idempotence?

In general computing terms, an operation is idempotent if performing it multiple times produces the same result as performing it once. This property is critical in distributed systems where network failures, retries, and partial failures are common. In Kafka, idempotence is a producer-side feature that prevents duplicate records from being written to a partition when retries occur. It guarantees that:

  • Even if a producer retries sending the same message due to transient failures
  • The message is written to the target partition only once and in order

This ensures data correctness without requiring the application to implement custom deduplication logic.

1.1 How Kafka Achieves Idempotence

Kafka implements idempotence by maintaining strict tracking of producer state at the broker level:

  • Each producer instance is assigned a unique producer.id by the broker
  • Each message batch sent by the producer contains a monotonically increasing sequence number
  • The broker maintains the last committed sequence number for every producer per partition
  • If the broker receives a batch with a sequence number that has already been processed, it is treated as a duplicate and discarded

This mechanism ensures that retries do not introduce duplicate records, even in the presence of network errors or broker restarts.

1.2 When Should You Enable Idempotent Producers?

Enable idempotence when:

  • You require exactly-once semantics at the producer level
  • Your application cannot tolerate duplicate messages
  • You rely on retries (retries > 0) for fault tolerance
  • You are building pipelines where correctness and ordering are more important than raw throughput

Do not enable idempotence if:

  • You need maximum throughput and your downstream systems can handle duplicates
  • Ultra-low latency is more important than delivery guarantees
  • Your use case is tolerant of occasional duplicate events

1.3 Performance Impact

While idempotence significantly improves reliability, it does introduce some overhead:

  • Slightly lower throughput due to sequence tracking and stricter acknowledgment requirements
  • Additional metadata exchanged between producers and brokers
  • Retries become safe, deterministic, and free of duplication side effects
  • Overall system reliability and data consistency are improved

In most real-world applications, the trade-off between performance and correctness strongly favors enabling idempotence.

1.3.1 Measuring the Performance Impact

While enabling idempotence in Kafka improves reliability and prevents duplicate messages, it also introduces some performance overhead due to sequence tracking, additional acknowledgments, and metadata management. Key Metrics to Monitor:

  • Throughput: The number of messages sent per second. Enabling idempotence may slightly reduce throughput because of the extra acknowledgment and sequencing overhead.
  • Latency: The time it takes for a message to be sent and acknowledged. Idempotent producers may experience slightly higher latency due to the need for full acknowledgment from in-sync replicas.
  • Retries: The number of times the producer retries sending a message. With idempotence enabled, retries are safe and do not cause duplicates.
  • Errors: Any failed sends. Monitoring error rates ensures that retries are functioning correctly and no duplicate messages are created.

2. Setting Up Kafka on Docker

Running Kafka in Docker is an easy way to experiment, test, or develop locally without installing it directly on your system. We will use docker-compose to bring up both Kafka and Zookeeper. Create a file named docker-compose.yml with the following content:

version: '3.8'

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    container_name: zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - "2181:2181"

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    container_name: kafka
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1

2.1 Start Kafka and Zookeeper

Run the following command to start the services in detached mode:

docker-compose up -d

You can check the running containers using:

docker ps

2.2 Create a Kafka Topic

Once Kafka is running, create a topic for testing idempotent producers. Here we create a topic named idempotent-topic:

docker exec -it kafka kafka-topics --create --topic idempotent-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

2.3 Verify the Topic

Ensure the topic is created successfully:

docker exec -it kafka kafka-topics --list --bootstrap-server localhost:9092

3. Java Code Example

3.1 Idempotent Kafka Producer (Java)

// IdempotentProducerExample.java
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class IdempotentProducerExample {

    public static void main(String[] args) {

        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Enable idempotence
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        // Safe producer settings
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        for (int i = 0; i < 5; i++) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("idempotent-topic", "key-" + i, "message-" + i);

            producer.send(record, (metadata, exception) -> {
                if (exception == null) {
                    System.out.println(
                        "Sent message to partition " + metadata.partition() +
                        " with offset " + metadata.offset()
                    );
                } else {
                    exception.printStackTrace();
                }
            });
        }

        producer.flush();
        producer.close();
    }
}

3.1.1 Code Explanation

This Java code demonstrates how to create an idempotent Kafka producer that guarantees exactly-once message delivery at the producer level. First, it sets up the necessary Properties including the Kafka bootstrap server and serializers for both key and value. Idempotence is enabled with ENABLE_IDEMPOTENCE_CONFIG=true, and additional safe settings are applied: ACKS_CONFIG=all ensures that messages are fully acknowledged by all in-sync replicas, RETRIES_CONFIG=Integer.MAX_VALUE allows unlimited retries on transient failures, and MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION=5 keeps message order while allowing some parallelism. A KafkaProducer instance is created with these properties, and a loop sends 5 messages to the topic idempotent-topic with unique keys. Each send includes a callback to print the partition and offset on success or log exceptions on failure. Finally, producer.flush() ensures all pending messages are sent, and producer.close() cleans up resources. This setup guarantees that even if retries occur due to network or broker issues, duplicate messages are never written to the topic.

3.2 Kafka Consumer (Java)

// ConsumerExample.java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ConsumerExample {

    public static void main(String[] args) {

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("idempotent-topic"));

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));

            for (ConsumerRecord<String, String> record : records) {
                System.out.println(
                    "Consumed message: key=" + record.key() +
                    ", value=" + record.value() +
                    ", offset=" + record.offset()
                );
            }
        }
    }
}

3.2.1 Code Explanation

This Java code demonstrates a simple Kafka consumer that reads messages from the idempotent-topic. It first sets up Properties including the Kafka bootstrap server and deserializers for both key and value using StringDeserializer. The consumer is assigned to a group named demo-group, and AUTO_OFFSET_RESET_CONFIG=earliest ensures that it starts reading from the beginning of the topic if no previous offset is committed. A KafkaConsumer instance is created and subscribes to the topic. Inside an infinite loop, the consumer polls for records every second using poll(Duration.ofMillis(1000)). For each received ConsumerRecord, it prints the message key, value, and offset to the console. This consumer can reliably read all messages sent by the idempotent producer, preserving order and ensuring that no duplicates are processed even if the producer retries occurred, making it ideal for exactly-once processing scenarios.

3.3 Code Run and Output

When the idempotent producer code is executed, the console output shows each message being sent with its corresponding partition and offset. Despite any potential retries due to network or broker interruptions, each message receives a unique offset and is written exactly once to the topic, demonstrating Kafka’s idempotence guarantees. The producer output appears as:

Sent message to partition 0 with offset 0
Sent message to partition 0 with offset 1
Sent message to partition 0 with offset 2
Sent message to partition 0 with offset 3
Sent message to partition 0 with offset 4

On the consumer side, running the consumer code continuously polls the topic and prints each message’s key, value, and offset in the order they were produced. This confirms that all messages sent by the producer are received exactly once, preserving their order and consistency. The consumer output is:

Consumed message: key=key-0, value=message-0, offset=0
Consumed message: key=key-1, value=message-1, offset=1
Consumed message: key=key-2, value=message-2, offset=2
Consumed message: key=key-3, value=message-3, offset=3
Consumed message: key=key-4, value=message-4, offset=4

This example illustrates the practical effect of enabling idempotence in Java Kafka producers: even with retries or transient errors, no duplicate messages are written, and consumers can reliably process messages in order. It demonstrates exactly-once semantics at the producer level, making Kafka suitable for applications where data consistency and reliability are critical, such as financial transactions, event-driven pipelines, and distributed systems requiring strong guarantees.

3.4 Demonstrating Idempotence Under Failure

The previous examples demonstrate normal producer operation. However, idempotence is most relevant when failures occur and the producer must retry sending records. In this section, we introduce a failure scenario that triggers producer retries and verify that no duplicate messages are written to Kafka.

3.4.1 Failure Scenario Overview

To observe idempotence in action, we run the producer while temporarily interrupting the Kafka broker. This interruption causes send failures and forces the producer to retry the same records. If idempotence is working correctly, Kafka will ensure that each record is written only once, even though duplicate send attempts occur.

  • The producer sends messages continuously
  • The Kafka broker is stopped while messages are in flight
  • The producer retries sending the same records
  • The broker discards duplicate batches based on sequence numbers

3.4.2 Idempotent Producer with Retry Scenario

// IdempotentProducerWithFailure.java
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class IdempotentProducerWithFailure {

    public static void main(String[] args) throws Exception {

        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Enable idempotence
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        // Safe retry settings
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        int i = 0;
        while (i < 10) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("idempotent-topic", "key", "message-" + i);

            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    System.out.println("Send failed, retrying: " + exception.getMessage());
                } else {
                    System.out.println("Sent message with offset " + metadata.offset());
                }
            });

            i++;
            Thread.sleep(1000); // Slow down sends to allow broker interruption
        }

        producer.flush();
        producer.close();
    }
}

This class demonstrates an idempotent Kafka producer configured to safely handle failures and retries without producing duplicate messages. The producer is initialized with standard Kafka connection and serialization settings, and idempotence is explicitly enabled using ENABLE_IDEMPOTENCE_CONFIG=true, ensuring that retries do not result in duplicate writes. Reliability is further reinforced by requiring acknowledgments from all in-sync replicas (acks=all), allowing unlimited retries, and limiting the number of in-flight requests to preserve ordering. The producer sends a sequence of ten messages to the idempotent-topic, pausing briefly between sends to make it possible to interrupt the broker during execution. If a send attempt fails due to broker unavailability or network issues, the producer automatically retries the same record, which is evident from the logged error messages. Once the broker becomes available again, message delivery resumes, and Kafka ensures that each message is written exactly once by tracking producer sequence numbers at the broker. Finally, the producer flushes any pending records and closes cleanly, demonstrating how idempotence guarantees correctness even when failures occur.

Retries are handled internally by the Kafka producer when RETRIES_CONFIG is set (here, Integer.MAX_VALUE). The code does not implement application-level retries; the producer automatically retries failed sends. Idempotence ensures that these internal retries never result in duplicate messages.

3.4.3 Triggering the Failure

While the producer is running, stop the Kafka broker:

docker stop kafka

After a few seconds, restart the broker:

docker start kafka

During this period, the producer encounters send failures and retries the same batches once the broker becomes available again.

3.4.4 Producer Output (Retries Observed)

During the broker outage, the producer logs retry-related errors:

Send failed, retrying: Connection to node -1 failed
Send failed, retrying: TimeoutException

Once the broker restarts, message delivery resumes without application intervention.

3.4.5 Consumer Output (No Duplicate Messages)

After the producer completes, run the consumer from the beginning of the topic. The output shows:

Consumed message: key=key, value=message-0, offset=0
Consumed message: key=key, value=message-1, offset=1
Consumed message: key=key, value=message-2, offset=2
Consumed message: key=key, value=message-3, offset=3
Consumed message: key=key, value=message-4, offset=4
Consumed message: key=key, value=message-5, offset=5
Consumed message: key=key, value=message-6, offset=6
Consumed message: key=key, value=message-7, offset=7
Consumed message: key=key, value=message-8, offset=8
Consumed message: key=key, value=message-9, offset=9

Despite producer retries caused by broker unavailability, each message appears exactly once and in order. No duplicate offsets are observed.

3.4.6 Idempotence Verification

This behavior confirms that idempotence is actively enforced by Kafka:

  • The producer is assigned a unique producer ID by the broker
  • Each batch includes a monotonically increasing sequence number
  • Duplicate retry batches are detected and discarded by the broker
  • Only the first successful write for each sequence number is committed

As a result, retries caused by failures do not lead to duplicate messages, demonstrating that Kafka’s idempotent producer guarantees hold under real failure conditions.

4. Conclusion

Idempotence in Kafka significantly improves producer reliability by eliminating duplicate messages caused by retries. While it introduces a small performance overhead, the trade-off is often worthwhile for systems requiring strong delivery guarantees. For mission-critical pipelines, financial transactions, and event-driven architectures, idempotent producers should be the default choice.

Yatin Batra

An experience full-stack engineer well versed with Core Java, Spring/Springboot, MVC, Security, AOP, Frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, K8).
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Oldest
Newest Most Voted
Back to top button