
Retention period and issue with Kafka data not getting deleted

Problem

The default value of the retention.ms attribute on Kafka topics is 7 days, but data older than 7 days still persists in the topic.

Kafka version: 2.1.11

The interesting part was that even though the data was older than 7 days and retention.ms had not been overridden (it was left at 7 days), we could still see data in the topic older than that.

This is normally OK, but in scenarios where Kafka topics are used as the source of truth to build an in-memory cache on application startup, it can cause problems:

  • The application has to read more data on startup
  • The cache might end up holding more data than expected

The problem arises from another parameter that is not talked about much: segment.ms. This parameter plays a major role.

This parameter decides when the active internal segment of a topic partition gets rolled, that is, closed and replaced by a new one. Its default is again 7 days.

Kafka's log cleanup deletes only whole segments, and only when the last (newest) message in a given segment is older than retention.ms, which is 7 days here. So if the last message of a segment arrived on Saturday and the segment rolled on Sunday (after its week was up), all the data in that segment, starting from the previous Monday, stays available until the following Saturday.
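The rule above can be sketched in a few lines of plain Java. This is only a simplified model of the check, not Kafka's actual implementation; the class name, the method name and the dates are made up for illustration.

```java
import java.time.Duration;
import java.time.Instant;

/** Simplified model of time-based retention; not Kafka's real code. */
class SegmentRetentionSketch {

    /**
     * A closed segment is eligible for deletion only when its newest
     * message is older than the retention period.
     */
    static boolean isDeletable(Instant newestMessageInSegment, Instant now, Duration retention) {
        return Duration.between(newestMessageInSegment, now).compareTo(retention) > 0;
    }

    public static void main(String[] args) {
        Duration retention = Duration.ofDays(7); // retention.ms left at 7 days

        // Hypothetical segment: first message on a Monday, last message on the Saturday after.
        Instant oldest = Instant.parse("2019-01-07T00:00:00Z");
        Instant newest = Instant.parse("2019-01-12T00:00:00Z");

        // Six days after the last message the segment is still kept,
        // although its oldest message is already 11 days old.
        Instant check = Instant.parse("2019-01-18T00:00:00Z");
        System.out.println("Oldest message age (days): " + Duration.between(oldest, check).toDays());
        System.out.println("Segment deletable: " + isDeletable(newest, check, retention));
    }
}
```

The point of the sketch is that the age of the oldest message never enters the check; only the newest message in the segment matters.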

Solution

To resolve this, it is sufficient to set the segment.ms parameter to 24 hours, so that segments get rolled every day and each day's data is deleted once it is a week old.
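To see why a smaller segment.ms helps, here is a rough back-of-the-envelope calculation in Java. It assumes messages arrive steadily and ignores size-based rolling and the delay between retention checks, so treat the numbers as an approximation; the class and method names are hypothetical.

```java
import java.time.Duration;

/** Rough estimate only; assumes steady traffic and time-based rolling. */
class RetentionBoundSketch {

    /**
     * Approximate upper bound on how long a message stays readable: it can sit
     * in the active segment for up to segment.ms before the segment rolls, and
     * the segment is then kept for retention.ms after its newest message.
     */
    static Duration worstCaseLifetime(Duration retention, Duration segmentRoll) {
        return retention.plus(segmentRoll);
    }

    public static void main(String[] args) {
        Duration retention = Duration.ofDays(7); // retention.ms left at 7 days

        System.out.println("segment.ms = 7 days   -> up to "
                + worstCaseLifetime(retention, Duration.ofDays(7)).toDays() + " days");
        System.out.println("segment.ms = 24 hours -> up to "
                + worstCaseLifetime(retention, Duration.ofHours(24)).toDays() + " days");
    }
}
```

Under these assumptions, the oldest readable message can be close to two weeks old with the default segment.ms, and only about eight days old once segments roll daily.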

Published on Java Code Geeks with permission by Abhijeet Iyengar, partner at our JCG program. See the original article here: Retention period and issue with Kafka data not getting deleted

Opinions expressed by Java Code Geeks contributors are their own.

Abhijeet Iyengar

Abhijeet is a Software Engineer working with a financial client. He has been involved in building UI and service-based applications.