🔏
Tech
  • 🟢App aspects
    • Software architecture
      • Caching
      • Anti-patterns
      • System X-ability
      • Coupling
      • Event driven architecture
        • Command Query Responsibility Segregation (CQRS)
        • Change Data Capture (CDC)
      • Distributed transactions
      • App dev notes
        • Architecture MVP
      • TEMP. Check list
      • Hexagonal arch
      • Communication
        • REST vs messaging
        • gRPC
        • WebSocket
      • Load balancers
      • Storage limits
      • Event storming
    • Authentication
    • Deployment strategy
  • Databases
    • Classification
    • DB migration tools
    • PostreSQL
    • Decision guidance
    • Index
      • Hash indexes
      • SSTable, LSM-Trees
      • B-Tree
      • Engines, internals
    • Performance
  • System design
    • Interview preparation
      • Plan
        • Instagram
        • Tinder
        • Digital wallet
        • Dropbox
        • Live video streaming
        • Uber
        • Whatsup
        • Tiktok
        • Twitter
        • Proximity service
    • Algorithms
    • Acronyms
  • 🟢Programming languages
    • Java
      • Features
        • Field hiding
        • HashCode() and Equals()
        • Reference types
        • Pass by value
        • Atomic variables
      • Types
      • IO / NIO
        • Java NIO
          • Buffer
          • Channel
        • Java IO: Streams
          • Input streams
            • BufferedInputStream
            • DataInputStream
            • ObjectInputStream
            • FilterInputStream
            • ByteArrayInputStream
        • Java IO: Pipes
        • Java IO: Byte & Char Arrays
        • Java IO: Input Parsing
          • PushbackReader
          • StreamTokenizer
          • LineNumberReader
          • PushbackInputStream
        • System.in, System.out, System.error
        • Java IO: Files
          • FileReader
          • FileWriter
          • FileOutputStream
          • FileInputStream
      • Multithreading
        • Thread liveness
        • False sharing
        • Actor model
        • Singleton
        • Future, CompletableFuture
        • Semaphore
      • Coursera: parallel programming
      • Coursera: concurrent programming
      • Serialization
      • JVM internals
      • Features track
        • Java 8
      • Distributed programming
      • Network
      • Patterns
        • Command
      • Garbage Collectors
        • GC Types
        • How GC works
        • Tools for GC
    • Kotlin
      • Scope functions
      • Inline value classes
      • Coroutines
      • Effective Kotlin
    • Javascript
      • Javascript vs Java
      • TypeScript
    • SQL
      • select for update
    • Python
      • __init.py__
  • OS components
    • Network
      • TCP/IP model
        • IP address in action
      • OSI model
  • 🟢Specifications
    • JAX-RS
    • REST
      • Multi part
  • 🟢Protocols
    • HTTP
    • OAuth 2.0
    • LDAP
    • SAML
  • 🟢Testing
    • Selenium anatomy
    • Testcafe
  • 🟢Tools
    • JDBC
      • Connection pool
    • Gradle
    • vim
    • git
    • IntelliJ Idea
    • Elastic search
    • Docker
    • Terraform
    • CDK
    • Argo CD
      • app-of-app setup
    • OpenTelemetry
    • Prometheus
    • Kafka
      • Consumer lag
  • 🟢CI
    • CircleCi
  • 🟢Platforms
    • AWS
      • VPC
      • EC2
      • RDS
      • S3
      • IAM
      • CloudWatch
      • CloudTrail
      • ELB
      • SNS
      • Route 53
      • CloudFront
      • Athena
      • EKS
    • Kubernetes
      • Networking
      • RBAC
      • Architecture
      • Pod
        • Resources
      • How to try
      • Kubectl
      • Service
      • Tooling
        • ArgoCD
        • Helm
        • Istio
    • GraalVM
    • Node.js
    • Camunda
      • Service tasks
      • Transactions
      • Performance
      • How it executes
  • 🟢Frameworks
    • Hibernate
      • JPA vs Spring Data
    • Micronaut
    • Spring
      • Security
      • JDBC, JPA, Hibernate
      • Transactions
      • Servlet containers, clients
  • 🟢Awesome
    • Нейробиология
    • Backend
      • System design
    • DevOps
    • Data
    • AI
    • Frontend
    • Mobile
    • Testing
    • Mac
    • Books & courses
      • Path: Java Concurrency
    • Algorithms
      • Competitive programming
    • Processes
    • Finance
    • Electronics
  • 🟢Electronics
    • Arduino
    • IoT
  • Artificial intelligence
    • Artificial Intelligence (AI)
  • 🚀Performance
    • BE
  • 📘Computer science
    • Data structures
      • Array
      • String
      • LinkedList
      • Tree
    • Algorithms
      • HowTo algorithms for interview
  • 🕸️Web dev (Frontend)
    • Trends
    • Web (to change)
  • 📈Data science
    • Time series
Powered by GitBook
On this page
  • Reasons
  • How to check
  • How to monitor
  • Strategies for addressing Kafka consumer lag
  • Optimizing consumer processing logic
  • Modifying partition count
  • Using an application-specific queue to limit rates
  • Tuning consumer configuration parameters

Was this helpful?

  1. Tools
  2. Kafka

Consumer lag

PreviousKafkaNextCircleCi

Last updated 8 months ago

Was this helpful?

Kafka consumer lag is the difference between the last offset stored by the broker and the last committed offset for that partition.

To enable resilience, Kafka consumers keep track of the last position in a partition from where it is read. This helps consumers to begin again from the position they left off in case of unfortunate situations like crashes. This is called consumer offset. Consumer offset is stored in a separate Kafka topic.

Reasons

Incoming traffic increase

Data not equally balanced in partitions (some partitions are heavier)

Slow processing jobs

Errors in code and pipeline components

How to check

$KAFKA_HOME/bin/kafka-consumer-groups.sh  --bootstrap-server <> --describe --group <group_name>
GROUP          TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG    OWNER
ub-kf          test-topic      0          15              17              2      ub-kf-1/127.0.0.1  
ub-kf          test-topic      1          14              15              1         ub-kf-2/127.0.0.1

It details each partition’s ‘current offset’, ‘log end offset’, and lag. The ‘current offset’ of a partition is the last committed message of the consumer dealing with that partition. The ‘log end offset' is the highest offset in that partition. In other words, it represents the offset of the last message written to that partition. The difference between these two is the amount of consumer lag.

How to monitor

Tool

Strategies for addressing Kafka consumer lag

Optimizing consumer processing logic

The most straightforward solution to solve the consumer lag problem is to identify the bottleneck in the processing logic and make it more efficient. Of course, this is easier said than done. A possible approach is to identify any synchronous blocking operation in the code and make it asynchronous if the use case allows it.

Modifying partition count

Partition assignment is a crucial factor that affects consumer lag. A situation where only one partition’s lag is higher than others points to a problem in the data distribution. This can be solved using a custom partition key or adding more partitions. The configuration of partitions and consumers will depend on the use case.

Using an application-specific queue to limit rates

Sometimes the processing logic is a time-consuming operation and can't be optimized. In these cases, developers should consider using an application-specific queue so that consumers can just add messages to the application queue. Once the messages are pushed to the queue, the consumer can wait till they are processed before accepting new messages from Kafka. The second queue will help to divide the messages from a single partition among multiple processing instances.

Tuning consumer configuration parameters

Developers can use various consumer parameters to adjust how frequently consumers pull messages from partitions or how much data is fetched in one operation. The parameter ‘fetch.max.wait.ms’ controls the time the consumer waits before accepting new data from the partition. The ‘fetch.max.bytes’ and ‘fetch.min.bytes’ parameters control the maximum and minimum amount of data fetched by consumers in one operation.

Another critical element of reducing consumer lag is to minimize rebalancing operations as much as possible. Each rebalance operation blocks the consumers for some time and can increase the consumer lag. The parameters ‘max.poll.interval.ms’, ‘session.timeout.ms’, and ‘heartbeat.interval.ms’ control the situations where the broker decides that a consumer has been inactive for too long and triggers a rebalance operation. Tuning these values can help in reducing rebalance operations.

🟢
Burrow