Kafka consumer lag: monitoring and alerting strategies

Chiara Taylor
·361 views
We've had a few incidents where Kafka consumer lag grew into the millions of messages without us noticing until it was too late. Our initial alerting strategy of setting static thresholds for lag proved problematic: a low threshold generated too many false positives during expected batch processing windows, and a high threshold was too high to catch issues early.

What are people's best practices for monitoring and alerting on Kafka consumer lag? Should we be looking at the rate of change of lag rather than the absolute lag? Or perhaps time-based lag (how far behind in time) rather than message count?

We also have some consumers that are intentionally batched, so their lag will naturally fluctuate wildly. How do you distinguish between 'normal' lag and 'problematic' lag in a heterogeneous consumer environment?
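To make the options in the question concrete, here is a rough sketch of the three signals (absolute lag, rate of change of lag, and time-based lag) computed from periodic offset samples. Everything in it is an assumption for illustration: the `LagSample` shape, the function names, and the per-consumer `max_time_lag_s` budget are hypothetical, and it presumes end offsets and committed offsets are already being collected by some existing monitor. Time-based lag is estimated by interpolating when the log end passed the consumer's current position, which is only one possible way to approximate it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LagSample:
    """One observation for a consumer group on a partition (hypothetical shape)."""
    ts: float              # sample time, in seconds
    end_offset: int        # latest offset in the partition at that time
    committed_offset: int  # offset the consumer group had committed

    @property
    def lag(self) -> int:
        """Absolute lag in messages."""
        return max(self.end_offset - self.committed_offset, 0)


def lag_rate(samples: Sequence[LagSample]) -> float:
    """Average change in lag across the window, in messages per second."""
    if len(samples) < 2:
        return 0.0
    first, last = samples[0], samples[-1]
    dt = last.ts - first.ts
    return (last.lag - first.lag) / dt if dt > 0 else 0.0


def time_lag_seconds(samples: Sequence[LagSample]) -> float:
    """Estimate how long ago the log end was at the consumer's current offset."""
    last = samples[-1]
    target = last.committed_offset
    if target >= last.end_offset:
        return 0.0  # fully caught up
    # Walk backwards to find the two samples whose end offsets bracket the target.
    for older, newer in zip(reversed(samples[:-1]), reversed(samples[1:])):
        if older.end_offset <= target <= newer.end_offset:
            span = newer.end_offset - older.end_offset
            frac = (target - older.end_offset) / span if span else 0.0
            produced_at = older.ts + frac * (newer.ts - older.ts)
            return last.ts - produced_at
    # The committed offset predates the window, so the consumer is at least
    # one full window behind.
    return last.ts - samples[0].ts


def should_alert(samples: Sequence[LagSample], max_time_lag_s: float) -> bool:
    """Alert only if the group is past its own budget and still falling behind."""
    return time_lag_seconds(samples) > max_time_lag_s and lag_rate(samples) > 0
```

With a per-consumer budget like this, an intentionally batched consumer could be given a large `max_time_lag_s` while a latency-sensitive one gets a small value, so both use the same rule without sharing one static message-count threshold. Whether that is enough to separate 'normal' from 'problematic' lag in practice is exactly the open question.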