Kafka Monitoring Best Practices
Author: 焦振清
Date: 2018-10-11
Confluent monitoring dashboard:
OneAPM dashboard:
Functional availability (kafka-monitor):
- consume-service:records-delay-ms-avg
- consume-service:consume-error-rate
- consume-service:records-consumed-total
- consume-service:records-lost-total
- consume-service:records-duplicated-total
- produce-service:records-produced-total
- produce-service:records-produced-rate
- produce-service:produce-error-rate
- produce-service:produce-availability-avg
- ZooKeeper cluster status
Error metrics:
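- kafka.ActiveControllerCount
  Description: the number of active Controllers. A cluster must have exactly one Controller.
- kafka.OfflinePartitionsCount
  Description: the number of Partitions without a Leader. A Partition in this state can be neither read nor written.
- kafka.UnderReplicatedPartitions
  Description: the number of partitions whose replicas have failed to sync or have fallen out of sync. When UnderReplicatedPartitions stays above 0, a Broker in the cluster is in an abnormal state (unbalanced load or a resource bottleneck).
  Note: while kafka-reassign-partitions is running, this value is above 0, which is normal.
  Reference: "kafka解析之失效副本" (Kafka explained: out-of-sync replicas)
- kafka.FailedFetchRequestsPerSec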
- kafka.FailedProduceRequestsPerSec
- kafka.BytesRejectedPerSec
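
The rules stated for the first three error metrics can be sketched as a small check. This is a minimal sketch, not the author's tooling. It assumes the metric values have already been collected (for example over JMX) and are passed in as plain numbers. The function name `check_error_metrics`, the `reassignment_in_progress` flag, and the `sustained_samples` window of 3 are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def check_error_metrics(
    active_controllers_per_broker: Dict[str, int],
    offline_partitions: int,
    under_replicated_samples: Sequence[int],
    reassignment_in_progress: bool = False,
    sustained_samples: int = 3,  # assumption: how many samples count as "sustained"
) -> List[str]:
    """Return alert messages; an empty list means none of the rules fired."""
    if sustained_samples < 1:
        raise ValueError("sustained_samples must be at least 1")

    alerts: List[str] = []

    # Each broker reports 0 or 1, so the cluster-wide sum must be exactly 1.
    controller_total = sum(active_controllers_per_broker.values())
    if controller_total != 1:
        alerts.append(
            f"ActiveControllerCount sums to {controller_total}, expected 1"
        )

    # Any partition without a leader cannot be read or written.
    if offline_partitions > 0:
        alerts.append(f"OfflinePartitionsCount is {offline_partitions}, expected 0")

    # Alert only when the most recent samples are all above 0, and skip the
    # alert while a partition reassignment is known to be running.
    recent = list(under_replicated_samples)[-sustained_samples:]
    sustained = len(recent) == sustained_samples and all(v > 0 for v in recent)
    if sustained and not reassignment_in_progress:
        alerts.append(
            f"UnderReplicatedPartitions above 0 for {sustained_samples} samples"
        )

    return alerts


if __name__ == "__main__":
    print(check_error_metrics({"broker-1": 1, "broker-2": 0}, 0, [0, 4, 2, 1]))
```

Requiring several consecutive non-zero samples keeps a brief spike in UnderReplicatedPartitions from paging anyone, while the reassignment flag covers the case the note above describes as normal.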
Traffic metrics:
- kafka.BytesInPerSec
- kafka.BytesOutPerSec
- kafka.MessagesInPerSec
- kafka.TotalFetchRequestsPerSec
- kafka.TotalProduceRequestsPerSec
Capacity metrics:
- kafka.RequestHandlerAvgIdlePercent
- kafka.NetworkProcessorAvgIdlePercent
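
Both capacity metrics report the fraction of time the corresponding threads are idle, as a value between 0 and 1, so a lower value means less headroom. The sketch below classifies a reading. The function name `classify_idle_percent` is hypothetical, the 0.3 warning threshold follows the commonly cited guidance that these values should ideally stay above 0.3, and the 0.1 critical threshold is purely an assumption to tune per cluster.

```python
def classify_idle_percent(
    idle: float,
    warn_below: float = 0.3,  # assumption: commonly cited guidance
    crit_below: float = 0.1,  # assumption: illustrative only
) -> str:
    """Classify an idle-ratio reading as "ok", "warning", or "critical"."""
    if not 0.0 <= idle <= 1.0:
        raise ValueError(f"idle ratio must be within [0, 1], got {idle}")
    if idle < crit_below:
        return "critical"
    if idle < warn_below:
        return "warning"
    return "ok"


if __name__ == "__main__":
    for name, value in [
        ("RequestHandlerAvgIdlePercent", 0.62),
        ("NetworkProcessorAvgIdlePercent", 0.18),
    ]:
        print(name, classify_idle_percent(value))
```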
Latency metrics: none yet
Management tools: