Cloud Container Engine CCE - CCE Node Fault Detection: Prometheus Metrics Collection

Time: 2024-05-31 08:37:35

Prometheus Metrics Collection

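The NPD daemon Pod exposes Prometheus metrics on port 19901. By default, the NPD Pod carries the annotation metrics.alpha.kubernetes.io/custom-endpoints: '[{"api":"prometheus","path":"/metrics","port":"19901","names":""}]'. You can build your own Prometheus collector that discovers this endpoint and fetches the NPD metrics from http://{{NpdPodIP}}:{{NpdPodPort}}/metrics.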
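A self-built collector has to turn the annotation into a concrete scrape URL. The sketch below shows one way to do that with only the standard library: it parses the annotation value quoted above as JSON and builds http://<Pod IP>:<port><path> for each entry whose "api" is "prometheus". The function name `scrape_urls` and the Pod IP 10.0.0.12 are hypothetical and used only for illustration; how your collector obtains the Pod IP and annotation (for example, from the Kubernetes API) is outside this sketch.

```python
import json

# Annotation value as documented for the NPD Pod (copied verbatim).
ANNOTATION = '[{"api":"prometheus","path":"/metrics","port":"19901","names":""}]'


def scrape_urls(pod_ip: str, annotation: str = ANNOTATION) -> list[str]:
    """Hypothetical helper: build http://<pod_ip>:<port><path> for each Prometheus endpoint."""
    urls = []
    for endpoint in json.loads(annotation):
        # Only entries whose "api" is "prometheus" describe a Prometheus endpoint.
        if endpoint.get("api") != "prometheus":
            continue
        urls.append(f"http://{pod_ip}:{endpoint['port']}{endpoint['path']}")
    return urls


if __name__ == "__main__":
    # 10.0.0.12 is a hypothetical NPD Pod IP.
    print(scrape_urls("10.0.0.12"))  # ['http://10.0.0.12:19901/metrics']
```

Reading the port from the annotation rather than hard-coding it keeps the collector working if the annotation changes.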

If the NPD add-on version is earlier than 1.16.5, the Prometheus metrics are exposed on port 20257 instead.
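If a collector has to pick the port from the add-on version instead of reading the annotation, the version rule above can be expressed as a simple comparison. This is a minimal sketch that assumes version strings are plain dotted integers such as "1.16.5" (an optional leading "v" is stripped); pre-release suffixes such as "-rc1" are not handled. The function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.16.5' (or 'v1.16.5') into (1, 16, 5) so versions compare in order."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def npd_metrics_port(npd_version: str) -> int:
    """Hypothetical helper: port on which the NPD Pod exposes /metrics."""
    if parse_version(npd_version) < (1, 16, 5):
        return 20257  # NPD add-on versions earlier than 1.16.5
    return 19901      # 1.16.5 and later


if __name__ == "__main__":
    for v in ("1.16.4", "1.16.5", "1.9.10"):
        print(v, npd_metrics_port(v))
```

Comparing integer tuples avoids the string-comparison pitfall where "1.9.10" would sort after "1.16.5".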

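The metrics currently include problem_counter, which counts how many times each type of problem has occurred, and problem_gauge, which indicates whether a given type of problem is currently affecting the node. For example: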

# HELP problem_counter Number of times a specific type of problem have occurred.
# TYPE problem_counter counter
problem_counter{reason="DockerHung"} 0
problem_counter{reason="DockerStart"} 0
problem_counter{reason="EmptyDirVolumeGroupStatusError"} 0
...
# HELP problem_gauge Whether a specific type of problem is affecting the node or not.
# TYPE problem_gauge gauge
problem_gauge{reason="CNIIsDown",type="CNIProblem"} 0
problem_gauge{reason="CNIIsUp",type="CNIProblem"} 0
problem_gauge{reason="CRIIsDown",type="CRIProblem"} 0
problem_gauge{reason="CRIIsUp",type="CRIProblem"} 0
...
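As an illustration of how the output above might be consumed, the sketch below scans the text exposition format and returns the label sets of problem_gauge samples whose value is non-zero, i.e. problems that currently affect the node. It is a deliberately simplified parser that assumes sample lines of the form name{labels} value, as in the example above; it does not handle escaped quotes or "}" inside label values, and a NaN value would be reported as non-zero. For production use, a full exposition-format parser (such as the one in the Prometheus Python client) is the safer choice. The function name and the sample value 1 for CNIIsDown are hypothetical.

```python
import re

# Matches simple sample lines such as: problem_gauge{reason="CNIIsDown",type="CNIProblem"} 0
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def active_problems(metrics_text: str) -> list[dict[str, str]]:
    """Hypothetical helper: label sets of problem_gauge samples with a non-zero value."""
    problems = []
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != "problem_gauge":
            continue  # ignore problem_counter and anything else
        if float(match.group("value")) != 0:
            problems.append(dict(LABEL_RE.findall(match.group("labels"))))
    return problems


if __name__ == "__main__":
    # Hypothetical scrape result in which the CNI is reported as down.
    sample = """\
# TYPE problem_gauge gauge
problem_gauge{reason="CNIIsDown",type="CNIProblem"} 1
problem_gauge{reason="CNIIsUp",type="CNIProblem"} 0
problem_counter{reason="DockerHung"} 3
"""
    print(active_problems(sample))  # [{'reason': 'CNIIsDown', 'type': 'CNIProblem'}]
```

In a Prometheus-based setup, the same check is usually done server-side with an alerting rule on problem_gauge rather than in client code.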
support.huaweicloud.com/usermanual-cce/cce_10_0132.html