云容器引擎 CCE-CCE节点故障检测:Prometheus指标采集
Prometheus指标采集
NPD 守护进程POD通过端口19901暴露Prometheus metrics指标,NPD Pod默认被注释metrics.alpha.kubernetes.io/custom-endpoints: '[{"api":"prometheus","path":"/metrics","port":"19901","names":""}]'。您可以自建Prometheus采集器识别并通过http://{{NpdPodIP}}:{{NpdPodPort}}/metrics路径获取NPD指标。
NPD插件为1.16.5版本以下时,Prometheus指标的暴露端口为20257。
目前指标信息包含异常状态计数problem_counter与异常状态problem_gauge,如下所示
# HELP problem_counter Number of times a specific type of problem have occurred. # TYPE problem_counter counter problem_counter{reason="DockerHung"} 0 problem_counter{reason="DockerStart"} 0 problem_counter{reason="EmptyDirVolumeGroupStatusError"} 0 ... # HELP problem_gauge Whether a specific type of problem is affecting the node or not. # TYPE problem_gauge gauge problem_gauge{reason="CNIIsDown",type="CNIProblem"} 0 problem_gauge{reason="CNIIsUp",type="CNIProblem"} 0 problem_gauge{reason="CRIIsDown",type="CRIProblem"} 0 problem_gauge{reason="CRIIsUp",type="CRIProblem"} 0 ..