CloudTable Service (CLOUDTABLE) - Monitoring Metrics Supported by Doris Clusters: Monitoring Metrics Supported by FE Nodes

Time: 2024-10-11 09:03:48


Monitoring Metrics Supported by FE Nodes

Table 1 lists the monitoring metrics supported by FE nodes.

Table 1 Monitoring metrics supported by FE nodes
Metric	Display Name	Description	Value Range	Monitoring Period (Raw Metric)	Namespace
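doris_fe_image_clean_failed	Number of failures to clean up historical metadata image files	Should never fail; if it does, manual intervention is required.	≥0	60s	SYS.CloudTable
doris_fe_image_clean_success	Number of successful cleanups of historical metadata image files	-	≥0	60s	SYS.CloudTable
doris_fe_image_push_success	Number of successful pushes of metadata image files to other FE nodes	-	≥0	60s	SYS.CloudTable
doris_fe_image_write_failed	Number of failures to generate metadata image files	Should never fail; if it does, manual intervention is required.	≥0	60s	SYS.CloudTable
doris_fe_image_write_success	Number of successful metadata image file generations	-	≥0	60s	SYS.CloudTable
doris_fe_max_journal_id	Maximum metadata journal ID of the current FE node	On the Master FE, this is the largest journal ID written so far; on a non-Master FE, it is the largest journal ID replayed so far. Compare it across FE nodes: a large gap between them indicates a metadata synchronization problem.	≥0	60s	SYS.CloudTable
doris_fe_max_tablet_compaction_score	Maximum compaction score among all BE nodes	Shows the current maximum compaction score in the cluster so you can judge whether it is too high. A score that is too high may cause query or write latency.	≥0	60s	SYS.CloudTable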
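doris_fe_qps	Queries per second on the current FE (query requests only)	QPS	≥0	60s	SYS.CloudTable
doris_fe_query_err	Cumulative number of failed queries	-	≥0	60s	SYS.CloudTable
doris_fe_query_err_rate	Failed queries per second	-	≥0	60s	SYS.CloudTable
doris_fe_query_latency_ms_99	99th percentile query latency	-	≥0 ms	60s	SYS.CloudTable
doris_fe_query_latency_ms_999	99.9th percentile query latency	-	≥0 ms	60s	SYS.CloudTable
doris_fe_query_olap_table	Number of requests that query internal tables (OlapTable)	-	≥0	60s	SYS.CloudTable
doris_fe_query_total	Total number of query requests	-	≥0	60s	SYS.CloudTable
doris_fe_report_queue_size	Length of the FE-side queue of periodic report tasks sent by BE nodes	Reflects how much report tasks are backing up on the Master FE node. A larger value means the FE cannot process them fast enough.	≥0	60s	SYS.CloudTable
doris_fe_request_total	Total number of operation requests received through the MySQL port (queries and all other statements)	-	≥0	60s	SYS.CloudTable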
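doris_fe_routine_load_error_rows	Total number of error rows across all Routine Load jobs in the cluster	-	≥0	60s	SYS.CloudTable
doris_fe_routine_load_receive_bytes	Total volume of data received by all Routine Load jobs in the cluster	-	≥0 Byte	60s	SYS.CloudTable
doris_fe_routine_load_rows	Total number of rows received by all Routine Load jobs in the cluster	-	≥0	60s	SYS.CloudTable
doris_fe_rps	Requests per second on the current FE (queries and all other statements)	Use together with QPS to gauge the volume of requests the cluster is handling.	≥0	60s	SYS.CloudTable
doris_fe_scheduled_tablet_num	Number of tablets the Master FE node is currently scheduling	Includes replicas being repaired and replicas being balanced, so it reflects how many tablets are currently being migrated in the cluster. A value that stays nonzero for a long time indicates that the cluster is unstable.	≥0	60s	SYS.CloudTable
doris_fe_tablet_status_count_added	Number of tablets scheduled by the Master FE node	-	≥0	60s	SYS.CloudTable
doris_fe_tablet_status_count_in_sched	Number of tablets scheduled repeatedly by the Master FE node	-	≥0	60s	SYS.CloudTable
doris_fe_tablet_status_count_not_ready	Number of tablets on the Master FE node that did not meet the conditions for triggering scheduling	-	≥0	60s	SYS.CloudTable
doris_fe_tablet_status_count_total	Number of tablets checked by the Master FE node	-	≥0	60s	SYS.CloudTable
doris_fe_tablet_status_count_unhealthy	Cumulative number of checked tablets that the Master FE node found unhealthy	-	≥0	60s	SYS.CloudTable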
doris_fe_txn_counter_begin	Number of submitted transactions	-	≥0	60s	SYS.CloudTable
doris_fe_txn_counter_failed	Number of failed transactions	-	≥0	60s	SYS.CloudTable
doris_fe_txn_counter_reject	Number of rejected transactions	New transactions are rejected when the number of running transactions exceeds the threshold.	≥0	60s	SYS.CloudTable
doris_fe_txn_counter_success	Number of successful transactions	-	≥0	60s	SYS.CloudTable
doris_fe_txn_exec_latency_ms_99	99th percentile transaction execution time	-	≥0 ms	60s	SYS.CloudTable
doris_fe_txn_exec_latency_ms_999	99.9th percentile transaction execution time	-	≥0 ms	60s	SYS.CloudTable
doris_fe_txn_publish_latency_ms_99	99th percentile transaction publish time	-	≥0 ms	60s	SYS.CloudTable
doris_fe_txn_publish_latency_ms_999	99.9th percentile transaction publish time	-	≥0 ms	60s	SYS.CloudTable
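jvm_heap_size_bytes_max	Maximum heap memory	Used to monitor JVM memory usage.	≥0 Byte	60s	SYS.CloudTable
jvm_heap_size_bytes_committed	Committed heap memory	Used to monitor JVM memory usage.	≥0 Byte	60s	SYS.CloudTable
jvm_heap_size_bytes_used	Used heap memory	Used to monitor JVM memory usage.	≥0 Byte	60s	SYS.CloudTable
jvm_non_heap_size_bytes_committed	Committed non-heap memory	-	≥0 Byte	60s	SYS.CloudTable
jvm_non_heap_size_bytes_used	Used non-heap memory	-	≥0 Byte	60s	SYS.CloudTable
jvm_old_gc_count	Number of old-generation GCs	Used to detect long-running Full GCs.	≥0	60s	SYS.CloudTable
jvm_old_gc_time	Old-generation GC time	Used to detect long-running Full GCs.	≥0 ms	60s	SYS.CloudTable
jvm_old_size_bytes_used	Old-generation memory usage	-	≥0 Byte	60s	SYS.CloudTable
jvm_old_size_bytes_peak_used	Peak old-generation memory usage	-	≥0 Byte	60s	SYS.CloudTable
jvm_old_size_bytes_max	Maximum old-generation memory	-	≥0 Byte	60s	SYS.CloudTable
jvm_thread_peak_count	Peak number of threads	Used to check whether the number of JVM threads is reasonable.	≥0	60s	SYS.CloudTable
jvm_thread_new_count	Number of threads in the NEW state	Used to check whether the number of JVM threads is reasonable.	≥0	60s	SYS.CloudTable
jvm_thread_runnable_count	Number of threads in the RUNNABLE state	Used to check whether the number of JVM threads is reasonable.	≥0	60s	SYS.CloudTable
jvm_thread_blocked_count	Number of threads in the BLOCKED state	Used to check whether the number of JVM threads is reasonable.	≥0	60s	SYS.CloudTable
jvm_thread_waiting_count	Number of threads in the WAITING state	Used to check whether the number of JVM threads is reasonable.	≥0	60s	SYS.CloudTable
jvm_thread_terminated_count	Number of threads in the TERMINATED state	Used to check whether the number of JVM threads is reasonable.	≥0	60s	SYS.CloudTable
jvm_young_gc_count	Number of young-generation GCs	Cumulative value.	≥0	60s	SYS.CloudTable
jvm_young_gc_time	Young-generation GC time	Cumulative value.	≥0 ms	60s	SYS.CloudTable
jvm_young_size_bytes_used	Young-generation memory usage	-	≥0 Byte	60s	SYS.CloudTable
jvm_young_size_bytes_peak_used	Peak young-generation memory usage	-	≥0 Byte	60s	SYS.CloudTable
jvm_young_size_bytes_max	Maximum young-generation memory	-	≥0 Byte	60s	SYS.CloudTable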
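doris_fe_cache_added_partition	Number of Partition Cache entries added	Cumulative value.	≥0	60s	SYS.CloudTable
doris_fe_cache_added_sql	Number of SQL Cache entries added	Cumulative value.	≥0	60s	SYS.CloudTable
doris_fe_cache_hit_partition	Number of Partition Cache hits	-	≥0	60s	SYS.CloudTable
doris_fe_cache_hit_sql	Number of SQL Cache hits	-	≥0	60s	SYS.CloudTable
doris_fe_connection_total	Number of MySQL port connections on the current FE	Used to monitor query connections. If the number of connections exceeds the limit, new connections cannot be established.	≥0	60s	SYS.CloudTable
doris_fe_counter_hit_sql_block_rule	Number of queries blocked by SQL BLOCK RULE	-	≥0	60s	SYS.CloudTable
doris_fe_edit_log_clean_failed	Number of failures to clean up historical metadata journals	Should never fail; if it does, manual intervention is required.	≥0	60s	SYS.CloudTable
doris_fe_edit_log_clean_success	Number of successful cleanups of historical metadata journals	-	≥0	60s	SYS.CloudTable
doris_fe_edit_log_read	Count of metadata journal reads	Use the slope to check whether the metadata read frequency is normal.	≥0	60s	SYS.CloudTable
doris_fe_edit_log_write	Count of metadata journal writes	Use the slope to check whether the metadata write frequency is normal.	≥0	60s	SYS.CloudTable
doris_fe_image_push_failed	Number of failures to push metadata image files to other FE nodes	-	≥0	60s	SYS.CloudTable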
doris_fe_thrift_rpc_total_{method_name}	doris_fe_thrift_rpc_total_{method_name}	Number of RPC requests received by each method of the FE Thrift interface	≥0	60s	Service.CloudTable
doris_fe_thrift_rpc_latency_ms_{method_name}	doris_fe_thrift_rpc_latency_ms_{method_name}	Time taken by RPC requests received by each method of the FE Thrift interface	≥0 ms	60s	Service.CloudTable
doris_fe_thread_pool_thrift_server_pool_active_thread_num	doris_fe_thread_pool_thrift_server_pool_active_thread_num	Number of tasks being executed by the thrift-server-pool thread pool	≥0	60s	Service.CloudTable
doris_fe_thread_pool_thrift_server_pool_active_thread_pct	doris_fe_thread_pool_thrift_server_pool_active_thread_pct	Tasks being executed by the thrift-server-pool thread pool, as a percentage of its maximum number of threads	[0%,100%]	60s	Service.CloudTable
doris_fe_thread_pool_thrift_server_pool_task_in_queue	doris_fe_thread_pool_thrift_server_pool_task_in_queue	Number of tasks queued in the thrift-server-pool thread pool	≥0	60s	Service.CloudTable
doris_fe_thread_pool_thrift_server_pool_task_rejected	doris_fe_thread_pool_thrift_server_pool_task_rejected	Number of tasks rejected by the thrift-server-pool thread pool	≥0	60s	Service.CloudTable
doris_fe_thread_pool_mysql_nio_pool_active_thread_num	doris_fe_thread_pool_mysql_nio_pool_active_thread_num	Number of tasks being executed by the mysql-nio-pool thread pool	≥0	60s	Service.CloudTable
doris_fe_thread_pool_mysql_nio_pool_active_thread_pct	doris_fe_thread_pool_mysql_nio_pool_active_thread_pct	Tasks being executed by the mysql-nio-pool thread pool, as a percentage of its maximum number of threads	[0%,100%]	60s	Service.CloudTable
doris_fe_thread_pool_mysql_nio_pool_task_in_queue	doris_fe_thread_pool_mysql_nio_pool_task_in_queue	Number of tasks queued in the mysql-nio-pool thread pool	≥0	60s	Service.CloudTable
doris_fe_thread_pool_mysql_nio_pool_task_rejected	doris_fe_thread_pool_mysql_nio_pool_task_rejected	Number of tasks rejected by the mysql-nio-pool thread pool	≥0	60s	Service.CloudTable
doris_fe_thread_pool_connect_scheduler_pool_active_thread_num	doris_fe_thread_pool_connect_scheduler_pool_active_thread_num	Number of tasks being executed by the connect-scheduler-pool thread pool	≥0	60s	Service.CloudTable
doris_fe_thread_pool_connect_scheduler_pool_active_thread_pct	doris_fe_thread_pool_connect_scheduler_pool_active_thread_pct	Tasks being executed by the connect-scheduler-pool thread pool, as a percentage of its maximum number of threads	[0%,100%]	60s	Service.CloudTable
doris_fe_thread_pool_connect_scheduler_pool_task_in_queue	doris_fe_thread_pool_connect_scheduler_pool_task_in_queue	Number of tasks queued in the connect-scheduler-pool thread pool	≥0	60s	Service.CloudTable
doris_fe_thread_pool_connect_scheduler_pool_task_rejected	doris_fe_thread_pool_connect_scheduler_pool_task_rejected	Number of tasks rejected by the connect-scheduler-pool thread pool	≥0	60s	Service.CloudTable
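The doris_fe_max_journal_id metric in Table 1 is meant to be compared across FE nodes. The following Python sketch assumes you already collect the latest value from every FE and know which node is the Master; it computes how far each non-Master FE trails the Master and flags nodes beyond a threshold. The FE names, the sample values, and the 1000-ID threshold are hypothetical: the table does not state what gap counts as too large.

```python
from typing import Dict, List


def journal_lag(max_journal_ids: Dict[str, int], master: str) -> Dict[str, int]:
    """How many journal IDs each non-Master FE trails behind the Master FE."""
    master_id = max_journal_ids[master]
    return {
        fe: master_id - journal_id
        for fe, journal_id in max_journal_ids.items()
        if fe != master
    }


def lagging_fes(max_journal_ids: Dict[str, int], master: str, threshold: int) -> List[str]:
    """Non-Master FEs whose lag exceeds `threshold` (a hypothetical alert level)."""
    lags = journal_lag(max_journal_ids, master)
    return sorted(fe for fe, lag in lags.items() if lag > threshold)


# Hypothetical doris_fe_max_journal_id values collected from three FE nodes.
ids = {"fe-1": 105_000, "fe-2": 104_990, "fe-3": 98_000}
print(journal_lag(ids, master="fe-1"))                  # {'fe-2': 10, 'fe-3': 7000}
print(lagging_fes(ids, master="fe-1", threshold=1000))  # ['fe-3']
```

A small, steady lag is expected because followers replay the journal asynchronously; it is a gap that keeps growing across samples that points to a synchronization problem.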
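Several metrics in Table 1 are cumulative counters (for example doris_fe_query_err, doris_fe_edit_log_read, doris_fe_edit_log_write, and jvm_young_gc_count), and the table suggests reading their slope rather than their raw value. This Python sketch turns two consecutive samples, taken one 60-second monitoring period apart, into a per-second rate. The function name counter_rate and the sample values are hypothetical, and treating a decrease as a counter reset (for example after an FE restart) is an assumption rather than documented CloudTable behavior.

```python
from typing import Optional


def counter_rate(prev: float, curr: float, interval_s: float = 60.0) -> Optional[float]:
    """Per-second rate of a cumulative counter between two samples.

    Returns None when the counter went backwards, which is assumed here to
    mean the counter was reset (e.g. the FE process restarted).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr - prev
    if delta < 0:
        return None  # assumed counter reset; this interval has no usable rate
    return delta / interval_s


# Hypothetical samples of doris_fe_edit_log_write taken 60 s apart.
print(counter_rate(12_000, 12_600))  # 10.0 writes per second
print(counter_rate(12_600, 40))      # None: counter assumed reset
```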
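Metrics such as doris_fe_query_latency_ms_99 and doris_fe_query_latency_ms_999 report the 99th and 99.9th percentiles of latency: at least 99% (or 99.9%) of requests completed at or below that value. The Python sketch below illustrates that definition with the nearest-rank method on a hypothetical list of latencies. It is not necessarily how Doris computes these values internally.

```python
import math
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], p: float) -> float:
    """p-th percentile (0 < p <= 100) of `samples` using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank of the percentile sample
    return ordered[rank - 1]


# Hypothetical query latencies in ms: 1.0, 2.0, ..., 1000.0
latencies = [float(ms) for ms in range(1, 1001)]
print(nearest_rank_percentile(latencies, 99))    # 990.0 (the idea behind ..._ms_99)
print(nearest_rank_percentile(latencies, 99.9))  # 999.0 (the idea behind ..._ms_999)
```

The 99.9th percentile sits further into the tail than the 99th, so it is more sensitive to a few very slow requests.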
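Table 1 recommends jvm_old_gc_count and jvm_old_gc_time for spotting long-running Full GCs. A minimal Python sketch, assuming jvm_old_gc_time is a cumulative total in milliseconds like jvm_young_gc_time (which the table marks as cumulative), divides the increase in GC time by the increase in GC count to get the average duration of one old-generation GC over an interval. The function name and the sample values are hypothetical.

```python
from typing import Optional


def avg_gc_pause_ms(count_prev: int, count_curr: int,
                    time_prev_ms: float, time_curr_ms: float) -> Optional[float]:
    """Average duration of one GC between two samples of a cumulative
    count/time pair (e.g. jvm_old_gc_count and jvm_old_gc_time).

    Returns 0.0 if no GC ran in the interval, and None if either counter
    went backwards (assumed to mean the FE process restarted).
    """
    gc_runs = count_curr - count_prev
    gc_time = time_curr_ms - time_prev_ms
    if gc_runs < 0 or gc_time < 0:
        return None
    if gc_runs == 0:
        return 0.0
    return gc_time / gc_runs


# Hypothetical samples: 2 old-generation GCs took 6,000 ms in total.
print(avg_gc_pause_ms(40, 42, 9_000, 15_000))  # 3000.0
```

The same function works for the jvm_young_gc_count and jvm_young_gc_time pair.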
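Each thread pool in Table 1 (thrift-server-pool, mysql-nio-pool, and connect-scheduler-pool) exposes four related metrics. The Python sketch below combines them into a simple saturation check: a high active_thread_pct means most threads are busy, a nonzero task_in_queue means tasks are waiting, and rejected tasks mean the pool could not accept new work. The PoolSample type, the 90% threshold, and the assumption that task_rejected is cumulative (so the check uses its increase since the previous sample) are hypothetical choices, not CloudTable defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PoolSample:
    """Hypothetical container for one 60 s sample of a pool's metrics."""
    name: str
    active_thread_pct: float  # 0 to 100
    task_in_queue: int
    task_rejected_delta: int  # assumed: increase of task_rejected since the previous sample


def pool_warnings(sample: PoolSample, pct_threshold: float = 90.0) -> List[str]:
    """Return human-readable warnings for a saturated or overloaded pool."""
    warnings = []
    if sample.active_thread_pct >= pct_threshold:
        warnings.append(f"{sample.name}: {sample.active_thread_pct:.0f}% of max threads busy")
    if sample.task_in_queue > 0:
        warnings.append(f"{sample.name}: {sample.task_in_queue} tasks queued")
    if sample.task_rejected_delta > 0:
        warnings.append(f"{sample.name}: {sample.task_rejected_delta} tasks rejected")
    return warnings


print(pool_warnings(PoolSample("mysql-nio-pool", 95.0, 12, 0)))
# ['mysql-nio-pool: 95% of max threads busy', 'mysql-nio-pool: 12 tasks queued']
```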