Cloud Server Content Highlights

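  • Rule: For a table with continuous data writes, run compaction at least once every 24 hours.
    For MOR tables, whether data is written in streaming or batch mode, make sure that at least one Compaction completes every day. If compaction is skipped for a long time, the log files of the Hudi table keep growing, which inevitably causes the following problems:
    - Reading the Hudi table becomes slow and resource-intensive, because reading a MOR table involves merging the logs, and merging large logs consumes a lot of resources and is slow.
    - A Compaction that runs only after a long gap needs a lot of resources to finish and is prone to OOM.
    - Clean is blocked: if no Compaction produces new versions of the Parquet files, old file versions cannot be removed by Clean, which increases storage pressure.
  • Rule: Set the CPU-to-memory ratio to 1:4 to 1:8.
    A Compaction job merges the data in existing Parquet files with the data in newly written logs, which consumes a lot of memory. Based on the table design specifications and fluctuations in actual traffic, configure Compaction jobs with a CPU-to-memory ratio between 1:4 and 1:8 to keep them running stably. If a Compaction job runs into OOM, increase the proportion of memory.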
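The two rules above translate into simple checks that a scheduler or monitoring script could run. The following is a minimal sketch with hypothetical helper names; it is not part of Hudi or any cloud service API, and it assumes the 1:4 to 1:8 ratio means 4 to 8 GB of memory per vCPU.

```python
from datetime import datetime, timedelta

# Hypothetical helpers illustrating the compaction rules; not a Hudi API.
MAX_COMPACTION_INTERVAL = timedelta(hours=24)  # at least one compaction per 24 h
MIN_MEM_PER_CPU_GB = 4  # lower bound of the 1:4 to 1:8 ratio (assumed GB per vCPU)
MAX_MEM_PER_CPU_GB = 8  # upper bound of the ratio


def compaction_overdue(last_completed: datetime, now: datetime) -> bool:
    """Return True if no compaction has completed within the last 24 hours."""
    return now - last_completed > MAX_COMPACTION_INTERVAL


def memory_range_gb(cpu_cores: int) -> tuple[int, int]:
    """Return the (min, max) memory in GB for a compaction job with cpu_cores."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return cpu_cores * MIN_MEM_PER_CPU_GB, cpu_cores * MAX_MEM_PER_CPU_GB


def ratio_ok(cpu_cores: int, memory_gb: float) -> bool:
    """Check whether memory_gb lies inside the recommended 1:4 to 1:8 range."""
    low, high = memory_range_gb(cpu_cores)
    return low <= memory_gb <= high


if __name__ == "__main__":
    last = datetime(2023, 5, 30, 1, 0)
    now = datetime(2023, 5, 31, 2, 0)
    print(compaction_overdue(last, now))  # 25 hours have passed
    print(memory_range_gb(4))
    print(ratio_ok(4, 12))  # below 1:4, so more exposed to OOM
```

If a job keeps hitting OOM even inside the range, the rule above suggests moving toward the 1:8 end rather than adding CPU.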
  • Request example

    POST https://{endpoint}/v1/{project_id}/instances/{instance_id}/catalogs/{catalog_name}/databases/{database_name}/tables

    {
      "table_name" : "tbl86b03ad314fa4ea7943fb769df163a79",
      "table_type" : "MANAGED_TABLE",
      "owner" : "onebox",
      "owner_type" : "USER",
      "create_time" : "2023-05-31T01:59:54.000+00:00",
      "last_access_time" : "2023-05-31T01:59:54.000+00:00",
      "last_analyzed_time" : "2023-05-31T01:59:54.000+00:00",
      "partition_keys" : [
        { "column_type" : "string", "column_name" : "column_prefix0", "comment" : "5456ac36c75947eab223476dafa58ae0" },
        { "column_type" : "string", "column_name" : "column_prefix1", "comment" : "d95d1241557e4a769bfcda42974fdf0e" }
      ],
      "retention" : 1000,
      "storage_descriptor" : {
        "columns" : [
          { "column_type" : "string", "column_name" : "column_prefix0", "comment" : "3eb10da96ec84ec6b8543b730c0660b2" },
          { "column_type" : "string", "column_name" : "column_prefix1", "comment" : "ee5aa1f3755f425a964f045f59b2c0a0" },
          { "column_type" : "string", "column_name" : "column_prefix2", "comment" : "d47884f69582432db1256e60a8e37ea6" },
          { "column_type" : "string", "column_name" : "column_prefix3", "comment" : "4e26e910310648f1a97b6e17cb2b25ce" },
          { "column_type" : "string", "column_name" : "column_prefix4", "comment" : "85084620943b4d36a9e5d6a856307e73" },
          { "column_type" : "string", "column_name" : "column_prefix5", "comment" : "f78264b5d8ab4409a3a05ebb0434424e" },
          { "column_type" : "string", "column_name" : "column_prefix6", "comment" : "45df8ef109374ac6ab9aaa371b29baa7" },
          { "column_type" : "string", "column_name" : "column_prefix7", "comment" : "c006d18cc411429bacc0f0e1eb75f99e" },
          { "column_type" : "string", "column_name" : "column_prefix8", "comment" : "58e1c81b2c0a4c089570106c8bb4461d" },
          { "column_type" : "string", "column_name" : "column_prefix9", "comment" : "32cc797a16f348ebbf5f2e1f85969603" }
        ],
        "location" : "obs://location/test/database/5e2941387d8741afa217677361d0676f",
        "compressed" : false,
        "input_format" : "850fd4c238934dfdaa3006f388d71f9d",
        "output_format" : "740f42961f1146d39abe19beadc2ad89",
        "number_of_buckets" : 0,
        "bucket_columns" : [ ],
        "sort_columns" : [ ],
        "serde_info" : {
          "name" : "a247cc09ce74477e8c95915bb45e9878",
          "serialization_library" : "9dad2d1dfe944e1d83747997188ed445",
          "parameters" : { "347f5f03fdaf46489a39e308be3d2160" : "317b872caa5d4de7bed96f644028ed83" }
        },
        "parameters" : {
          "ce43c3c8b9854b749dddabe76c487dd8" : "67393e5c0dc649479399603f6adf1432",
          "8805af26461e49feaf5fb67eef8da574" : "f956a8ac29514f57af9da84043b845f0"
        },
        "skewed_info" : {
          "skewed_column_names" : [ ],
          "skewed_column_value_location_maps" : { },
          "skewed_column_values" : [ ]
        },
        "stored_as_sub_directories" : false
      },
      "parameters" : { "key1" : "value1", "transient_lastDdlTime" : "120", "classification" : "other" },
      "comments" : "comment info"
    }
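A request like the one above can be assembled with the Python standard library alone. This is a minimal sketch, not an official SDK: the function name, endpoint, IDs, and token are placeholders, the body is a trimmed table definition (this excerpt does not state which fields are mandatory), and the X-Auth-Token header name is taken from this API's 403 error message.

```python
import json
import urllib.parse
import urllib.request


def build_create_table_request(endpoint, project_id, instance_id,
                               catalog_name, database_name, table, token):
    """Build (but do not send) POST .../databases/{database_name}/tables."""
    # Percent-encode each path parameter; database names may contain Chinese.
    quoted = [urllib.parse.quote(p, safe="") for p in
              (project_id, instance_id, catalog_name, database_name)]
    path = "/v1/{}/instances/{}/catalogs/{}/databases/{}/tables".format(*quoted)
    return urllib.request.Request(
        url=f"https://{endpoint}{path}",
        data=json.dumps(table).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # header named in the 403 response of this API
        },
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    table = {"table_name": "tbl_demo", "table_type": "MANAGED_TABLE"}
    req = build_create_table_request("lakeformation.example.com", "my-project-id",
                                     "my-instance-id", "cat_demo", "db_demo",
                                     table, "my-token")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it; for non-2xx status codes it
    # raises urllib.error.HTTPError, whose body holds the error JSON.
```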
  • Response examples

    Status code: 201 Created

    {
      "catalog_name" : "catcd6359b2a15f42dc90fca897c15db00a",
      "database_name" : "dbbf565489e3d04b2d8ca03d6e3d48486f",
      "table_name" : "tbl1298911dffa94c449a900ec5dfc50cf8",
      "create_time" : "2023-05-31T02:03:44.016+00:00",
      "last_access_time" : "2023-05-31T02:03:43.000+00:00",
      "update_time" : "2023-05-31T02:03:44.016+00:00",
      "last_analyzed_time" : "2023-05-31T02:03:43.000+00:00",
      "owner" : "onebox",
      "owner_type" : "USER",
      "parameters" : { "key1" : "value1", "transient_lastDdlTime" : "120", "classification" : "other" },
      "partition_keys" : [
        { "column_type" : "string", "column_name" : "column_prefix0", "comment" : "8be09e713e6b46a08b375b662f93e195" },
        { "column_type" : "string", "column_name" : "column_prefix1", "comment" : "6874d3047a6c494696438fc64d9a0194" }
      ],
      "retention" : 1000,
      "storage_descriptor" : {
        "columns" : [
          { "column_type" : "string", "column_name" : "column_prefix0", "comment" : "5a2968f6141c4d40969c935c39b5accc" },
          { "column_type" : "string", "column_name" : "column_prefix1", "comment" : "39ac0d5c26ce47d8948a5ea49a455742" },
          { "column_type" : "string", "column_name" : "column_prefix2", "comment" : "a8d51537564a4042ab3d88a0acfccf83" },
          { "column_type" : "string", "column_name" : "column_prefix3", "comment" : "d541dd8c95bf4d49853ae5563869821e" },
          { "column_type" : "string", "column_name" : "column_prefix4", "comment" : "9955f7831b3b4bdc8a18c2e20b592976" },
          { "column_type" : "string", "column_name" : "column_prefix5", "comment" : "32115b31e61245fd94162f566c6db966" },
          { "column_type" : "string", "column_name" : "column_prefix6", "comment" : "a6d61e1ad9e14a6da94e242607369284" },
          { "column_type" : "string", "column_name" : "column_prefix7", "comment" : "9dd785a2f6744fa2a8c5a64d78147961" },
          { "column_type" : "string", "column_name" : "column_prefix8", "comment" : "60128f0062a949559f0c95f2656812ae" },
          { "column_type" : "string", "column_name" : "column_prefix9", "comment" : "99d4d3e57af64bd1bb287b5a305b7225" }
        ],
        "location" : "obs://location/test/database/0b288717b41a411dacdd0c9fa2e1c275",
        "compressed" : false,
        "input_format" : "c28e802613a342568a667aeba4961979",
        "output_format" : "9bf425a38fd540c69b68366071b8bbd5",
        "number_of_buckets" : 0,
        "bucket_columns" : [ ],
        "sort_columns" : [ ],
        "serde_info" : {
          "name" : "e044a1458d8c4b7a871e54243d2adb93",
          "serialization_library" : "bb299f2201fe4da0886ecd380a1508a3",
          "parameters" : { "2980009b8dfd4b3bb78a96485a3728c2" : "f83395c8db5e4ac1803892b08c209799" }
        },
        "parameters" : {
          "64f073b0a9bd49cb8437681f3faa1538" : "f4154f400e6e465cbc8b8b2033a24faf",
          "f31bf1375d4d4097b68e9816388e50a8" : "881546d1b0184048b03ad20d1218104d"
        },
        "skewed_info" : {
          "skewed_column_names" : [ ],
          "skewed_column_value_location_maps" : { },
          "skewed_column_values" : [ ]
        },
        "stored_as_sub_directories" : false
      },
      "table_type" : "EXTERNAL_TABLE",
      "comments" : "comment info"
    }

    Status code: 400 Bad Request
    { "error_code" : "common.01000001", "error_msg" : "failed to read http request, please check your input, code: 400, reason: Type mismatch., cause: TypeMismatchException" }

    Status code: 401 Unauthorized
    { "error_code" : "APIG.1002", "error_msg" : "Incorrect token or token resolution failed" }

    Status code: 403 Forbidden
    { "error" : { "code" : "403", "message" : "X-Auth-Token is invalid in the request", "error_code" : null, "error_msg" : null, "title" : "Forbidden" }, "error_code" : "403", "error_msg" : "X-Auth-Token is invalid in the request", "title" : "Forbidden" }

    Status code: 404 Not Found
    { "error_code" : "common.01000001", "error_msg" : "response status exception, code: 404" }

    Status code: 408 Request Timeout
    { "error_code" : "common.00000408", "error_msg" : "timeout exception occurred" }

    Status code: 500 Internal Server Error
    { "error_code" : "common.00000500", "error_msg" : "internal error" }
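Every error body in these examples exposes a top-level "error_code" and "error_msg", so a client can handle them uniformly. The following is a minimal sketch under that assumption (the function name is hypothetical, and the body must be valid JSON); it ignores the nested "error" object that the 403 body also carries.

```python
import json


def parse_response(status: int, body: str):
    """Split a response into ("ok", payload) or ("error", (error_code, error_msg))."""
    data = json.loads(body)
    if 200 <= status < 300:
        # Success, e.g. 201 Created: the body describes the created table.
        return "ok", data
    # Assumes the top-level error_code/error_msg layout seen in the examples.
    return "error", (data.get("error_code"), data.get("error_msg"))


if __name__ == "__main__":
    status, payload = parse_response(
        404, '{"error_code": "common.01000001", '
             '"error_msg": "response status exception, code: 404"}')
    print(status, payload)
```

Note that the 400 and 404 examples share the error code common.01000001, so branching on the HTTP status first and using error_code only as a detail is the safer design.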
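  • URI

    POST /v1/{project_id}/instances/{instance_id}/catalogs/{catalog_name}/databases/{database_name}/tables

    Table 1 Path parameters
    Parameter | Mandatory | Type | Description
    project_id | Yes | String | Project ID. For how to obtain it, see "Obtaining a Project ID".
    instance_id | Yes | String | LakeFormation instance ID, generated automatically when the instance is created. Example: 2180518f-42b8-4947-b20b-adfc53981a25.
    catalog_name | Yes | String | Catalog name. It can contain only letters, digits, and underscores (_), and must be 1 to 256 characters long.
    database_name | Yes | String | Database name. It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-), and must be 1 to 128 characters long.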
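The naming constraints in Table 1 can be checked on the client before calling the API. This is a minimal sketch with hypothetical function names; it assumes "letters" means ASCII letters and that "Chinese characters" means the CJK Unified Ideographs block U+4E00 to U+9FFF, neither of which the table states explicitly.

```python
import re

# catalog_name: letters, digits and underscores, 1 to 256 characters.
CATALOG_NAME_RE = re.compile(r"[A-Za-z0-9_]{1,256}")
# database_name: Chinese characters, letters, digits, underscores and hyphens,
# 1 to 128 characters. The CJK range U+4E00-U+9FFF is an assumption.
DATABASE_NAME_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,128}")


def valid_catalog_name(name: str) -> bool:
    """Return True if name satisfies the catalog_name rule."""
    return CATALOG_NAME_RE.fullmatch(name) is not None


def valid_database_name(name: str) -> bool:
    """Return True if name satisfies the database_name rule."""
    return DATABASE_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for n in ("cat_01", "cat-01"):
        print(n, valid_catalog_name(n))
    for n in ("数据库_db-1", "db 1"):
        print(n, valid_database_name(n))
```

Using fullmatch rather than match ensures the whole name, not just a prefix, satisfies the rule; the length limits count characters, which is what the regex quantifiers count as well.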
  • [Recommendation] Increase the concurrency to improve Compaction performance.