MapReduce Service (MRS) - Creating a FlinkServer Job to Write Data to a Hudi Table: Synchronizing Flink On Hudi Metadata to Hive

Time: 2025-01-26 10:49:31

Synchronizing Flink On Hudi Metadata to Hive

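When this feature is enabled, a Flink job that writes data to a Hudi table automatically creates the corresponding Hudi table in Hive and adds its partitions as they are written, so that services such as SparkSQL and Hive can read the Hudi table data.

Two metadata synchronization modes are supported, as described below. The subsequent operation steps use the JDBC mode as the example.

This feature applies to MRS 3.2.0 and later versions.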

Synchronizing metadata to Hive in JDBC mode
```
-- Quoted descriptions in the WITH clause are placeholders; replace them with real values.
CREATE TABLE stream_mor (
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts INT,
  `p` VARCHAR(20)
) PARTITIONED BY (`p`) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://hacluster/tmp/hudi/stream_mor',
  'table.type' = 'MERGE_ON_READ',
  'hive_sync.enable' = 'true',
  'hive_sync.table' = 'name of the target Hive table',
  'hive_sync.db' = 'name of the target Hive database',
  'hive_sync.metastore.uris' = 'value of hive.metastore.uris in hive-site.xml on the Hive client',
  'hive_sync.jdbc_url' = 'value of CLIENT_HIVE_URI in the component_env file on the Hive client'
);
```
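- hive_sync.jdbc_url: the value of CLIENT_HIVE_URI in the component_env file on the Hive client. If the value contains backslashes ("\"), remove them.
- To use Hive-style partitioning, also set both of the following parameters: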
  - 'hoodie.datasource.write.hive_style_partitioning' = 'true'
  - 'hive_sync.partition_extractor_class' = 'org.apache.hudi.hive.MultiPartKeysValueExtractor'
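- Hudi is case-sensitive, whereas Hive is not. For Flink on Hudi jobs that synchronize data to Hive, avoid uppercase letters in Hudi table field names; otherwise, data may not be read or written correctly.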
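
To make the Hive-style partitioning note above concrete, the following is a minimal sketch of the JDBC-mode table definition with the two additional parameters appended. The values in angle brackets are placeholders rather than real settings, and the comments describe the commonly documented effect of each option; verify the behavior against the Hudi version shipped with your cluster.

```sql
-- Sketch only: the JDBC-mode table from this section plus the two
-- Hive-style partitioning options. Angle-bracket values are placeholders.
CREATE TABLE stream_mor (
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts INT,
  `p` VARCHAR(20)
) PARTITIONED BY (`p`) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://hacluster/tmp/hudi/stream_mor',
  'table.type' = 'MERGE_ON_READ',
  'hive_sync.enable' = 'true',
  'hive_sync.table' = '<Hive table name>',
  'hive_sync.db' = '<Hive database name>',
  'hive_sync.metastore.uris' = '<hive.metastore.uris value>',
  'hive_sync.jdbc_url' = '<CLIENT_HIVE_URI value with backslashes removed>',
  -- Write partition directories as p=<value> rather than <value>.
  'hoodie.datasource.write.hive_style_partitioning' = 'true',
  -- Extractor class named in the list above, used when syncing partitions to Hive.
  'hive_sync.partition_extractor_class' = 'org.apache.hudi.hive.MultiPartKeysValueExtractor'
);
```

All column names in this sketch are lowercase, in line with the case-sensitivity note above.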

Synchronizing metadata to Hive in HMS mode

```
-- Quoted descriptions in the WITH clause are placeholders; replace them with real values.
CREATE TABLE stream_mor (
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts INT,
  `p` VARCHAR(20)
) PARTITIONED BY (`p`) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://hacluster/tmp/hudi/stream_mor',
  'table.type' = 'MERGE_ON_READ',
  'hive_sync.enable' = 'true',
  'hive_sync.table' = 'name of the target Hive table',
  'hive_sync.db' = 'name of the target Hive database',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'value of hive.metastore.uris in hive-site.xml on the Hive client',
  'properties.hive.metastore.kerberos.principal' = 'value of hive.metastore.kerberos.principal in hive-site.xml on the Hive client'
);
```
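
Metadata is only synchronized once the job actually writes to the Hudi table, so a quick way to exercise either table definition is to feed it from a test source. The sketch below is an illustration, not part of the original procedure: the table name datagen_src is hypothetical, and it assumes Flink's built-in datagen connector is available in the FlinkServer job. Because Hudi's Flink writer commits data on checkpoints, it also assumes checkpointing is enabled for the job.

```sql
-- Hypothetical test source: Flink's built-in datagen connector emits random rows.
CREATE TABLE datagen_src (
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts INT,
  `p` VARCHAR(20)
) WITH (
  'connector' = 'datagen',
  -- Low rate so the test job stays lightweight.
  'rows-per-second' = '1',
  -- One-character partition values keep the number of partitions small.
  'fields.p.length' = '1'
);

-- Continuously write the generated rows into the Hudi table defined above.
INSERT INTO stream_mor SELECT * FROM datagen_src;
```

After the job has completed at least one checkpoint, the synchronized table should be visible from Hive or SparkSQL in the database given by hive_sync.db. For MERGE_ON_READ tables, Hudi's Hive sync typically registers the table under the configured name with `_ro` and `_rt` suffixes, so check for those names if the bare table name is not found.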