Hudi DML Syntax - Huawei Cloud

MapReduce Service MRS - INSERT INTO: Examples

Examples
-- insert into a non-partitioned table
insert into h0 select 1, 'a1', 20;
-- insert static partition
insert into h_p0 partition(dt = '2021-01-02') select 1, 'a1';
-- insert dynamic partition
insert into h_p0 select 1, 'a1', '2021-01-02' as dt;
-- insert dynamic partition
insert into h_p1 select 1 as id, 'a1', '2021-01-03' as dt, '19' as hh;
-- insert overwrite table
insert overwrite table h0 select 1, 'a1', 20;
-- insert overwrite table with static partition
insert overwrite h_p0 partition(dt = '2021-01-02') select 1, 'a1';
-- insert overwrite table with dynamic partition
insert overwrite table h_p1 select 2 as id, 'a2', '2021-01-03' as dt, '19' as hh;
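
The following Python sketch is not Hudi or Spark code. It shows how the select list maps onto a partitioned table in the static and dynamic partition examples. It assumes, as the examples imply, that h_p0 has data columns id and name plus partition column dt, and that h_p1 has partition columns dt and hh. build_row is a hypothetical helper. The model matches columns by position. Static partition values come from the PARTITION clause, and each dynamic partition column takes the next trailing value of the select list, so in this model aliases such as "as dt" are only labels.

```python
# Toy model (not Hudi/Spark code). ASSUMPTION: columns are matched by position,
# data columns first, then dynamic partition columns in declaration order.

def build_row(data_cols, part_cols, select_values, static_parts=None):
    """Return {column: value} for one row of INSERT INTO ... SELECT.

    static_parts holds values from a PARTITION(...) clause, e.g. {"dt": "2021-01-02"}.
    Partition columns not given there are dynamic and are read from the
    trailing values of the select list.
    """
    static_parts = static_parts or {}
    dynamic = [c for c in part_cols if c not in static_parts]
    expected = len(data_cols) + len(dynamic)
    if len(select_values) != expected:
        raise ValueError(f"expected {expected} select columns, got {len(select_values)}")
    row = dict(zip(data_cols, select_values[:len(data_cols)]))
    row.update(zip(dynamic, select_values[len(data_cols):]))  # dynamic partitions
    row.update(static_parts)                                   # static partitions
    return row


# insert into h_p0 partition(dt = '2021-01-02') select 1, 'a1';
print(build_row(["id", "name"], ["dt"], [1, "a1"], {"dt": "2021-01-02"}))
# {'id': 1, 'name': 'a1', 'dt': '2021-01-02'}

# insert into h_p1 select 1 as id, 'a1', '2021-01-03' as dt, '19' as hh;
print(build_row(["id", "name"], ["dt", "hh"], [1, "a1", "2021-01-03", "19"]))
# {'id': 1, 'name': 'a1', 'dt': '2021-01-03', 'hh': '19'}
```

A dynamic-partition insert that omits the partition value from the select list is rejected by the column-count check. In the model, that check is what "select 1, 'a1'" without a PARTITION clause runs into.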

MapReduce Service MRS Hudi DML Syntax

MapReduce Service MRS - INSERT INTO: Notes

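Notes
Write modes: for tables with a primary key, Hudi supports three write modes. Set the parameter hoodie.sql.insert.mode to choose the INSERT mode; the default is upsert.
strict: INSERT enforces primary-key uniqueness on COW tables and does not allow duplicate records. If a record with the same key already exists, a HoodieDuplicateKeyException is thrown for a COW table. For a MOR table, this mode behaves the same as upsert.
non-strict: INSERT on a primary-key table is handled as a plain insert, without the duplicate-key check that strict mode applies.
upsert: records whose primary key already exists in the table are updated.
When running spark-sql, you can set "hoodie.sql.bulk.insert.enable = true" and "hoodie.sql.insert.mode = non-strict" to make bulk insert the write method for INSERT statements. You can also control how INSERT writes data by setting hoodie.datasource.write.operation directly, to bulk_insert, insert, or upsert. If you use this method, you must run reset hoodie.datasource.write.operation; after the SQL statement to restore Hudi's write method. Otherwise the setting also affects later SQL statements.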
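
A toy in-memory model in Python can help compare the three insert modes. It is a sketch under simplifying assumptions, not Hudi's implementation. The table is a list of dicts, and apply_insert and HoodieDuplicateKeyError are hypothetical names. The model ignores preCombineField ordering and bulk insert.

```python
# Toy model of hoodie.sql.insert.mode for a primary-key table (not Hudi code).
# ASSUMPTIONS: rows are applied one at a time; preCombineField is ignored.

class HoodieDuplicateKeyError(Exception):
    """Stand-in for Hudi's HoodieDuplicateKeyException."""


def apply_insert(table, rows, key, mode="upsert", table_type="COW"):
    """Return the table (a list of dicts) after an INSERT of `rows`."""
    if mode not in ("strict", "non-strict", "upsert"):
        raise ValueError(f"unknown hoodie.sql.insert.mode: {mode}")
    if mode == "strict" and table_type == "MOR":
        mode = "upsert"  # strict on a MOR table behaves like upsert
    result = [dict(r) for r in table]
    for row in rows:
        hits = [i for i, r in enumerate(result) if r[key] == row[key]]
        if hits and mode == "strict":
            raise HoodieDuplicateKeyError(f"key {row[key]!r} already exists")
        if hits and mode == "upsert":
            for i in hits:
                result[i] = dict(row)  # overwrite the existing record
        else:
            result.append(dict(row))   # new key, or non-strict: plain insert
    return result


t = [{"id": 1, "name": "a1"}]
new = [{"id": 1, "name": "a2"}]
print(apply_insert(t, new, "id", "upsert"))      # [{'id': 1, 'name': 'a2'}]
print(apply_insert(t, new, "id", "non-strict"))  # [{'id': 1, 'name': 'a1'}, {'id': 1, 'name': 'a2'}]
```

With mode="strict" on the default COW table type, the same call raises HoodieDuplicateKeyError.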

MapReduce Service MRS - SET/RESET: Notes

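Notes
The following properties can be set dynamically with SET or cleared with RESET:
Table 2 Properties
hoodie.insert.shuffle.parallelism: Spark shuffle parallelism when data is written with insert.
hoodie.upsert.shuffle.parallelism: Spark shuffle parallelism when data is written with upsert.
hoodie.delete.shuffle.parallelism: Spark shuffle parallelism when data is deleted.
hoodie.sql.insert.mode: the INSERT mode. Valid values are strict, non-strict, and upsert.
hoodie.sql.bulk.insert.enable: whether bulk insert is enabled for writes.
spark.sql.hive.convertMetastoreParquet: whether Spark SQL reads Parquet tables by converting them to data source tables. If the Hudi table's provider is hive, set this parameter to false before reading the table with spark-sql or spark-beeline.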

MapReduce Service MRS - SET/RESET: Command Format

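Command format
Add or update a parameter value: SET parameter_name=parameter_value
Adds or updates the value of parameter_name.
Display a parameter value: SET parameter_name
Displays the value of the specified parameter_name.
Display session parameters: SET
Displays all supported session parameters.
Display session parameters with usage details: SET -v
Displays all supported session parameters and their usage details.
Reset parameter values: RESET
Clears all session parameters.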
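
SET and RESET behave like a per-session key-value store. The Python class below is a hypothetical model, not Spark's implementation. Its defaults argument stands in for the values a session starts with. It also includes the per-key RESET form used in the INSERT INTO notes. The example shows that a leftover hoodie.datasource.write.operation stays in effect for later statements until it is reset.

```python
# Toy model of session parameters (not Spark code). `defaults` is an
# ASSUMED stand-in for the values a new session starts with.

class Session:
    def __init__(self, defaults=None):
        self._defaults = dict(defaults or {})
        self._conf = dict(self._defaults)

    def set(self, name, value):
        """SET name=value: add or update a parameter."""
        self._conf[name] = value

    def show(self, name=None):
        """SET name: one value (None if unset); SET: all parameters."""
        if name is None:
            return dict(self._conf)
        return self._conf.get(name)

    def reset(self, name=None):
        """RESET: drop everything set in this session; RESET name: only that key."""
        if name is None:
            self._conf = dict(self._defaults)
        elif name in self._defaults:
            self._conf[name] = self._defaults[name]
        else:
            self._conf.pop(name, None)


s = Session({"hoodie.sql.insert.mode": "upsert"})
s.set("hoodie.datasource.write.operation", "bulk_insert")
# Every later INSERT in this session would still see bulk_insert here.
print(s.show("hoodie.datasource.write.operation"))  # bulk_insert
s.reset("hoodie.datasource.write.operation")
print(s.show("hoodie.datasource.write.operation"))  # None
```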

MapReduce Service MRS - DELETE: Examples

Examples
Example 1: delete from h0 where column1 = 'country';
Example 2: delete from h0 where column1 IN ('country1', 'country2');
Example 3: delete from h0 where column1 IN (select column11 from sourceTable2);
Example 4: delete from h0 where column1 IN (select column11 from sourceTable2 where column1 = 'xxx');
Example 5: delete from h0;
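
The DELETE examples follow one rule. A row is removed when the WHERE condition is true for it, and DELETE without WHERE removes every row. The Python sketch below models Examples 2, 3, and 5. It uses a hypothetical helper, delete_from, and made-up table contents. An IN subquery simply supplies the set of values to match.

```python
# Toy model of DELETE FROM (not Hudi code); table contents are made up.

def delete_from(table, predicate=None):
    """Rows left after DELETE FROM table [WHERE predicate]."""
    if predicate is None:
        return []  # no WHERE clause: every row is deleted (Example 5)
    return [row for row in table if not predicate(row)]


h0 = [{"column1": "country"}, {"column1": "country1"},
      {"column1": "country2"}, {"column1": "other"}]
source_table2 = [{"column11": "country2"}]

# Example 2: delete from h0 where column1 IN ('country1', 'country2');
print(delete_from(h0, lambda r: r["column1"] in {"country1", "country2"}))
# [{'column1': 'country'}, {'column1': 'other'}]

# Example 3: the IN list comes from a subquery over sourceTable2.
wanted = {r["column11"] for r in source_table2}
print(delete_from(h0, lambda r: r["column1"] in wanted))
# [{'column1': 'country'}, {'column1': 'country1'}, {'column1': 'other'}]
```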

MapReduce Service MRS - MERGE INTO: Parameter Description

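Parameter description
Table 1 MERGE INTO parameters
tableIdentifier: name of the Hudi table on which MERGE INTO is performed.
target_alias: alias of the target table.
sub_query: subquery.
source_alias: alias of the source table or source expression.
merge_condition: condition that joins the source table or expression with the target table.
condition: optional filter condition.
matched_action: Delete or Update action performed when merge_condition is met.
not_matched_action: Insert action performed when merge_condition is not met.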

MapReduce Service MRS - MERGE INTO: Examples

Examples
Each example below is independent and creates its own h0 and s0 tables.
Partial field update
create table h0(id int, comb int, name string, price int) using hudi options(primaryKey = 'id', preCombineField = 'comb');
create table s0(id int, comb int, name string, price int) using hudi options(primaryKey = 'id', preCombineField = 'comb');
insert into h0 values(1, 1, 1, 1);
insert into s0 values(1, 1, 1, 1);
insert into s0 values(2, 2, 2, 2);
-- Approach 1
merge into h0 using s0 on h0.id = s0.id
when matched then update set h0.id = s0.id, h0.comb = s0.comb, price = s0.price * 2;
-- Approach 2
merge into h0 using s0 on h0.id = s0.id
when matched then update set id = s0.id, name = h0.name, comb = s0.comb + h0.comb, price = s0.price + h0.price;
Update and insert with omitted field lists (update set * / insert *)
create table h0(id int, comb int, name string, price int, flag boolean) using hudi options(primaryKey = 'id', preCombineField = 'comb');
create table s0(id int, comb int, name string, price int, flag boolean) using hudi options(primaryKey = 'id', preCombineField = 'comb');
insert into h0 values(1, 1, 1, 1, false);
insert into s0 values(1, 2, 1, 1, true);
insert into s0 values(2, 2, 2, 2, false);
merge into h0 as target
using (select id, comb, name, price, flag from s0) source
on target.id = source.id
when matched then update set *
when not matched then insert *;
Multi-condition update and delete
create table h0(id int, comb int, name string, price int, flag boolean) using hudi options(primaryKey = 'id', preCombineField = 'comb');
create table s0(id int, comb int, name string, price int, flag boolean) using hudi options(primaryKey = 'id', preCombineField = 'comb');
insert into h0 values(1, 1, 1, 1, false);
insert into h0 values(2, 2, 1, 1, false);
insert into s0 values(1, 1, 1, 1, true);
insert into s0 values(2, 2, 2, 2, false);
insert into s0 values(3, 3, 3, 3, false);
merge into h0
using (select id, comb, name, price, flag from s0) source
on h0.id = source.id
when matched and source.flag = false then update set id = source.id, comb = h0.comb + source.comb, price = source.price * 2
when matched and source.flag = true then delete
when not matched then insert *;
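
A toy Python model of MERGE INTO can trace the multi-condition example above. This is a sketch, not Hudi's implementation, and it rests on several assumptions. merge_condition is treated as equality on the primary key. WHEN MATCHED clauses are tried in order, and the first true one applies. An update keeps target values for columns it does not assign, as the partial-update example suggests. The flag in the conditions refers to the source row. preCombineField is ignored. merge_into is a hypothetical name.

```python
# Toy model of MERGE INTO clause evaluation (not Hudi code). ASSUMPTIONS:
# key equality as merge_condition, first true WHEN MATCHED clause wins,
# unassigned columns keep target values, preCombineField ignored.

def merge_into(target, source, key, matched, not_matched=None):
    """Return target rows after MERGE INTO, sorted by key.

    matched: list of (condition, action); action(tgt, src) returns the new
    row, or None to delete. not_matched(src) returns the row to insert.
    """
    rows = {r[key]: dict(r) for r in target}
    for src in source:
        tgt = rows.get(src[key])
        if tgt is None:
            if not_matched is not None:
                rows[src[key]] = not_matched(src)
            continue
        for condition, action in matched:
            if condition(tgt, src):
                new_row = action(tgt, src)
                if new_row is None:
                    del rows[src[key]]       # WHEN MATCHED ... THEN DELETE
                else:
                    rows[src[key]] = new_row  # WHEN MATCHED ... THEN UPDATE
                break
    return sorted(rows.values(), key=lambda r: r[key])


# name is a string column, so the inserted integer 1 is stored as '1'.
h0 = [dict(id=1, comb=1, name="1", price=1, flag=False),
      dict(id=2, comb=2, name="1", price=1, flag=False)]
s0 = [dict(id=1, comb=1, name="1", price=1, flag=True),
      dict(id=2, comb=2, name="2", price=2, flag=False),
      dict(id=3, comb=3, name="3", price=3, flag=False)]

result = merge_into(
    h0, s0, "id",
    matched=[
        # when matched and source.flag = false then update set id, comb, price
        (lambda t, s: not s["flag"],
         lambda t, s: {**t, "id": s["id"], "comb": t["comb"] + s["comb"],
                       "price": s["price"] * 2}),
        # when matched and source.flag = true then delete
        (lambda t, s: s["flag"], lambda t, s: None),
    ],
    not_matched=lambda s: dict(s),  # when not matched then insert *
)
print(result)
# [{'id': 2, 'comb': 4, 'name': '1', 'price': 4, 'flag': False},
#  {'id': 3, 'comb': 3, 'name': '3', 'price': 3, 'flag': False}]
```

Under these assumptions, row 1 is deleted because its source flag is true. Row 2 is updated with comb 2 + 2 and price 2 * 2. Row 3 has no match in h0 and is inserted as-is.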

MapReduce Service MRS - CLEANARCHIVE: Command Format

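Command format
-- Clean by size: retain at most 5368709120 bytes of archive files
set hoodie.archive.file.cleaner.policy = KEEP_ARCHIVED_FILES_BY_SIZE;
set hoodie.archive.file.cleaner.size.retained = 5368709120;
run cleanarchive on tableIdentifier/tablelocation;
-- Clean by age: retain archive files from the last 30 days
set hoodie.archive.file.cleaner.policy = KEEP_ARCHIVED_FILES_BY_DAYS;
set hoodie.archive.file.cleaner.days.retained = 30;
run cleanarchive on tableIdentifier/tablelocation;
In run cleanarchive on tableIdentifier/tablelocation, specify either the table name or the table's storage path.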

MapReduce Service MRS - CLEANARCHIVE: Parameter Description

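Parameter description
Table 1 Parameters
tableIdentifier: name of the Hudi table.
tablelocation: storage path of the Hudi table.
hoodie.archive.file.cleaner.policy: policy for cleaning archive files. Only KEEP_ARCHIVED_FILES_BY_SIZE and KEEP_ARCHIVED_FILES_BY_DAYS are currently supported; the default is KEEP_ARCHIVED_FILES_BY_DAYS. KEEP_ARCHIVED_FILES_BY_SIZE limits the storage space occupied by archive files. KEEP_ARCHIVED_FILES_BY_DAYS removes archive files older than a given point in time.
hoodie.archive.file.cleaner.size.retained: with the KEEP_ARCHIVED_FILES_BY_SIZE policy, the number of bytes of archive files to retain. Default: 5368709120 bytes (5 GiB).
hoodie.archive.file.cleaner.days.retained: with the KEEP_ARCHIVED_FILES_BY_DAYS policy, the number of days of archive files to retain. Default: 30 days.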
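
The two cleaning policies can be sketched as a selection over archive files. This Python model is hypothetical. ArchiveFile and files_to_clean are illustrative names, and the file sizes and ages are made up. The model assumes that BY_SIZE deletes the oldest files first until the total fits within size.retained, and that BY_DAYS deletes files older than days.retained. The defaults follow the table above: 5368709120 bytes is 5 * 1024**3, that is, 5 GiB.

```python
# Toy model of archive cleaning (not Hudi code). ASSUMPTIONS: BY_SIZE removes
# oldest files first until the rest fit; BY_DAYS removes files older than N days.
from dataclasses import dataclass


@dataclass
class ArchiveFile:
    name: str
    size_bytes: int
    age_days: float


DEFAULT_SIZE_RETAINED = 5 * 1024 ** 3  # 5368709120 bytes, the documented default
DEFAULT_DAYS_RETAINED = 30


def files_to_clean(files, policy="KEEP_ARCHIVED_FILES_BY_DAYS",
                   size_retained=DEFAULT_SIZE_RETAINED,
                   days_retained=DEFAULT_DAYS_RETAINED):
    """Return the names of archive files the chosen policy would delete."""
    if policy == "KEEP_ARCHIVED_FILES_BY_DAYS":
        return [f.name for f in files if f.age_days > days_retained]
    if policy == "KEEP_ARCHIVED_FILES_BY_SIZE":
        doomed, total = [], sum(f.size_bytes for f in files)
        for f in sorted(files, key=lambda x: x.age_days, reverse=True):  # oldest first
            if total <= size_retained:
                break
            doomed.append(f.name)
            total -= f.size_bytes
        return doomed
    raise ValueError(f"unsupported policy: {policy}")


GiB = 1024 ** 3
files = [ArchiveFile("a1", 1 * GiB, 40), ArchiveFile("a2", 3 * GiB, 20),
         ArchiveFile("a3", 3 * GiB, 5)]
print(files_to_clean(files))                                 # ['a1']
print(files_to_clean(files, "KEEP_ARCHIVED_FILES_BY_SIZE"))  # ['a1', 'a2']
```

In this model, the same files give different results under the two policies. Only a1 is older than 30 days. Keeping the total within 5 GiB, however, requires dropping both a1 and a2.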
