Data Lake Insight (DLI) - Doris Dimension Table: Example

Time: 2024-11-16 13:21:44

Example

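This example reads data from a Kafka source table, joins it with a Doris dimension table, and writes the enriched rows to the print connector.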

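  1. Referring to Enhanced Datasource Connections, create enhanced datasource connections on DLI for the VPCs and subnets where Doris and Kafka reside, and bind them to the Flink elastic resource pool you will use. Then, as described in the "Modifying Host Information" section, add the MRS host information to the enhanced datasource connection.
  2. Configure the security groups of Doris and Kafka by adding inbound rules that allow traffic from the CIDR block of the Flink queue. Referring to Testing Address Connectivity, test the queue's connectivity against the Doris address and the Kafka address. If both addresses are reachable, the datasource connection is bound successfully; otherwise, the binding has failed.
  3. Referring to the MRS Doris usage guide, create a Doris table and insert 10 rows into it. Create the table in the database that 'table.identifier' points to in step 4 (demo in this example). The statements are as follows: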
    CREATE TABLE IF NOT EXISTS dorisdemo
    (
      `user_id` varchar(10) NOT NULL,
      `city` varchar(10),
      `age` int,
      `gender` int
    )
    DISTRIBUTED BY HASH(`user_id`) BUCKETS 10;
    
    INSERT INTO dorisdemo VALUES ('user1', 'city1', 20, 1);
    INSERT INTO dorisdemo VALUES ('user2', 'city2', 21, 0);
    INSERT INTO dorisdemo VALUES ('user3', 'city3', 22, 1);
    INSERT INTO dorisdemo VALUES ('user4', 'city4', 23, 0);
    INSERT INTO dorisdemo VALUES ('user5', 'city5', 24, 1);
    INSERT INTO dorisdemo VALUES ('user6', 'city6', 25, 0);
    INSERT INTO dorisdemo VALUES ('user7', 'city7', 26, 1);
    INSERT INTO dorisdemo VALUES ('user8', 'city8', 27, 0);
    INSERT INTO dorisdemo VALUES ('user9', 'city9', 28, 1);
    INSERT INTO dorisdemo VALUES ('user10', 'city10', 29, 0);
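  4. Referring to Creating a Flink OpenSource SQL Job, create a Flink OpenSource SQL job, enter the following job script, and submit it. The job reads data from Kafka, joins each record with the Doris dimension table to enrich it with the user's attributes, and writes the result to print.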
    CREATE TABLE ordersSource (
      user_id string,
      user_name string,
      proctime as Proctime()
    ) WITH (
      'connector' = 'kafka',
      'topic' = 'kafka-topic',
      'properties.bootstrap.servers' = 'kafkaIp:port,kafkaIp:port,kafkaIp:port',
      'properties.group.id' = 'GroupId',
      'scan.startup.mode' = 'latest-offset',
      'format' = 'json'
    );
    
    CREATE TABLE dorisDemo (
      `user_id` String NOT NULL,
      `city` String,
      `age` int,
      `gender` int
    ) with (
      'connector' = 'doris',
      'fenodes' = 'FEInstanceIP:port',
      'table.identifier' = 'demo.dorisdemo',
      'username' = 'dorisUsername',
      'password' = 'dorisPassword',
      'lookup.cache.ttl'='10 m',
      'lookup.cache.max-rows' = '100'
    );
    
    CREATE TABLE print (
      user_id string,
      user_name string,
      `city` String,
      `age` int,
      `gender` int
    ) WITH (
      'connector' = 'print'
    );
    
    insert into print 
    select 
      orders.user_id,
      orders.user_name,
      dim.city,
      dim.age,
      dim.gender
    from ordersSource orders
    left join dorisDemo for system_time as of orders.proctime as dim on orders.user_id = dim.user_id;
  5. Write the following two records to the Kafka source topic.
    {"user_id": "user1", "user_name": "name1"}
    {"user_id": "user2", "user_name": "name2"}
  6. View the data in the print result table.
    +I[user1, name1, city1, 20, 1]
    +I[user2, name2, city2, 21, 0]
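
If the print output does not match the rows above, one quick check is to query the dimension table directly in Doris. The sketch below is only an illustration: it assumes the table from step 3 lives in a database named demo (inferred from 'table.identifier' = 'demo.dorisdemo' in step 4) and that no rows other than those in step 3 have been inserted. Run it in a Doris client, not in the Flink job.

```sql
-- Run in a Doris client (not in Flink).
-- Assumption: the database is named `demo`, as implied by 'table.identifier'.
USE demo;

-- Should return 10 if only the rows from step 3 were inserted.
SELECT COUNT(*) AS row_count FROM dorisdemo;

-- The two keys written to Kafka in step 5; city, age, and gender
-- should match the corresponding fields of the print output in step 6.
SELECT user_id, city, age, gender
FROM dorisdemo
WHERE user_id IN ('user1', 'user2')
ORDER BY user_id;
```

Because the job uses a left join, a Kafka record whose user_id has no match in Doris is still emitted, with city, age, and gender set to null. Also note that with 'lookup.cache.ttl' set to 10 minutes, a change made in Doris may not be reflected in the job output until the cached entry for that key expires.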
support.huaweicloud.com/sqlref-flink-dli/dli_08_15036.html