Data Warehouse Service GaussDB(DWS) - As a Dimension Table: Example

Time: 2024-06-29 17:51:38

Example

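This example reads data from a Kafka source table, uses a GaussDB(DWS) table as a dimension table, and writes the wide-table records produced by joining the two into a print result table. The steps are as follows: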

  1. Connect to the GaussDB(DWS) database instance and create a table named area_info in GaussDB(DWS) to serve as the dimension table. The SQL statement is as follows:
    create table public.area_info(
      area_id VARCHAR,
      area_province_name VARCHAR,
      area_city_name VARCHAR,
      area_county_name VARCHAR,
      area_street_name VARCHAR,
      region_name VARCHAR,
      PRIMARY KEY(area_id)
    );
    
  2. Connect to the GaussDB(DWS) database instance and insert test data into the dimension table area_info. The statement is as follows:
    insert into area_info
      (area_id, area_province_name, area_city_name, area_county_name, area_street_name, region_name) 
      values
      ('330102', 'a1', 'b1', 'c1', 'd1', 'e1'),
      ('330106', 'a1', 'b1', 'c2', 'd2', 'e1'),
      ('330108', 'a1', 'b1', 'c3', 'd3', 'e1'),
      ('330110', 'a1', 'b1', 'c4', 'd4', 'e1');
    
  3. In Flink SQL, create the source table, the dimension table, and the result table, and then run the join query:
    CREATE TABLE orders (
      order_id string,
      order_channel string,
      order_time string,
      pay_amount double,
      real_pay double,
      pay_time string,
      user_id string,
      user_name string,
      area_id string,
      proctime as Proctime()
    ) WITH (
      'connector' = 'kafka',
      'topic' = 'order_test',
      'properties.bootstrap.servers' = 'KafkaAddress1:KafkaPort,KafkaAddress2:KafkaPort',
      'properties.group.id' = 'dws-order',
      'scan.startup.mode' = 'latest-offset',
      'format' = 'json'
    );
    --Create the address dimension table
    create table area_info (
        area_id string, 
        area_province_name string, 
        area_city_name string, 
        area_county_name string,
        area_street_name string, 
        region_name string 
    ) WITH (
      'connector' = 'dws',
      'url' = 'jdbc:gaussdb://DwsAddress:DwsPort/DwsDbName',
      'tableName' = 'area_info',
      'username' = 'DwsUserName',
      'password' = 'DwsPassword',
      'lookupCacheMaxRows' = '10000',
      'lookupCacheExpireAfterAccess' = '2h'
    );
    --Use the address dimension table to build a wide order table that includes detailed address information
    create table order_detail(
        order_id string,
        order_channel string,
        order_time string,
        pay_amount double,
        real_pay double,
        pay_time string,
        user_id string,
        user_name string,
        area_id string,
        area_province_name string,
        area_city_name string,
        area_county_name string,
        area_street_name string,
        region_name string
    ) with (
      'connector' = 'print'
    );
    insert into order_detail
        select orders.order_id, orders.order_channel, orders.order_time, orders.pay_amount,
               orders.real_pay, orders.pay_time, orders.user_id, orders.user_name,
               area.area_id, area.area_province_name, area.area_city_name, area.area_county_name,
               area.area_street_name, area.region_name
        from orders
        left join area_info for system_time as of orders.proctime as area
            on orders.area_id = area.area_id;
  4. Write the following records to Kafka:
    {"order_id":"202103241606060001", "order_channel":"appShop", "order_time":"2021-03-24 16:06:06", "pay_amount":"200.00", "real_pay":"180.00", "pay_time":"2021-03-24 16:10:06", "user_id":"0001", "user_name":"Alice", "area_id":"330106"}
    {"order_id":"202103251202020001", "order_channel":"miniAppShop", "order_time":"2021-03-25 12:02:02", "pay_amount":"60.00", "real_pay":"60.00", "pay_time":"2021-03-25 12:03:00", "user_id":"0002", "user_name":"Bob", "area_id":"330110"}
    {"order_id":"202103251505050001", "order_channel":"qqShop", "order_time":"2021-03-25 15:05:05", "pay_amount":"500.00", "real_pay":"400.00", "pay_time":"2021-03-25 15:10:00", "user_id":"0003", "user_name":"Cindy", "area_id":"330108"}
    
  5. The reference result is as follows:
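As a cross-check on which rows the print result table should receive, the following Python sketch replays the `left join area_info for system_time as of orders.proctime` query from step 3 locally, over the test rows from step 2 and the Kafka messages from step 4. It is a simulation under stated assumptions, not the Flink runtime or the DWS connector: `LookupCache` and `lookup_join` are hypothetical names, and the cache only mimics what `lookupCacheMaxRows` ('10000') and `lookupCacheExpireAfterAccess` ('2h') suggest, namely an LRU cache of at most 10000 rows whose entries expire two hours after their last access. The connector's real caching behavior may differ.

```python
"""Local replay of the lookup join in this example (a sketch, not Flink).

Assumptions, not taken from the DWS connector: each order triggers a lookup of
area_info by area_id, and lookupCacheMaxRows / lookupCacheExpireAfterAccess act
like an LRU cache whose entries expire a fixed time after their last access.
"""
import json
from collections import OrderedDict

# Rows inserted into the DWS dimension table area_info in step 2:
# area_id -> (province, city, county, street, region)
AREA_INFO = {
    "330102": ("a1", "b1", "c1", "d1", "e1"),
    "330106": ("a1", "b1", "c2", "d2", "e1"),
    "330108": ("a1", "b1", "c3", "d3", "e1"),
    "330110": ("a1", "b1", "c4", "d4", "e1"),
}

# Messages written to the Kafka topic in step 4.
MESSAGES = [
    '{"order_id":"202103241606060001", "order_channel":"appShop", "order_time":"2021-03-24 16:06:06", '
    '"pay_amount":"200.00", "real_pay":"180.00", "pay_time":"2021-03-24 16:10:06", '
    '"user_id":"0001", "user_name":"Alice", "area_id":"330106"}',
    '{"order_id":"202103251202020001", "order_channel":"miniAppShop", "order_time":"2021-03-25 12:02:02", '
    '"pay_amount":"60.00", "real_pay":"60.00", "pay_time":"2021-03-25 12:03:00", '
    '"user_id":"0002", "user_name":"Bob", "area_id":"330110"}',
    '{"order_id":"202103251505050001", "order_channel":"qqShop", "order_time":"2021-03-25 15:05:05", '
    '"pay_amount":"500.00", "real_pay":"400.00", "pay_time":"2021-03-25 15:10:00", '
    '"user_id":"0003", "user_name":"Cindy", "area_id":"330108"}',
]


class LookupCache:
    """Hypothetical LRU cache with a row limit and an expire-after-access TTL."""

    def __init__(self, max_rows, expire_after_access_s):
        self.max_rows = max_rows
        self.ttl = expire_after_access_s
        self._rows = OrderedDict()  # key -> (value, last_access_time)
        self.hits = 0
        self.misses = 0

    def get(self, key, now, loader):
        entry = self._rows.get(key)
        if entry is not None and now - entry[1] <= self.ttl:
            self.hits += 1
            value = entry[0]
        else:
            # Missing or expired: query the dimension table (loader) again.
            self.misses += 1
            value = loader(key)
        # Record the access time and mark the key as most recently used.
        self._rows[key] = (value, now)
        self._rows.move_to_end(key)
        while len(self._rows) > self.max_rows:
            self._rows.popitem(last=False)  # evict the least recently used row
        return value


def lookup_join(messages, cache, now=0.0):
    """LEFT JOIN each order with area_info, projecting the 14 order_detail columns."""
    rows = []
    for raw in messages:
        order = json.loads(raw)
        # AREA_INFO.get stands in for a query against the DWS table.
        dim = cache.get(order["area_id"], now, AREA_INFO.get)
        # LEFT JOIN: keep unmatched orders, with NULL (None) dimension columns.
        area_cols = (order["area_id"],) + dim if dim is not None else (None,) * 6
        rows.append((
            order["order_id"], order["order_channel"], order["order_time"],
            # Amounts arrive as JSON strings; parse them to match the double columns.
            float(order["pay_amount"]), float(order["real_pay"]), order["pay_time"],
            order["user_id"], order["user_name"],
        ) + area_cols)
    return rows


if __name__ == "__main__":
    cache = LookupCache(max_rows=10000, expire_after_access_s=2 * 3600)
    for row in lookup_join(MESSAGES, cache):
        print(row)
    print("cache hits:", cache.hits, "misses:", cache.misses)
```

Running it yields three wide rows, one per order, each enriched with the matching area_info columns (330106 gives c2/d2, 330110 gives c4/d4, 330108 gives c3/d3); the print sink's own text format differs from Python tuples. On this first pass every lookup is a cache miss because each area_id appears once. Because the query uses a left join, an order whose area_id is absent from area_info would still be emitted, with the six dimension columns set to NULL.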

support.huaweicloud.com/tg-dws/dws_07_0185.html