MAPREDUCE服务 MRS-Flink SQL逻辑开发建议:多表left join场景下关联键发生改变使用雪花模型代替星型模型

时间:2024-09-06 10:03:29

多表left join场景下关联键发生改变使用雪花模型代替星型模型

多表left join关联键发生更新时会发生数据乱序,建议右表先关联成一个view,然后再与左表关联。

关联键group_id改变导致“-D”和“+I”乱序,下游根据user_id哈希时虽然进入同一并行度,但是“+I”消息先到,“-D”消息后到,最终写入宽表时记录就会被删除。

  • 优化前SQL:
    select... 
    from t1
    left join t2 on t2.user_id = t1.user_id 
    left join t10 on t10.user_id = t1.user_id 
    left join t11 on t11.group_id = t10.group_id
    left join t12 on t12.user_id = t1.user_id 
  • 优化后SQL:
    create view tmp_view as(
    select
    ..
    from t10
    left join t11 on t11.group_id = t10.group_id
    );
    select... 
    from t1
    left join t2 on t2.user_id = t1.user_id 
    left join tmp_view on tmp_view.user_id = t1.user_id 
    left join t12 on t12.user_id = t1.user_id 
support.huaweicloud.com/devg-rule-mrs/mrs_07_450170.html