云数据库GaussDB规划存储模型

GaussDB支持行列混合存储。行、列存储模型各有优劣,建议根据实际情况选择。

行存储是指将表按行存储到硬盘分区上,列存储是指将表按列存储到硬盘分区上。默认情况下,创建的表为行存储。行存储和列存储的差异请参见图1。

图1 行存储和列存储的差异

上图中,左上为行存表,右上为行存表在硬盘上的存储方式。左下为列存表,右下为列存表在硬盘上的存储方式。

行、列存储有如下优缺点:

一般情况下,如果表的字段比较多(大宽表),查询中涉及到的列不多的情况下,适合列存储。如果表的字段个数比较少,查询大部分字段,那么选择行存储比较好。

行存表

默认创建表的类型。数据按行进行存储,即一行数据是连续存储。适用于对数据需要经常更新的场景。

postgres=# CREATE TABLE customer_t1

(

state_ID CHAR(2),

state_NAME VARCHAR2(40),

area_ID NUMBER

);

--删除表

postgres=# DROP TABLE customer_t1;

列存表

数据按列进行存储,即一列所有数据是连续存储的。单列查询IO小,比行存表占用更少的存储空间。适合数据批量插入、更新较少和以查询为主统计分析类的场景。列存表不适合点查询。

postgres=# CREATE TABLE customer_t2

(

state_ID CHAR(2),

state_NAME VARCHAR2(40),

area_ID NUMBER

)

WITH (ORIENTATION = COLUMN);

--删除表

postgres=# DROP TABLE customer_t2;


行存表和列存表的选择

更新频繁程度

数据如果频繁更新,选择行存表。

插入频繁程度

频繁的少量插入,选择行存表。一次插入大批量数据,选择列存表。

表的列数

表的列数很多,选择列存表。

查询的列数

如果每次查询时,只涉及了表的少数(<50%总列数)几个列,选择列存表。

压缩率

列存表比行存表压缩率高。但高压缩率会消耗更多的CPU资源。

云数据库GaussDB(DWS)如何实现行转列及列转行?

云数据库GaussDB(DWS)如何实现行转列及列转行?

  • 案例展示

    场景介绍

    以学生成绩为例:

    老师会按照学科录入成绩,每科老师都会单独录入每个学生对应学科的成绩,而每位学生只关注自己各科的成绩。如果把老师录入数据作为原始表,那么学生查看自己的成绩就要用到行转列;如果让学生自己填写各科的成绩并汇总,然后老师去查自己学科所有学生的成绩,那就是列转行。

    行转列与列转行的示意图如下:

    图1 示意图

    1. 行转列

    将多行数据转换成一行显示,或将一列数据转换成多列显示。

    1. 列转行

    将一行数据转换成多行显示,或将多列数据转换成一列显示。


    示例表

    创建行存表students_info并插入数据。

    CREATE TABLE students_info(name varchar(20),subject varchar(100),score bigint) distribute by hash(name);
    INSERT INTO students_info VALUES('lily','math',95);
    INSERT INTO students_info VALUES('lily','physics',80);
    INSERT INTO students_info VALUES('lily','literature',92);
    INSERT INTO students_info VALUES('matu','math',75);
    INSERT INTO students_info VALUES('matu','physics',90);
    INSERT INTO students_info VALUES('matu','literature',85);
    INSERT INTO students_info VALUES('jack','math',90);
    INSERT INTO students_info VALUES('jack','physics',95);
    INSERT INTO students_info VALUES('jack','literature',95);


    查看表students_info信息。

    SELECT * FROM students_info;
    name | subject | score
    ------+------------+-------
    matu | math | 75
    matu | physics | 90
    matu | literature | 85
    lily | math | 95
    lily | physics | 80
    lily | literature | 92
    jack | math | 90
    jack | physics | 95
    jack | literature | 95


    创建列存表students_info1并插入数据。

    CREATE TABLE students_info1(name varchar(20), math bigint, physics bigint, literature bigint) with (orientation = column) distribute by hash(name);
    INSERT INTO students_info1 VALUES('lily',95,80,92);
    INSERT INTO students_info1 VALUES('matu',75,90,85);
    INSERT INTO students_info1 VALUES('jack',90,95,95);


    查看表students_info1信息。

    SELECT * FROM students_info1;
    name | math | physics | literature
    ------+------+---------+------------
    matu | 75 | 90 | 85
    lily | 95 | 80 | 92
    jack | 90 | 95 | 95
    (3 rows)


    静态行转列

    静态行转列需要手动指定每一列的列名,如果存在则取其对应值,否则将赋其默认值0。

    SELECT name,
    sum(case when subject='math' then score else 0 end) as math,
    sum(case when subject='physics' then score else 0 end) as physics,
    sum(case when subject='literature' then score else 0 end) as literature FROM students_info GROUP BY name;
    name | math | physics | literature
    ------+------+---------+------------
    matu | 75 | 90 | 85
    lily | 95 | 80 | 92
    jack | 90 | 95 | 95
    (3 rows)


    动态行转列

    8.1.2及以上集群版本可使用GROUP_CONCAT生成列存语句。

    SELECT group_concat(concat('sum(IF(subject = ''', subject, ''', score, 0)) AS "', name, '"'))FROM students_info; group_concat------------------------------------------------------------
    sum(IF(subject = 'literature', score, 0)) AS "jack",sum(IF(subject = 'literature', score, 0)) AS "lily",sum(IF(subject = 'literature', score, 0)) AS "matu",sum(IF(subject = 'math', score, 0)) AS "jack",sum(IF(subject = 'math', score, 0)) AS "lily",sum(IF(subject = 'math', score, 0)) AS "matu",sum(IF(subject = 'physics', score, 0)) AS "jack",sum(IF(subject = 'physics', score, 0)) AS "lily",sum(IF(subject = 'physics', score, 0)) AS "matu"
    (1 row)


    8.1.1及更低版本中可用LISTAGG生成列存语句。

    SELECT listagg(concat('sum(case when subject = ''', subject, ''' then score else 0 end) AS "', subject, '"'),',') within GROUP(ORDER BY 1)FROM (select distinct subject from students_info); listagg------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- sum(case when subject = 'literature' then score else 0 end) AS "literature",sum(case when subject = 'physics' then score else 0 end) AS "physics",sum(case when subject = 'math' then score else 0 end) AS "math"
    (1 row)


    再通过视图动态重建:

    CREATE OR REPLACE FUNCTION build_view()
    RETURNS VOID
    LANGUAGE plpgsql
    AS $$ DECLARE
    sql text;
    rec record;
    BEGIN
    sql := 'select LISTAGG(
    CONCAT( ''sum(case when subject = '''''', subject, '''''' then score else 0 end) AS "'', subject, ''"'' ) ,'','' ) within group(order by 1) from (select distinct subject from students_info);';
    EXECUTE sql INTO rec;
    sql := 'drop view if exists get_score'; EXECUTE sql; sql := 'create view get_score as select name, ' || rec.LISTAGG || ' from students_info group by name';
    EXECUTE sql;
    END$$;


    执行重建:

    CALL build_view();


    查询视图:

    SELECT * FROM get_score;

    name | literature | physics | math

    ------+------------+---------+------

    matu | 85 | 90 | 75

    lily | 92 | 80 | 95

    jack | 95 | 95 | 90

    (3 rows)


    列转行

    使用union all,将各科目(math、physics和literature)整合为一列,示例如下:

    SELECT * FROM

    (

    SELECT name, 'math' AS subject, math AS score FROM students_info1

    union all

    SELECT name, 'physics' AS subject, physics AS score FROM students_info1

    union all

    SELECT name, 'literature' AS subject, literature AS score FROM students_info1)

    order by name;

    name | subject | score

    ------+------------+-------

    jack | math | 90

    jack | physics | 95

    jack | literature | 95

    lily | math | 95

    lily | physics | 80

    lily | literature | 92

    matu | math | 75

    matu | physics | 90

    matu | literature | 85

    (9 rows)