华为云用户手册

  • DIS有哪些特点和优势? 无限扩展:DIS数据通道的吞吐量每小时可从数MB扩展到数TB,PUT记录每秒钟可从数千次扩展到数百万。 易于使用:您可以在几秒钟内创建DIS数据通道,轻松地将数据放入通道中,并构建用于数据处理的应用程序。 成本低廉:DIS没有前期成本,您只需要为实际使用的资源付费即可。 并行处理:DIS可让您用多个应用程序同时处理同一个数据通道。例如,您可以让一个应用程序运行实时分析,让其他应用程序从同一个DIS数据通道中将数据发送至 对象存储服务 (Object Storage Service,简称OBS)。 安全可靠:DIS可将数据保留N*24小时,N的取值为1~7的整数,以防数据在应用程序故障、个别机器故障或设施故障时丢失。 父主题: 一般性问题
  • 操作步骤 使用“WinSCP”工具将“huaweicloud-sdk-dis-x.x.x.zip”上传至Linux系统任一目录。 x.x.x表示DIS SDK包的版本号。 使用“PuTTY”工具登录Linux系统,进入到“huaweicloud-sdk-dis-x.x.x.zip”所在目录,执行如下命令,获取DIS SDK压缩包的校验码。 sha256sum huaweicloud-sdk-dis-x.x.x.zip 显示类似如下校验码: # sha256sum dis-sdk-x.x.x.zip8be2c937e8d78b1a9b99777cee4e7131f8bf231de3f839cf214e7c5b5ba3c088 huaweicloud-sdk-dis-x.x.x.zip 打开DIS SDK的校验文件“huaweicloud-sdk-dis-x.x.x.zip.sha256sum”与上一步骤中获取的校验码进行对比。 一致,说明从获取的DIS SDK压缩包没被篡改。 不一致,说明DIS SDK压缩包被篡改,需要重新获取。
  • 如何开通DIS通道? 使用注册账户登录DIS控制台。 单击管理控制台左上角的,选择区域和项目。 单击“购买接入通道”配置相关参数。 表1 接入通道参数说明 参数 参数解释 参数示例 计费模式 按需计费 按需计费 区域 指的是云服务所在的物理位置。您可以在下拉框中选择并切换区域。 华北-北京1 基本信息 通道名称 用户发送或者接收数据时,需要指定通道名称,通道名称不可重复。通道名称由英文字母、数字、中划线和下划线组成。长度为1~64个字符。 dis-Tido 通道类型 普通通道单分区容量:最高发送速度可达1MB/秒或1000条记录/秒(达到任意一种速度上限才会被限流),最高提取速度可达 2MB/秒,单次请求的记录总大小不能超过1MB(不包含partitionKey数据大小)。 高级通道单分区容量:最高发送速度可达 5MB/秒或2000条记录/秒(达到任意一种速度上限才会被限流),最高提取速度可达 10MB/秒,单次请求的记录总大小不能超过5MB(不包含partitionKey数据大小) - 分区数量 分区是DIS数据通道的基本吞吐量单位。 5 分区计算 用户可以根据实际需求通过系统计算得到一个建议的分区数量值。 单击“分区计算”,弹出“计算所需分区数量”对话框。 根据实际需求填写“平均记录大小”、“最大写入记录数”和“消费程序数量”,“预估所需分区数量”选项框中将显示所需的分区数量,此值不可修改。 说明: 所需分区计算公式: 按流量计算所需写分区数:(所得数值需向上取整后作为分区数) 普通通道:平均记录大小*(1+分区预留比例20%)*最大写入记录数/(1*1024KB) 高级通道:平均记录大小*(1+分区预留比例20%)*最大写入记录数/(5*1024KB) 按消费程序数量计算读分区数:(消费程序数量/2后的数值需要保留两位小数,然后乘以“按流量计算所需写分区数”,最终取值需向上取整) (消费程序数量/2)*按流量计算所需的写分区数 获取“按流量计算所需写分区数”、“按消费程序数量计算读分区数”中的最大值作为预估所需分区数量。 单击“使用计算值”将系统计算出的建议值应用于“分区数量”。 - 生命周期(小时) 存储在DIS中的数据保留的最长时间,超过此时长数据将被清除。 取值范围:24~72的整数。 24 源数据类型 BLOB:存储在数据库管理系统中的一组二进制数据。“源数据类型”选择“BLOB”,则支持的“转储服务类型”为“OBS”、“ MRS ”。 JSON:一种开放的文件格式,以易读的文字为基础,用来传输由属性值或者序列性的值组成的数据对象。“源数据类型”选择“JSON”,则支持的“转储服务类型”为“OBS”、“MRS”、“ DLI ”和“DWS”。 CS V:纯文本形式存储的表格数据,分隔符默认采用逗号。 “源数据类型”选择“CSV”,则支持的“转储服务类型”为“OBS”、“MRS”、“DLI”、“DWS”。 JSON 自动扩缩容 创建通道的同时是否开启自动扩缩容功能。 通过单击通过单击或来关闭或开启自动扩缩容开关。 说明: 用户可在创建通道时定义是否自动扩缩容,也可对已创建的通道修改自动扩缩容属性。 自动缩容最小分区数 设置自动缩容的分区下限,自动缩容的目标分区数不小于下限值。 - 自动扩容最大分区数 设置自动扩容的分区上限,自动扩容的目标分区数不超过上限值。 - 源数据分隔符 源数据为CSV格式时的数据分隔符。 - Schema开关 创建通道的同时是否为其创建数据Schema。源数据类型为JSON或CSV时可配置该参数。 通过单击或来关闭或开启Schema配置开关。 说明: 若创建通道时,没有同时创建数据Schema,可待通道创建成功后。到通道的管理页面创建数据Schema,详情请参见管理源数据Schema。 “源数据类型”为“JSON”和“CSV”时,可选择创建数据Schema。 源数据Schema 支持输入和导入源数据样例,源数据样例格式为JSON或者CSV,详细操作请参见管理源数据Schema。 在左侧文本框中输入JSON或者CSV格式的源数据样例,也可单击导入源数据样例。 在左侧文本框中单击,可删除左侧文本框中已输入或导入的源数据样例。 在左侧文本框中单击,可在右侧文本框中根据源数据样例生成Avro schema。 在右侧文本框中单击,可删除已生成的Avro schema。 在右侧文本框中单击,可修改已生成的Avro schema。 仅当“Schema配置开关”配置为“开启”:时需要配置此参数。 企业项目 配置通道所属的企业项目。已开通企业项目管理服务的用户才可以配置该参数。默认值为default。 企业项目是一种云资源管理方式,企业项目管理服务提供统一的云资源按项目管理,以及项目内的资源管理、成员管理。 您可以选择默认的企业项目“default”或其他已有的企业项目。如果要创建新的企业项目,请登录企业管理控制台进行创建,详细操作请参考《企业管理用户指南》。 - 现在配置 单击“现在配置”,呈现添加标签。 添加标签具体请参考管理通道标签。 - 暂不配置 暂不配置任何信息。 - 标签 标签是通道的标识。为通道添加标签,可以方便用户识别和管理拥有的通道资源。 - 单击“立即购买”,弹出“规格确认”页面。 单击“提交”,完成通道接入。 父主题: 一般性问题
  • DIS如何实现转储数据至DWS的特定列 DIS支持将源数据类型为JSON格式的数据转储至DWS。转储前,需要配置源数据Schema。 源数据Schema,即用户的JSON数据样例,用于描述JSON数据格式。DIS可以根据此JSON数据样例生成Avro schema, 将通道内上传的JSON数据转换为Parquet或CarbonData格式。 参考创建源数据Schema,创建源数据Schema。如下以添加转储任务时创建源数据Schema为例进行说明。 选择源数据类型是Json的通道。 在通道详情页面的“转储任务”页签,单击“添加转储任务”。 转储服务类型选择DWS,通过导入文件的方式配置源数据Schema。 输入源数据样例,单击“转换源数据样例”并提交,生成源数据Schema。 图1 创建源数据Schema 配置Schema属性过滤功能。 schema过滤功能,只针对源数据schema根节点或一级子节点非array类型,才有效。即管理源数据Schema创建的源数据schema,满足根节点或一级子节点非array类型,界面才呈现此配置。 打开Schema过滤开关。 在源数据属性名列表中,勾选对应的属性名,完成DWS表中指定列的映射。 源数据属性名列表中的属性由源数据Schema的name字段生成,匹配DWS的列名称。 图2 配置Schema属性 如图2所示,源数据属性名只选择id,即少于对应表的总字段。 DWS侧创建集群,并执行如下命令创建表。 CREATE TABLE dis_test3(id TEXT,dev TEXT,online BIGINT,module TEXT default 'a',logTime TEXT,appId TEXT,event TEXT); DIS侧转储数据至DWS成功后,登录集群数据库查询dis_test3表格数据,可看到仅id列和module列插入数据,其中module列是默认数据。如图3所示。 图3 Schema属性过滤结果 父主题: 转储相关问题
  • DIS对接EPS鉴权后,进入通道列表页面看不见之前创建的通道? 问题原因:因DIS对接EPS鉴权后,租户在原来企业项目中没有查看通道详情的权限,DIS通道列表就不会显示对应的通道: 解决方法:在企业项目中为租户授予查看通道详情的权限: 进入项目管理页面。 图1 项目管理 选择某个企业项目,为租户授予该企业项目对应的权限,如查看通道详情,则需要选择dis:streams:get策略,租户就能在通道列表中看到属于该企业项目中所有通道详情。 图2 添加授权 图3 策略内容 父主题: 一般性问题
  • Agent如何配置AK/SK加密? 在配置项中,需要配置用户的SK,这属于敏感信息,如需加密,可以按如下步骤: 进入bin/目录 cd /opt/dis-agent-X.X.X/bin 执行加密脚本,输入密码后回车 bash dis-encrypt.sh 控制台打印的“Encrypt result:”后面的字符串即为加密后的结果。通过这种方式分别加密MySQL密码和用户SK,并将密文配置到配置文件中即可。 父主题: DIS Agent相关问题
  • Agent如何配置递归监听一个目录? DIS Agent支持配置递归监听,将配置项"directoryRecursionEnabled"的值配置为"true"即可支持,例如以下配置可以匹配到"/home/one.log","/home/child/two.log","/home/child/child/three.log": ---region: REGIONak: YOUR_AKsk: YOUR_SKprojectId: YOUR_PROJECTIDendpoint: ENDPOINTflows: - DISStream: YOUR_STREAM filePattern: /home/*.log directoryRecursionEnabled: true initialPosition: START_OF_FILE maxBufferAgeMillis: 5000 父主题: DIS Agent相关问题
  • 数据存储在DIS和转储其他资源有什么区别? 开通DIS通道时需要选择“转储服务类型”。具体区别如表1所示。 选择“OBS”表示存储在DIS中,并周期性导入对象存储服务(Object Storage Service,简称OBS)。 选择“MRS”表示存储在DIS中,并周期性导入 MapReduce服务 (MRS)集群的HDFS中。 选择“DLI”表示存储在DIS中,并周期性导入DLI。 选择“DWS”表示存储在DIS中,并周期性导入 数据仓库 服务(DWS)中。 选择“CloudTable”表示存储在DIS中,并实时导入CloudTable集群的HBase表或OpenTSDB表中。 表1 DIS和转储其他资源区别 DIS存储 OBS存储 MRS存储 DLI存储 DWS存储 CloudTable存储 DIS服务自带。 需要另外申请。 需要另外申请。 需要另外申请。 需要另外申请。 需要另外申请。 无需另外付费。 需要根据OBS收费标准另外付费。 需要根据MRS和OBS收费标准另外付费。 需要根据DLI和OBS收费标准另外付费。 需要根据DWS和OBS收费标准另外付费。 需要根据CloudTable收费标准另外付费。 临时存储(最长保留168小时)。 数据可长期存储在OBS中,具体保存时长根据用户购买的OBS服务时长决定。 数据可长期存储在MRS中,具体保存时长根据用户购买的MRS服务时长决定。 数据可长期存储在DLI中,具体保存时长根据用户购买的DLI服务时长决定。 数据可长期存储在DWS中,具体保存时长根据用户购买的DWS服务时长决定。 数据可长期存储在CloudTable中,具体保存时长根据用户购买的CloudTable服务时长决定。 只存储在DIS中。 存储在DIS中,并周期性导入OBS。 存储在DIS中,并周期性导入MRS集群的HDFS中。 说明: 导入MRS集群前临时存储在OBS,待转储MRS完成后删除OBS上的临时存储文件。 存储在DIS中,并周期性导入DLI。 说明: 导入DLI前临时存储在OBS,待转储DLI完成后删除OBS上的临时存储文件。 存储在DIS中,并周期性导入DWS。 说明: 导入DWS前临时存储在OBS,待转储DWS完成后删除OBS上的临时存储文件。 存储在DIS中,实时导入CloudTable集群的HBase表 或OpenTSDB表中。 父主题: 一般性问题
  • Schema如何支持字段缺省或者为NULL 源数据Schema,即用户的JSON数据样例,用于描述JSON数据格式。DIS可以根据此JSON数据样例生成Avro schema,默认情况下不支持字段缺省或者为NULL,如图1。 图1 不支持字段缺省样例 "key1"字段对应的类型为"String"(Avro Schema中体现为:"type": "string"),这时如果源数据中"key1"不传或者传的值为NULL,那么转储任务会报错。 如果需要根据JSON数据样例生成的Schema可以支持缺省或者NULL,则需要勾选"支持空值"复选框,再单击"转换源数据样例",如图2所示。 图2 支持字段缺省样例 这时,"key1"字段对应的类型为"Union"复合类型(Avro Schema中体现为:"type": ["null", "string"]),如果源数据中"key1"不传或者传的值为NULL,那么会自动填补NULL为默认值,转储任务可以正常进行格式转换。 父主题: 转储相关问题
  • DIS主要应用于哪些场景? DIS对于从数据生产者快速移出数据,然后进行持续处理非常有用。以下是使用DIS的典型场景: 加速日志和数据传送获取:您无需等待批量处理数据,而是让数据生产者在生成数据后立即输入DIS数据通道,防止因数据生产者出现故障导致的数据损失。例如,系统和应用程序日志可以持续添加到数据通道并可在数秒内进行处理。 实时指标和报告:实时从DIS数据通道数据提取指标并生成报告。例如, 数据接入服务 应用程序可以处理系统和应用程序日志的指标和报告,因为数据被流入而不是等待收到批量数据。 实时数据分析:通过数据接入服务,可以运行实时通道数据分析。例如,可以通过API把数据实时添加到DIS数据通道中,并让您的DIS应用程序实时运行分析,从而在数分钟内从数据中获得重要见解,而无需数小时或数天时间。 复杂的数据通道处理:您可以创建DIS应用程序和数据通道的Directed Acyclic Graphs(DAG)。在这一情景中,一个或多个DIS应用程序可将数据添加到一个DIS数据通道进行进一步处理,以便于进行通道处理器的后续阶段。 父主题: 一般性问题
  • DIS如何发送和接收数据? 开通DIS通道,在 IAM (用户认证中心)中获取账号的AK/SK。 在这里中下载“dis-sdk-X.X.X.zip”压缩包并解压缩。 建立工程,配置用户AK/SK、endpoint、projectId、region、通道名称、分区数量等。 配置完成后运行程序即可发送数据。 建立工程,配置用户AK/SK、endpoint、project、region、通道名称、partitionId和startingSequenceNumber。 配置完成后运行程序即可接收数据。 父主题: 一般性问题
  • 什么是分区? 分区(Partition)是DIS数据通道的基本吞吐量单位。创建通道时,将指定所需的分区数量。 普通通道单分区容量:最高发送速度可达1MB/秒或1000条记录/秒(达到任意一种速度上限才会被限流),最高提取速度可达 2MB/秒,单次请求的记录总大小不能超过1MB(不包含partitionKey数据大小)。 高级通道单分区容量:最高发送速度可达 5MB/秒或2000条记录/秒(达到任意一种速度上限才会被限流),最高提取速度可达 10MB/秒,单次请求的记录总大小不能超过5MB(不包含partitionKey数据大小) 目前每个租户默认Partition配额范围为1~50个,租户可以根据需要配置Partition个数。 若需扩大配额,请提交工单增加配额,具体上限需要根据集群的实际负载情况进行计算。 父主题: 一般性问题
  • Agent如何配置监听多目录或文件? DIS Agent支持配置监听多个目录或文件,例如想收集"/home/folder1/file1"和"/home/folder2/file2"这两个文件的日志,可以通过配置多个DISStream来实现: ---region: REGIONak: YOUR_AKsk: YOUR_SKprojectId: YOUR_PROJECTIDendpoint: ENDPOINTflows: - DISStream: YOUR_STREAM filePattern: /home/folder1/file1 initialPosition: START_OF_FILE maxBufferAgeMillis: 5000 - DISStream: YOUR_STREAM filePattern: /home/folder2/file2 initialPosition: START_OF_FILE maxBufferAgeMillis: 5000 父主题: DIS Agent相关问题
  • Agent如何配置代理? DIS Agent支持通过配置代理上传数据到DIS,需要配置"PROXY_HOST","PROXY_PORT","PROXY_USERNAME","PROXY_PASSWORD",这几个配置项介绍可以查看Agent配置文件说明。 ---region: REGIONak: YOUR_AKsk: YOUR_SKprojectId: YOUR_PROJECTIDendpoint: ENDPOINTPROXY_HOST: YOUR_PROXY_HOSTPROXY_PORT: YOUR_PROXY_PORTPROXY_USERNAME: YOUR_PROXY_USERNAMEPROXY_PASSWORD: YOUR_PROXY_PASSWORDflows: - DISStream: YOUR_STREAM filePattern: /home/*.log initialPosition: START_OF_FILE maxBufferAgeMillis: 5000 父主题: DIS Agent相关问题
  • DIS有哪些模块及各模块功能? 服务控制面 完成服务的开通、删除、配置操作,并将用户信息同步到数据面。 完成数据面资源的申请与自动部署。 服务数据面 接收用户发送数据的请求,对已鉴权的数据接收并存储。 接收用户获取数据的请求,在鉴权后输出对应的用户数据。 按时老化存储在系统中的用户数据。 根据用户配置,将用户数据存储到对象存储服务(Object Storage Service,简称OBS)。 服务维护 负责服务的安装、升级。 负责服务的配置、巡检、日志收集与分析、运行监控。 负责服务工单处理。 用户SDK 提供Java接口,供用户上传与下载数据。 提供 数据加密 功能。 父主题: 一般性问题
  • 安装和启动GDS 登录 GaussDB (DWS)管理控制台。 在左侧导航栏中,单击“连接客户端”。 在 “gsql 命令行客户端”的下拉列表中,选择对应版本的GaussDB(DWS)客户端。 请根据集群版本和安装客户端的操作系统,选择对应版本。 客户端CPU架构要和集群一致,如果集群是X86规格,则也应该选择X86客户端。 单击“下载”。 将GDS工具包上传至ECS的/opt目录中,本例以上传Euler Kunpeng版本的工具包为例。 在工具包所在目录下,解压工具包。 12 cd /opt/unzip dws_client_8.1.x_euler_kunpeng_x64.zip 创建用户gds_user及其所属的用户组gdsgrp。此用户用于启动GDS,且需要拥有读取数据源文件目录的权限。 12 groupadd gdsgrpuseradd -g gdsgrp gds_user 修改工具包以及数据源文件目录属主为创建的用户gds_user及其所属的用户组gdsgrp。 123 chown -R gds_user:gdsgrp /opt/chown -R gds_user:gdsgrp /data1chown -R gds_user:gdsgrp /data2 切换到gds_user用户。 1 su - gds_user 执行环境依赖脚本(仅8.1.x版本适用)。 12 cd /opt/gds/binsource gds_env 启动GDS。 1234 /opt/gds/bin/gds -d /data1/script/tpch-kit/tpch1000X -p 192.168.0.90:5000 -H 192.168.0.0/24 -l /opt/gds/gds01_log.txt -D #TPC-H使用/opt/gds/bin/gds -d /data2/script/tpch-kit/tpch1000X -p 192.168.0.90:5001 -H 192.168.0.0/24 -l /opt/gds/gds02_log.txt -D #TPC-H使用/opt/gds/bin/gds -d /data1/script/tpcds-kit/tpcds1000X/ -p 192.168.0.90:5002 -H 192.168.0.0/24 -l /opt/gds/gds03_log.txt -D #TPC-DS使用/opt/gds/bin/gds -d /data2/script/tpcds-kit/tpcds1000X/ -p 192.168.0.90:5003 -H 192.168.0.0/24 -l /opt/gds/gds04_log.txt -D #TPC-DS使用 命令中的斜体部分请根据实际填写,如果数据分片存放至多个数据盘目录,需要启动对应目录数量的GDS。 如果TPC-H和TPC-DS数据同时测试,需要启动以上4个GDS,如果只测试TPC-DS或TPC-H数据,请根据后面的“#xxx”备注启动对应的GDS服务即可。 -d dir:保存有待导入数据的数据文件所在目录。 -p ip:port:GDS监听IP和监听端口。IP替换为ECS的内网IP,确保GaussDB(DWS)能通过此IP与GDS的通讯;端口对于TPC-H取5000、5001,对于TPC-DS取5002、5003。 -H address_string:允许哪些主机连接和使用GDS服务。参数需为CIDR格式。此地址配置成GaussDB(DWS)的集群内网网段(即GDS所在的ECS与GaussDB(DWS)在同一个VPC下,以内网通讯即可),例如192.168.0.0/24。 -l log_file:存放GDS的日志文件路径及文件名。 -D:后台运行GDS。仅支持Linux操作系统下使用。
  • SQL18 1 2 3 4 5 6 7 8 910111213141516171819202122232425262728293031 select i_item_id, ca_country, ca_state, ca_county, avg( cast(cs_quantity as decimal(12,2))) agg1, avg( cast(cs_list_price as decimal(12,2))) agg2, avg( cast(cs_coupon_amt as decimal(12,2))) agg3, avg( cast(cs_sales_price as decimal(12,2))) agg4, avg( cast(cs_net_profit as decimal(12,2))) agg5, avg( cast(c_birth_year as decimal(12,2))) agg6, avg( cast(cd1.cd_dep_count as decimal(12,2))) agg7 from catalog_sales, customer_demographics cd1, customer_demographics cd2, customer, customer_address, date_dim, item where cs_sold_date_sk = d_date_sk and cs_item_sk = i_item_sk and cs_bill_cdemo_sk = cd1.cd_demo_sk and cs_bill_customer_sk = c_customer_sk and cd1.cd_gender = 'M' and cd1.cd_education_status = 'Primary' and c_current_cdemo_sk = cd2.cd_demo_sk and c_current_addr_sk = ca_address_sk and c_birth_month in (10,1,8,7,3,5) and d_year = 1998 and ca_state in ('NE','OK','NC' ,'CO','ID','AR','MO') group by rollup (i_item_id, ca_country, ca_state, ca_county) order by ca_country, ca_state, ca_county,i_item_id limit 100;
  • SQL20 1 2 3 4 5 6 7 8 9101112131415161718192021222324252627 select i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price ,sum(cs_ext_sales_price) as itemrevenue ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over (partition by i_class) as revenueratio fromcatalog_sales ,item ,date_dim where cs_item_sk = i_item_sk and i_category in ('Sports', 'Shoes', 'Women') and cs_sold_date_sk = d_date_sk and d_date between cast('2001-03-21' as date) and (cast('2001-03-21' as date) + 30) group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratiolimit 100;
  • SQL17 1 2 3 4 5 6 7 8 9101112131415161718192021222324252627282930313233343536373839404142 select i_item_id ,i_item_desc ,s_state ,count(ss_quantity) as store_sales_quantitycount ,avg(ss_quantity) as store_sales_quantityave ,stddev_samp(ss_quantity) as store_sales_quantitystdev ,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov ,count(sr_return_quantity) as store_returns_quantitycount ,avg(sr_return_quantity) as store_returns_quantityave ,stddev_samp(sr_return_quantity) as store_returns_quantitystdev ,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as store_returns_quantitycov ,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) as catalog_sales_quantityave ,stddev_samp(cs_quantity) as catalog_sales_quantitystdev ,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov from store_sales ,store_returns ,catalog_sales ,date_dim d1 ,date_dim d2 ,date_dim d3 ,store ,item where d1.d_quarter_name = '2000Q1' and d1.d_date_sk = ss_sold_date_sk and i_item_sk = ss_item_sk and s_store_sk = ss_store_sk and ss_customer_sk = sr_customer_sk and ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number and sr_returned_date_sk = d2.d_date_sk and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3') and sr_customer_sk = cs_bill_customer_sk and sr_item_sk = cs_item_sk and cs_sold_date_sk = d3.d_date_sk and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3') group by i_item_id ,i_item_desc ,s_state order by i_item_id ,i_item_desc ,s_statelimit 100;
  • SQL12 1 2 3 4 5 6 7 8 910111213141516171819202122232425262728293031 select i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price ,sum(ws_ext_sales_price) as itemrevenue ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over (partition by i_class) as revenueratiofromweb_sales,item ,date_dimwhere ws_item_sk = i_item_sk and i_category in ('Music', 'Shoes', 'Children')and ws_sold_date_sk = d_date_skand d_date between cast('2000-05-14' as date) and (cast('2000-05-14' as date) + 30 )group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_priceorder by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratiolimit 100;
  • SQL7 1 2 3 4 5 6 7 8 9101112131415161718 select i_item_id, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 from store_sales, customer_demographics, date_dim, item, promotion where ss_sold_date_sk = d_date_sk and ss_item_sk = i_item_sk and ss_cdemo_sk = cd_demo_sk and ss_promo_sk = p_promo_sk and cd_gender = 'M' and cd_marital_status = 'U' and cd_education_status = 'College' and (p_channel_email = 'N' or p_channel_event = 'N') and d_year = 1999 group by i_item_id order by i_item_id limit 100;
  • SQL13 1 2 3 4 5 6 7 8 910111213141516171819202122232425262728293031323334353637383940414243444546474849 select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where s_store_sk = ss_store_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and((ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'U' and cd_education_status = '4 yr Degree' and ss_sales_price between 100.00 and 150.00 and hd_dep_count = 3 )or (ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'D' and cd_education_status = '2 yr Degree' and ss_sales_price between 50.00 and 100.00 and hd_dep_count = 1 ) or (ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'S' and cd_education_status = 'Advanced Degree' and ss_sales_price between 150.00 and 200.00 and hd_dep_count = 1 )) and((ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('IL', 'WI', 'TN') and ss_net_profit between 100 and 200 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('MO', 'OK', 'WA') and ss_net_profit between 150 and 300 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('NE', 'VA', 'GA') and ss_net_profit between 50 and 250 ));
  • SQL10 1 2 3 4 5 6 7 8 91011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556 select cd_gender, cd_marital_status, cd_education_status, count(*) cnt1, cd_purchase_estimate, count(*) cnt2, cd_credit_rating, count(*) cnt3, cd_dep_count, count(*) cnt4, cd_dep_employed_count, count(*) cnt5, cd_dep_college_count, count(*) cnt6 from customer c,customer_address ca,customer_demographics where c.c_current_addr_sk = ca.ca_address_sk and ca_county in ('Clark County','Richardson County','Tom Green County','Sullivan County','Cass County') and cd_demo_sk = c.c_current_cdemo_sk and exists (select * from store_sales,date_dim where c.c_customer_sk = ss_customer_sk and ss_sold_date_sk = d_date_sk and d_year = 2000 and d_moy between 1 and 1+3) and (exists (select * from web_sales,date_dim where c.c_customer_sk = ws_bill_customer_sk and ws_sold_date_sk = d_date_sk and d_year = 2000 and d_moy between 1 ANd 1+3) or exists (select * from catalog_sales,date_dim where c.c_customer_sk = cs_ship_customer_sk and cs_sold_date_sk = d_date_sk and d_year = 2000 and d_moy between 1 and 1+3)) group by cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_count order by cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_countlimit 100;
  • SQL5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99100101102103104105106107108109110111112113114115116117118119120121122123124125 with ssr as (select s_store_id, sum(sales_price) as sales, sum(profit) as profit, sum(return_amt) as returns, sum(net_loss) as profit_loss from ( select ss_store_sk as store_sk, ss_sold_date_sk as date_sk, ss_ext_sales_price as sales_price, ss_net_profit as profit, cast(0 as decimal(7,2)) as return_amt, cast(0 as decimal(7,2)) as net_loss from store_sales union all select sr_store_sk as store_sk, sr_returned_date_sk as date_sk, cast(0 as decimal(7,2)) as sales_price, cast(0 as decimal(7,2)) as profit, sr_return_amt as return_amt, sr_net_loss as net_loss from store_returns ) salesreturns, date_dim, store where date_sk = d_date_sk and d_date between cast('2002-08-05' as date) and (cast('2002-08-05' as date) + 14 ) and store_sk = s_store_sk group by s_store_id) , csr as (select cp_catalog_page_id, sum(sales_price) as sales, sum(profit) as profit, sum(return_amt) as returns, sum(net_loss) as profit_loss from ( select cs_catalog_page_sk as page_sk, cs_sold_date_sk as date_sk, cs_ext_sales_price as sales_price, cs_net_profit as profit, cast(0 as decimal(7,2)) as return_amt, cast(0 as decimal(7,2)) as net_loss from catalog_sales union all select cr_catalog_page_sk as page_sk, cr_returned_date_sk as date_sk, cast(0 as decimal(7,2)) as sales_price, cast(0 as decimal(7,2)) as profit, cr_return_amount as return_amt, cr_net_loss as net_loss from catalog_returns ) salesreturns, date_dim, catalog_page where date_sk = d_date_sk and d_date between cast('2002-08-05' as date) and (cast('2002-08-05' as date) + 14 ) and page_sk = cp_catalog_page_sk group by cp_catalog_page_id) , wsr as (select web_site_id, sum(sales_price) as sales, sum(profit) as profit, sum(return_amt) as returns, sum(net_loss) as profit_loss from ( select ws_web_site_sk as wsr_web_site_sk, ws_sold_date_sk as date_sk, ws_ext_sales_price as sales_price, ws_net_profit as profit, cast(0 as decimal(7,2)) as return_amt, cast(0 as decimal(7,2)) as net_loss from web_sales union all select ws_web_site_sk as wsr_web_site_sk, wr_returned_date_sk as date_sk, cast(0 as decimal(7,2)) as sales_price, cast(0 as decimal(7,2)) as profit, wr_return_amt as return_amt, wr_net_loss as net_loss from web_returns left outer join web_sales on ( wr_item_sk = ws_item_sk and wr_order_number = ws_order_number) ) salesreturns, date_dim, web_site where date_sk = d_date_sk and d_date between cast('2002-08-05' as date) and (cast('2002-08-05' as date) + 14 ) and wsr_web_site_sk = web_site_sk group by web_site_id) select channel , id , sum(sales) as sales , sum(returns) as returns , sum(profit) as profit from (select 'store channel' as channel , 'store' || s_store_id as id , sales , returns , (profit - profit_loss) as profit from ssr union all select 'catalog channel' as channel , 'catalog_page' || cp_catalog_page_id as id , sales , returns , (profit - profit_loss) as profit from csr union all select 'web channel' as channel , 'web_site' || web_site_id as id , sales , returns , (profit - profit_loss) as profit from wsr ) x group by rollup (channel, id) order by channel ,id limit 100;
  • SQL3 1 2 3 4 5 6 7 8 9101112131415161718 select dt.d_year ,item.i_brand_id brand_id ,item.i_brand brand ,sum(ss_ext_sales_price) sum_agg from date_dim dt ,store_sales ,item where dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manufact_id = 125 and dt.d_moy=11 group by dt.d_year ,item.i_brand ,item.i_brand_id order by dt.d_year ,sum_agg desc ,brand_id limit 100;
  • SQL2 1 2 3 4 5 6 7 8 9101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657 with wscs as (select sold_date_sk ,sales_price from (select ws_sold_date_sk sold_date_sk ,ws_ext_sales_price sales_price from web_sales union all select cs_sold_date_sk sold_date_sk ,cs_ext_sales_price sales_price from catalog_sales)), wswscs as (select d_week_seq, sum(case when (d_day_name='Sunday') then sales_price else null end) sun_sales, sum(case when (d_day_name='Monday') then sales_price else null end) mon_sales, sum(case when (d_day_name='Tuesday') then sales_price else null end) tue_sales, sum(case when (d_day_name='Wednesday') then sales_price else null end) wed_sales, sum(case when (d_day_name='Thursday') then sales_price else null end) thu_sales, sum(case when (d_day_name='Friday') then sales_price else null end) fri_sales, sum(case when (d_day_name='Saturday') then sales_price else null end) sat_sales from wscs ,date_dim where d_date_sk = sold_date_sk group by d_week_seq) select d_week_seq1 ,round(sun_sales1/sun_sales2,2) ,round(mon_sales1/mon_sales2,2) ,round(tue_sales1/tue_sales2,2) ,round(wed_sales1/wed_sales2,2) ,round(thu_sales1/thu_sales2,2) ,round(fri_sales1/fri_sales2,2) ,round(sat_sales1/sat_sales2,2) from (select wswscs.d_week_seq d_week_seq1 ,sun_sales sun_sales1 ,mon_sales mon_sales1 ,tue_sales tue_sales1 ,wed_sales wed_sales1 ,thu_sales thu_sales1 ,fri_sales fri_sales1 ,sat_sales sat_sales1 from wswscs,date_dim where date_dim.d_week_seq = wswscs.d_week_seq and d_year = 1999) y, (select wswscs.d_week_seq d_week_seq2 ,sun_sales sun_sales2 ,mon_sales mon_sales2 ,tue_sales tue_sales2 ,wed_sales wed_sales2 ,thu_sales thu_sales2 ,fri_sales fri_sales2 ,sat_sales sat_sales2 from wswscs ,date_dim where date_dim.d_week_seq = wswscs.d_week_seq and d_year = 1999+1) z where d_week_seq1=d_week_seq2-53 order by d_week_seq1;
  • 命令生成方法 TPC-DS标准99个SQL查询语句可用如下方法生成: 准备工作。生成TPC-DS查询语句前需要修改query_templates目录下的文件: 登录测试过程申请的ECS,进入/data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/query_templates目录: 1 cd /data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/query_templates 新建文件hwdws.tpl,内容为: 12345 define __LIMITA = "";define __LIMITB = "";define __LIMITC = "limit %d";define _BEGIN = "-- begin query " + [_QUERY] + " in stream " + [_STREAM] + " using template " + [_TEMPLATE];define _END = "-- end query " + [_QUERY] + " in stream " + [_STREAM] + " using template " + [_TEMPLATE]; 因TPC-DS工具中SQL语句生成模板有语法错误,需修改query77.tpl,将135行的‘, coalesce(returns, 0) returns’改为‘, coalesce(returns, 0) as returns’。 执行以下命令生成查询语句: 12 cd /data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/tools./dsqgen -input ../query_templates/templates.lst -directory ../query_templates/ -scale 1000 -dialect hwdws 执行后会生成query_0.sql文件,里面放着99个标准SQL语句,需要手动去切分成99个文件。 生成的标准查询中如下日期函数语法在GaussDB(DWS)暂不支持,需要手动进行修改: 1 2 3 4 5 6 7 8 910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667 Q5: and (cast('2001-08-19' as date) + 14 days)修改为and (cast('2001-08-19' as date) + 14)Q12:and (cast('1999-02-28' as date) + 30 days)修改为and (cast('1999-02-28' as date) + 30)Q16:(cast('1999-4-01' as date) + 60 days)修改为(cast('1999-4-01' as date) + 60)Q20:and (cast('1998-05-05' as date) + 30 days)修改为and (cast('1998-05-05' as date) + 30)Q21:and d_date between (cast ('2000-05-19' as date) - 30 days)修改为and d_date between (cast ('2000-05-19' as date) - 30)and (cast ('2000-05-19' as date) + 30 days)修改为and (cast ('2000-05-19' as date) + 30)Q32:(cast('1999-02-22' as date) + 90 days)修改为(cast('1999-02-22' as date) + 90)Q37:and d_date between cast('1998-04-29' as date) and (cast('1998-04-29' as date) + 60 days)修改为and d_date between cast('1998-04-29' as date) and (cast('1998-04-29' as date) + 60)Q40:and d_date between (cast ('2002-05-10' as date) - 30 days)修改为and d_date between (cast ('2002-05-10' as date) - 30)and (cast ('2002-05-10' as date) + 30 days)修改为and (cast ('2002-05-10' as date) + 30)Q77:and (cast('1999-08-29' as date) + 30 days)修改为and (cast('1999-08-29' as date) + 30)Q80:and (cast('2002-08-04' as date) + 30 days)修改为and (cast('2002-08-04' as date) + 30)Q82:and d_date between cast('1998-01-18' as date) and (cast('1998-01-18' as date) + 60 days)修改为and d_date between cast('1998-01-18' as date) and (cast('1998-01-18' as date) + 60)Q92:(cast('2001-01-26' as date) + 90 days)修改为(cast('2001-01-26' as date) + 90)Q94:(cast('1999-5-01' as date) + 60 days)修改为(cast('1999-5-01' as date) + 60)Q95:(cast('1999-4-01' as date) + 60 days)修改为(cast('1999-4-01' as date) + 60)Q98:and (cast('2002-04-01' as date) + 30 days)修改为and (cast('2002-04-01' as date) + 30)
  • 创建弹性 云服务器ECS 参考《弹性云服务器用户指南》创建弹性云服务器,创建的规格可参见下表。 由于TPC-DS、TPC-H数据集占用空间较大,以TPC-DS 1000X和TPC-H 1000X为例,分别占用930GB和1100GB。请创建弹性云服务器时,根据需求添加数据盘,举例如下: 单测TPC-DS或者TPC-H时:挂载2块超高IO 600GB数据盘。 同时测TPC-DS和TPC-H时:挂载2块超高IO 1200GB数据盘。 表1 ECS规格 参数项 参数取值 计费模式 按需计费 区域 华北-北京4 可用区 可用区1 CPU架构 鲲鹏计算 规格 鲲鹏通用计算增强型 | kc1.8xlarge.2 32vCPUs|64 GiB 镜像 EulerOS 2.8 64bit with ARM(40GB) 数据盘 系统盘通用型SSD 40GB,数据盘要求如下: 单测TPC-DS或者TPC-H时:挂载2块超高IO 600GB数据盘。 同时测TPC-DS和TPC-H时:挂载2块超高IO 1200GB数据盘。 父主题: 创建弹性云服务器ECS和数据仓库GaussDB(DWS)
  • 操作步骤 使用gsql连接DWS成功后,执行以下SQL创建目标表(共24张表)。 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671 CREATE TABLE customer_address( ca_address_sk bigint not null , ca_address_id char(16) not null, ca_street_number char(10) , ca_street_name varchar(60) , ca_street_type char(15) , ca_suite_number char(10) , ca_city varchar(60) , ca_county varchar(30) , ca_state char(2) , ca_zip char(10) , ca_country varchar(20) , ca_gmt_offset decimal(5,2) , ca_location_type char(20)) with (orientation = column)distribute by hash (ca_address_sk);CREATE TABLE customer_demographics( cd_demo_sk bigint not null , cd_gender char(1) , cd_marital_status char(1) , cd_education_status char(20) , cd_purchase_estimate bigint , cd_credit_rating char(10) , cd_dep_count bigint , cd_dep_employed_count bigint , cd_dep_college_count bigint) with (orientation = column)distribute by hash (cd_demo_sk);CREATE TABLE date_dim( d_date_sk bigint not null, d_date_id char(16) not null, d_date date , d_month_seq bigint , d_week_seq bigint , d_quarter_seq bigint , d_year bigint , d_dow bigint , d_moy bigint , d_dom bigint , d_qoy bigint , d_fy_year bigint , d_fy_quarter_seq bigint , d_fy_week_seq bigint , d_day_name char(9) , d_quarter_name char(6) , d_holiday char(1) , d_weekend char(1) , d_following_holiday char(1) , d_first_dom bigint , d_last_dom bigint , d_same_day_ly bigint , d_same_day_lq bigint , d_current_day char(1) , d_current_week char(1) , d_current_month char(1) , d_current_quarter char(1) , d_current_year char(1) ) with (orientation = column) DISTRIBUTE by hash(d_date_sk) PARTITION BY Range(d_year) ( partition p1 values less than(1950), partition p2 values less than(2000), partition p3 values less than(2050), partition p4 values less than(2100), partition p5 values less than(3000), partition p6 values less than(maxvalue));CREATE TABLE warehouse( w_warehouse_sk bigint not null, w_warehouse_id char(16) not null, w_warehouse_name varchar(20) , w_warehouse_sq_ft bigint , w_street_number char(10) , w_street_name varchar(60) , w_street_type char(15) , w_suite_number char(10) , w_city varchar(60) , w_county varchar(30) , w_state char(2) , w_zip char(10) , w_country varchar(20) , w_gmt_offset decimal(5,2)) with (orientation = column)distribute by replication;CREATE TABLE ship_mode( sm_ship_mode_sk bigint not null, sm_ship_mode_id char(16) not null, sm_type char(30) , sm_code char(10) , sm_carrier char(20) , sm_contract char(20) ) with (orientation = column)distribute by replication;CREATE TABLE time_dim( t_time_sk bigint not null, t_time_id char(16) not null, t_time bigint , t_hour bigint , t_minute bigint , t_second bigint , t_am_pm char(2) , t_shift char(20) , t_sub_shift char(20) , t_meal_time char(20) ) with (orientation = column)distribute by hash (t_time_sk);CREATE TABLE reason( r_reason_sk bigint not null, r_reason_id char(16) not null, r_reason_desc char(100)) with (orientation = column)distribute by replication;CREATE TABLE income_band( ib_income_band_sk bigint not null, ib_lower_bound bigint , ib_upper_bound bigint ) with (orientation = column)distribute by replication;CREATE TABLE item( i_item_sk bigint not null, i_item_id char(16) not null, i_rec_start_date date , i_rec_end_date date , i_item_desc varchar(200) , i_current_price decimal(7,2) , i_wholesale_cost decimal(7,2) , i_brand_id bigint , i_brand char(50) , i_class_id bigint , i_class char(50) , i_category_id bigint , i_category char(50) , i_manufact_id bigint , i_manufact char(50) , i_size char(20) , i_formulation char(20) , i_color char(20) , i_units char(10) , i_container char(10) , i_manager_id bigint , i_product_name char(50) ) with (orientation = column)distribute by hash (i_item_sk);CREATE TABLE store( s_store_sk bigint not null, s_store_id char(16) not null, s_rec_start_date date , s_rec_end_date date , s_closed_date_sk bigint , s_store_name varchar(50) , s_number_employees bigint , s_floor_space bigint , s_hours char(20) , s_manager varchar(40) , s_market_id bigint , s_geography_class varchar(100) , s_market_desc varchar(100) , s_market_manager varchar(40) , s_division_id bigint , s_division_name varchar(50) , s_company_id bigint , s_company_name varchar(50) , s_street_number varchar(10) , s_street_name varchar(60) , s_street_type char(15) , s_suite_number char(10) , s_city varchar(60) , s_county varchar(30) , s_state char(2) , s_zip char(10) , s_country varchar(20) , s_gmt_offset decimal(5,2) , s_tax_precentage decimal(5,2)) with (orientation = column)distribute by replication;CREATE TABLE call_center( cc_call_center_sk bigint not null, cc_call_center_id char(16) not null, cc_rec_start_date date , cc_rec_end_date date , cc_closed_date_sk bigint , cc_open_date_sk bigint , cc_name varchar(50) , cc_class varchar(50) , cc_employees bigint , cc_sq_ft bigint , cc_hours char(20) , cc_manager varchar(40) , cc_mkt_id bigint , cc_mkt_class char(50) , cc_mkt_desc varchar(100) , cc_market_manager varchar(40) , cc_division bigint , cc_division_name varchar(50) , cc_company bigint , cc_company_name char(50) , cc_street_number char(10) , cc_street_name varchar(60) , cc_street_type char(15) , cc_suite_number char(10) , cc_city varchar(60) , cc_county varchar(30) , cc_state char(2) , cc_zip char(10) , cc_country varchar(20) , cc_gmt_offset decimal(5,2) , cc_tax_percentage decimal(5,2) ) with (orientation = column)distribute by replication;drop table if exists customer;CREATE TABLE customer( c_customer_sk bigint not null, c_customer_id char(16) not null, c_current_cdemo_sk bigint , c_current_hdemo_sk bigint , c_current_addr_sk bigint , c_first_shipto_date_sk bigint , c_first_sales_date_sk bigint , c_salutation char(10) , c_first_name char(20) , c_last_name char(30) , c_preferred_cust_flag char(1) , c_birth_day bigint , c_birth_month bigint , c_birth_year bigint , c_birth_country varchar(20) , c_login char(13) , c_email_address char(50) , c_last_review_date_sk char(10)) with (orientation = column)distribute by hash (c_customer_sk);CREATE TABLE web_site( web_site_sk bigint not null, web_site_id char(16) not null, web_rec_start_date date , web_rec_end_date date , web_name varchar(50) , web_open_date_sk bigint , web_close_date_sk bigint , web_class varchar(50) , web_manager varchar(40) , web_mkt_id bigint , web_mkt_class varchar(50) , web_mkt_desc varchar(100) , web_market_manager varchar(40) , web_company_id bigint , web_company_name char(50) , web_street_number char(10) , web_street_name varchar(60) , web_street_type char(15) , web_suite_number char(10) , web_city varchar(60) , web_county varchar(30) , web_state char(2) , web_zip char(10) , web_country varchar(20) , web_gmt_offset decimal(5,2) , web_tax_percentage decimal(5,2)) with (orientation = column)distribute by replication;CREATE TABLE household_demographics( hd_demo_sk bigint not null, hd_income_band_sk bigint , hd_buy_potential char(15) , hd_dep_count bigint , hd_vehicle_count bigint) with (orientation = column)distribute by hash (hd_demo_sk);CREATE TABLE web_page( wp_web_page_sk bigint not null, wp_web_page_id char(16) not null, wp_rec_start_date date , wp_rec_end_date date , wp_creation_date_sk bigint , wp_access_date_sk bigint , wp_autogen_flag char(1) , wp_customer_sk bigint , wp_url varchar(100) , wp_type char(50) , wp_char_count bigint , wp_link_count bigint , wp_image_count bigint , wp_max_ad_count bigint ) with (orientation = column)distribute by replication;CREATE TABLE promotion( p_promo_sk bigint not null, p_promo_id char(16) not null, p_start_date_sk bigint , p_end_date_sk bigint , p_item_sk bigint , p_cost decimal(15,2) , p_response_target bigint , p_promo_name char(50) , p_channel_dmail char(1) , p_channel_email char(1) , p_channel_catalog char(1) , p_channel_tv char(1) , p_channel_radio char(1) , p_channel_press char(1) , p_channel_event char(1) , p_channel_demo char(1) , p_channel_details varchar(100) , p_purpose char(15) , p_discount_active char(1) ) with (orientation = column)DISTRIBUTE BY HASH(p_promo_sk);CREATE TABLE catalog_page( cp_catalog_page_sk bigint not null, cp_catalog_page_id char(16) not null, cp_start_date_sk bigint , cp_end_date_sk bigint , cp_department varchar(50) , cp_catalog_number bigint , cp_catalog_page_number bigint , cp_description varchar(100) , cp_type varchar(100) ) with (orientation = column)distribute by hash (cp_catalog_page_sk);CREATE TABLE inventory( inv_date_sk bigint not null, inv_item_sk bigint not null, inv_warehouse_sk bigint not null, inv_quantity_on_hand integer ) with (orientation = column)distribute by hash (inv_item_sk)partition by range(inv_date_sk)(partition p1 values less than(2451180),partition p2 values less than(2451545),partition p3 values less than(2451911),partition p4 values less than(2452276),partition p5 values less than(2452641),partition p6 values less than(2453006),partition p7 values less than(maxvalue));CREATE TABLE catalog_returns( cr_returned_date_sk bigint , cr_returned_time_sk bigint , cr_item_sk bigint not null, cr_refunded_customer_sk bigint , cr_refunded_cdemo_sk bigint , cr_refunded_hdemo_sk bigint , cr_refunded_addr_sk bigint , cr_returning_customer_sk bigint , cr_returning_cdemo_sk bigint , cr_returning_hdemo_sk bigint , cr_returning_addr_sk bigint , cr_call_center_sk bigint , cr_catalog_page_sk bigint , cr_ship_mode_sk bigint , cr_warehouse_sk bigint , cr_reason_sk bigint , cr_order_number bigint not null, cr_return_quantity bigint , cr_return_amount decimal(7,2) , cr_return_tax decimal(7,2) , cr_return_amt_inc_tax decimal(7,2) , cr_fee decimal(7,2) , cr_return_ship_cost decimal(7,2) , cr_refunded_cash decimal(7,2) , cr_reversed_charge decimal(7,2) , cr_store_credit decimal(7,2) , cr_net_loss decimal(7,2) ) with (orientation = column)distribute by hash (cr_item_sk)partition by range(cr_returned_date_sk)(partition p1 values less than(2450815),partition p2 values less than(2451180),partition p3 values less than(2451545),partition p4 values less than(2451911),partition p5 values less than(2452276),partition p6 values less than(2452641),partition p7 values less than(2453006),partition p8 values less than(maxvalue));CREATE TABLE web_returns( wr_returned_date_sk bigint , wr_returned_time_sk bigint , wr_item_sk bigint not null, wr_refunded_customer_sk bigint , wr_refunded_cdemo_sk bigint , wr_refunded_hdemo_sk bigint , wr_refunded_addr_sk bigint , wr_returning_customer_sk bigint , wr_returning_cdemo_sk bigint , wr_returning_hdemo_sk bigint , wr_returning_addr_sk bigint , wr_web_page_sk bigint , wr_reason_sk bigint , wr_order_number bigint not null, wr_return_quantity bigint , wr_return_amt decimal(7,2) , wr_return_tax decimal(7,2) , wr_return_amt_inc_tax decimal(7,2) , wr_fee decimal(7,2) , wr_return_ship_cost decimal(7,2) , wr_refunded_cash decimal(7,2) , wr_reversed_charge decimal(7,2) , wr_account_credit decimal(7,2) , wr_net_loss decimal(7,2) ) with (orientation = column)distribute by hash (wr_item_sk)partition by range(wr_returned_date_sk)(partition p1 values less than(2450815),partition p2 values less than(2451180),partition p3 values less than(2451545),partition p4 values less than(2451911),partition p5 values less than(2452276),partition p6 values less than(2452641),partition p7 values less than(2453006),partition p8 values less than(maxvalue));CREATE TABLE store_returns( sr_returned_date_sk bigint , sr_return_time_sk bigint , sr_item_sk bigint not null, sr_customer_sk bigint , sr_cdemo_sk bigint , sr_hdemo_sk bigint , sr_addr_sk bigint , sr_store_sk bigint , sr_reason_sk bigint , sr_ticket_number bigint not null, sr_return_quantity bigint , sr_return_amt decimal(7,2) , sr_return_tax decimal(7,2) , sr_return_amt_inc_tax decimal(7,2) , sr_fee decimal(7,2) , sr_return_ship_cost decimal(7,2) , sr_refunded_cash decimal(7,2) , sr_reversed_charge decimal(7,2) , sr_store_credit decimal(7,2) , sr_net_loss decimal(7,2) ) with (orientation = column)distribute by hash (sr_item_sk)partition by range(sr_returned_date_sk)(partition p1 values less than (2451180) ,partition p2 values less than (2451545) ,partition p3 values less than (2451911) ,partition p4 values less than (2452276) ,partition p5 values less than (2452641) ,partition p6 values less than (2453006) ,partition p7 values less than (maxvalue));CREATE TABLE web_sales( ws_sold_date_sk bigint , ws_sold_time_sk bigint , ws_ship_date_sk bigint , ws_item_sk bigint not null, ws_bill_customer_sk bigint , ws_bill_cdemo_sk bigint , ws_bill_hdemo_sk bigint , ws_bill_addr_sk bigint , ws_ship_customer_sk bigint , ws_ship_cdemo_sk bigint , ws_ship_hdemo_sk bigint , ws_ship_addr_sk bigint , ws_web_page_sk bigint , ws_web_site_sk bigint , ws_ship_mode_sk bigint , ws_warehouse_sk bigint , ws_promo_sk bigint , ws_order_number bigint not null, ws_quantity bigint , ws_wholesale_cost decimal(7,2) , ws_list_price decimal(7,2) , ws_sales_price decimal(7,2) , ws_ext_discount_amt decimal(7,2) , ws_ext_sales_price decimal(7,2) , ws_ext_wholesale_cost decimal(7,2) , ws_ext_list_price decimal(7,2) , ws_ext_tax decimal(7,2) , ws_coupon_amt decimal(7,2) , ws_ext_ship_cost decimal(7,2) , ws_net_paid decimal(7,2) , ws_net_paid_inc_tax decimal(7,2) , ws_net_paid_inc_ship decimal(7,2) , ws_net_paid_inc_ship_tax decimal(7,2) , ws_net_profit decimal(7,2) ) with (orientation = column)distribute by hash (ws_item_sk)partition by range(ws_sold_date_sk)(partition p1 values less than(2451180),partition p2 values less than(2451545),partition p3 values less than(2451911),partition p4 values less than(2452276),partition p5 values less than(2452641),partition p6 values less than(2453006),partition p7 values less than(maxvalue));CREATE TABLE catalog_sales( cs_sold_date_sk bigint , cs_sold_time_sk bigint , cs_ship_date_sk bigint , cs_bill_customer_sk bigint , cs_bill_cdemo_sk bigint , cs_bill_hdemo_sk bigint , cs_bill_addr_sk bigint , cs_ship_customer_sk bigint , cs_ship_cdemo_sk bigint , cs_ship_hdemo_sk bigint , cs_ship_addr_sk bigint , cs_call_center_sk bigint , cs_catalog_page_sk bigint , cs_ship_mode_sk bigint , cs_warehouse_sk bigint , cs_item_sk bigint not null, cs_promo_sk bigint , cs_order_number bigint not null, cs_quantity bigint , cs_wholesale_cost decimal(7,2) , cs_list_price decimal(7,2) , cs_sales_price decimal(7,2) , cs_ext_discount_amt decimal(7,2) , cs_ext_sales_price decimal(7,2) , cs_ext_wholesale_cost decimal(7,2) , cs_ext_list_price decimal(7,2) , cs_ext_tax decimal(7,2) , cs_coupon_amt decimal(7,2) , cs_ext_ship_cost decimal(7,2) , cs_net_paid decimal(7,2) , cs_net_paid_inc_tax decimal(7,2) , cs_net_paid_inc_ship decimal(7,2) , cs_net_paid_inc_ship_tax decimal(7,2) , cs_net_profit decimal(7,2) ) with (orientation = column)distribute by hash (cs_item_sk)partition by range(cs_sold_date_sk)(partition p1 values less than(2451180),partition p2 values less than(2451545),partition p3 values less than(2451911),partition p4 values less than(2452276),partition p5 values less than(2452641),partition p6 values less than(2453006),partition p7 values less than(maxvalue));CREATE TABLE store_sales( ss_sold_date_sk bigint , ss_sold_time_sk bigint , ss_item_sk bigint not null, ss_customer_sk bigint , ss_cdemo_sk bigint , ss_hdemo_sk bigint , ss_addr_sk bigint , ss_store_sk bigint , ss_promo_sk bigint , ss_ticket_number bigint not null, ss_quantity bigint , ss_wholesale_cost decimal(7,2) , ss_list_price decimal(7,2) , ss_sales_price decimal(7,2) , ss_ext_discount_amt decimal(7,2) , ss_ext_sales_price decimal(7,2) , ss_ext_wholesale_cost decimal(7,2) , ss_ext_list_price decimal(7,2) , ss_ext_tax decimal(7,2) , ss_coupon_amt decimal(7,2) , ss_net_paid decimal(7,2) , ss_net_paid_inc_tax decimal(7,2) , ss_net_profit decimal(7,2) ) with (orientation = column)distribute by hash (ss_item_sk)partition by range(ss_sold_date_sk)(partition p1 values less than(2451180),partition p2 values less than(2451545),partition p3 values less than(2451911),partition p4 values less than(2452276),partition p5 values less than(2452641),partition p6 values less than(2453006),partition p7 values less than(maxvalue)); 执行以下SQL语句创建GDS外表(共24张表)。 以下每个外表的“gsfs://192.168.0.90:500x/xxx | gsfs://192.168.0.90:500x/xxx”中的IP地址和端口,请替换成安装和启动GDS中的对应的GDS的监听IP和端口。如启动两个GDS,则使用“|”区分。如果启动多个GDS,需要将所有GDS的监听IP和端口配置到外表中。 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777778779780781782 DROP FOREIGN TABLE IF EXISTS customer_address_ext;CREATE FOREIGN TABLE customer_address_ext( ca_address_sk bigint , ca_address_id char(16) , ca_street_number char(10) , ca_street_name varchar(60) , ca_street_type char(15) , ca_suite_number char(10) , ca_city varchar(60) , ca_county varchar(30) , ca_state char(2) , ca_zip char(10) , ca_country varchar(20) , ca_gmt_offset decimal(5,2) , ca_location_type char(20))SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/customer_address* | gsfs://192.168.0.90:5003/customer_address*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with customer_address_err;DROP FOREIGN TABLE IF EXISTS customer_demographics_ext;CREATE FOREIGN TABLE customer_demographics_ext( cd_demo_sk bigint , cd_gender char(1) , cd_marital_status char(1) , cd_education_status char(20) , cd_purchase_estimate bigint , cd_credit_rating char(10) , cd_dep_count bigint , cd_dep_employed_count bigint , cd_dep_college_count bigint )SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/customer_demographics* | gsfs://192.168.0.90:5003/customer_demographics*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with customer_demographics_err;DROP FOREIGN TABLE IF EXISTS date_dim_ext;CREATE FOREIGN TABLE date_dim_ext( d_date_sk bigint , d_date_id char(16) , d_date date , d_month_seq bigint , d_week_seq bigint , d_quarter_seq bigint , d_year bigint , d_dow bigint , d_moy bigint , d_dom bigint , d_qoy bigint , d_fy_year bigint , d_fy_quarter_seq bigint , d_fy_week_seq bigint , d_day_name char(9) , d_quarter_name char(6) , d_holiday char(1) , d_weekend char(1) , d_following_holiday char(1) , d_first_dom bigint , d_last_dom bigint , d_same_day_ly bigint , d_same_day_lq bigint , d_current_day char(1) , d_current_week char(1) , d_current_month char(1) , d_current_quarter char(1) , d_current_year char(1) )SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/date_dim* | gsfs://192.168.0.90:5003/date_dim*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with date_dim_err;DROP FOREIGN TABLE IF EXISTS warehouse_ext;CREATE FOREIGN TABLE warehouse_ext( w_warehouse_sk bigint , w_warehouse_id char(16) , w_warehouse_name varchar(20) , w_warehouse_sq_ft bigint , w_street_number char(10) , w_street_name varchar(60) , w_street_type char(15) , w_suite_number char(10) , w_city varchar(60) , w_county varchar(30) , w_state char(2) , w_zip char(10) , w_country varchar(20) , w_gmt_offset decimal(5,2) )SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/warehouse* | gsfs://192.168.0.90:5003/warehouse*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with warehouse_err;DROP FOREIGN TABLE IF EXISTS ship_mode_ext;CREATE FOREIGN TABLE ship_mode_ext(sm_ship_mode_sk bigint ,sm_ship_mode_id char(16) ,sm_type char(30) ,sm_code char(10) ,sm_carrier char(20) ,sm_contract char(20))SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/ship_mode* | gsfs://192.168.0.90:5003/ship_mode*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with ship_mode_err;DROP FOREIGN TABLE IF EXISTS time_dim_ext;CREATE FOREIGN TABLE time_dim_ext(t_time_sk bigint ,t_time_id char(16) ,t_time bigint ,t_hour bigint ,t_minute bigint ,t_second bigint ,t_am_pm char(2) ,t_shift char(20) ,t_sub_shift char(20) ,t_meal_time char(20))SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/time_dim* | gsfs://192.168.0.90:5003/time_dim*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with time_dim_err;DROP FOREIGN TABLE IF EXISTS reason_ext;CREATE FOREIGN TABLE reason_ext(r_reason_sk bigint ,r_reason_id char(16) ,r_reason_desc char(100))SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/reason* | gsfs://192.168.0.90:5003/reason*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with reason_err;DROP FOREIGN TABLE IF EXISTS income_band_ext;CREATE FOREIGN TABLE income_band_ext(ib_income_band_sk bigint ,ib_lower_bound bigint ,ib_upper_bound bigint)SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/income_band* | gsfs://192.168.0.90:5003/income_band*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with income_band_err;DROP FOREIGN TABLE IF EXISTS item_ext;CREATE FOREIGN TABLE item_ext( i_item_sk bigint , i_item_id char(16) , i_rec_start_date date , i_rec_end_date date , i_item_desc varchar(200) , i_current_price decimal(7,2) , i_wholesale_cost decimal(7,2) , i_brand_id bigint , i_brand char(50) , i_class_id bigint , i_class char(50) , i_category_id bigint , i_category char(50) , i_manufact_id bigint , i_manufact char(50) , i_size char(20) , i_formulation char(20) , i_color char(20) , i_units char(10) , i_container char(10) , i_manager_id bigint , i_product_name char(50) )SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/item* | gsfs://192.168.0.90:5003/item*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with item_err;DROP FOREIGN TABLE IF EXISTS store_ext;CREATE FOREIGN TABLE store_ext( s_store_sk bigint , s_store_id char(16) , s_rec_start_date date , s_rec_end_date date , s_closed_date_sk bigint , s_store_name varchar(50) , s_number_employees bigint , s_floor_space bigint , s_hours char(20) , s_manager varchar(40) , s_market_id bigint , s_geography_class varchar(100) , s_market_desc varchar(100) , s_market_manager varchar(40) , s_division_id bigint , s_division_name varchar(50) , s_company_id bigint , s_company_name varchar(50) , s_street_number varchar(10) , s_street_name varchar(60) , s_street_type char(15) , s_suite_number char(10) , s_city varchar(60) , s_county varchar(30) , s_state char(2) , s_zip char(10) , s_country varchar(20) , s_gmt_offset decimal(5,2) , s_tax_precentage decimal(5,2))SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/store_[^rs]* | gsfs://192.168.0.90:5003/storet_[^rs]*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with store_err;DROP FOREIGN TABLE IF EXISTS call_center_ext;CREATE FOREIGN TABLE call_center_ext( cc_call_center_sk bigint , cc_call_center_id char(16) , cc_rec_start_date date , cc_rec_end_date date , cc_closed_date_sk bigint , cc_open_date_sk bigint , cc_name varchar(50) , cc_class varchar(50) , cc_employees bigint , cc_sq_ft bigint , cc_hours char(20) , cc_manager varchar(40) , cc_mkt_id bigint , cc_mkt_class char(50) , cc_mkt_desc varchar(100) , cc_market_manager varchar(40) , cc_division bigint , cc_division_name varchar(50) , cc_company bigint , cc_company_name char(50) , cc_street_number char(10) , cc_street_name varchar(60) , cc_street_type char(15) , cc_suite_number char(10) , cc_city varchar(60) , cc_county varchar(30) , cc_state char(2) , cc_zip char(10) , cc_country varchar(20) , cc_gmt_offset decimal(5,2) , cc_tax_percentage decimal(5,2) )SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/call_center* | gsfs://192.168.0.90:5003/call_center*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with call_center_err;DROP FOREIGN TABLE IF EXISTS customer_ext;CREATE FOREIGN TABLE customer_ext( c_customer_sk bigint , c_customer_id char(16) , c_current_cdemo_sk bigint , c_current_hdemo_sk bigint , c_current_addr_sk bigint , c_first_shipto_date_sk bigint , c_first_sales_date_sk bigint , c_salutation char(10) , c_first_name char(20) , c_last_name char(30) , c_preferred_cust_flag char(1) , c_birth_day bigint , c_birth_month bigint , c_birth_year bigint , c_birth_country varchar(20) , c_login char(13) , c_email_address char(50) , c_last_review_date_sk char(10))SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/customer_[^ad]* | gsfs://192.168.0.90:5003/customer_[^ad]*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'GBK',mode 'Normal')with customer_err;DROP FOREIGN TABLE IF EXISTS web_site_ext;CREATE FOREIGN TABLE web_site_ext( web_site_sk bigint , web_site_id char(16) , web_rec_start_date date , web_rec_end_date date , web_name varchar(50) , web_open_date_sk bigint , web_close_date_sk bigint , web_class varchar(50) , web_manager varchar(40) , web_mkt_id bigint , web_mkt_class varchar(50) , web_mkt_desc varchar(100) , web_market_manager varchar(40) , web_company_id bigint , web_company_name char(50) , web_street_number char(10) , web_street_name varchar(60) , web_street_type char(15) , web_suite_number char(10) , web_city varchar(60) , web_county varchar(30) , web_state char(2) , web_zip char(10) , web_country varchar(20) , web_gmt_offset decimal(5,2) , web_tax_percentage decimal(5,2))SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/web_site* | gsfs://192.168.0.90:5003/web_site*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with web_site_err;DROP FOREIGN TABLE IF EXISTS store_returns_ext;CREATE FOREIGN TABLE store_returns_ext( sr_returned_date_sk bigint , sr_return_time_sk bigint , sr_item_sk bigint , sr_customer_sk bigint , sr_cdemo_sk bigint , sr_hdemo_sk bigint , sr_addr_sk bigint , sr_store_sk bigint , sr_reason_sk bigint , sr_ticket_number bigint , sr_return_quantity bigint , sr_return_amt decimal(7,2) , sr_return_tax decimal(7,2) , sr_return_amt_inc_tax decimal(7,2) , sr_fee decimal(7,2) , sr_return_ship_cost decimal(7,2) , sr_refunded_cash decimal(7,2) , sr_reversed_charge decimal(7,2) , sr_store_credit decimal(7,2) , sr_net_loss decimal(7,2))SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/store_returns* | gsfs://192.168.0.90:5003/store_returns*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with store_returns_err;DROP FOREIGN TABLE IF EXISTS household_demographics_ext;CREATE FOREIGN TABLE household_demographics_ext( hd_demo_sk bigint , hd_income_band_sk bigint , hd_buy_potential char(15) , hd_dep_count bigint , hd_vehicle_count bigint)SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/household_demographics* | gsfs://192.168.0.90:5003/household_demographics*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with household_demographics_err;DROP FOREIGN TABLE IF EXISTS web_page_ext;CREATE FOREIGN TABLE web_page_ext( wp_web_page_sk bigint , wp_web_page_id char(16) , wp_rec_start_date date , wp_rec_end_date date , wp_creation_date_sk bigint , wp_access_date_sk bigint , wp_autogen_flag char(1) , wp_customer_sk bigint , wp_url varchar(100) , wp_type char(50) , wp_char_count bigint , wp_link_count bigint , wp_image_count bigint , wp_max_ad_count bigint)SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/web_page* | gsfs://192.168.0.90:5003/web_page*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with web_page_err;DROP FOREIGN TABLE IF EXISTS promotion_ext;CREATE FOREIGN TABLE promotion_ext( p_promo_sk bigint , p_promo_id char(16) , p_start_date_sk bigint , p_end_date_sk bigint , p_item_sk bigint , p_cost decimal(15,2) , p_response_target bigint , p_promo_name char(50) , p_channel_dmail char(1) , p_channel_email char(1) , p_channel_catalog char(1) , p_channel_tv char(1) , p_channel_radio char(1) , p_channel_press char(1) , p_channel_event char(1) , p_channel_demo char(1) , p_channel_details varchar(100) , p_purpose char(15) , p_discount_active char(1))SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/promotion* | gsfs://192.168.0.90:5003/promotion*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with promotion_err;DROP FOREIGN TABLE IF EXISTS catalog_page_ext;CREATE FOREIGN TABLE catalog_page_ext( cp_catalog_page_sk bigint , cp_catalog_page_id char(16) , cp_start_date_sk bigint , cp_end_date_sk bigint , cp_department varchar(50) , cp_catalog_number bigint , cp_catalog_page_number bigint , cp_description varchar(100) , cp_type varchar(100))SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/catalog_page* | gsfs://192.168.0.90:5003/catalog_page*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with catalog_page_err;DROP FOREIGN TABLE IF EXISTS inventory_ext;CREATE FOREIGN TABLE inventory_ext( inv_date_sk bigint , inv_item_sk bigint , inv_warehouse_sk bigint , inv_quantity_on_hand integer)SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/inventory* | gsfs://192.168.0.90:5003/inventory*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with inventory_err;DROP FOREIGN TABLE IF EXISTS catalog_returns_ext;CREATE FOREIGN TABLE catalog_returns_ext( cr_returned_date_sk bigint , cr_returned_time_sk bigint , cr_item_sk bigint , cr_refunded_customer_sk bigint , cr_refunded_cdemo_sk bigint , cr_refunded_hdemo_sk bigint , cr_refunded_addr_sk bigint , cr_returning_customer_sk bigint , cr_returning_cdemo_sk bigint , cr_returning_hdemo_sk bigint , cr_returning_addr_sk bigint , cr_call_center_sk bigint , cr_catalog_page_sk bigint , cr_ship_mode_sk bigint , cr_warehouse_sk bigint , cr_reason_sk bigint , cr_order_number bigint , cr_return_quantity bigint , cr_return_amount decimal(7,2) , cr_return_tax decimal(7,2) , cr_return_amt_inc_tax decimal(7,2) , cr_fee decimal(7,2) , cr_return_ship_cost decimal(7,2) , cr_refunded_cash decimal(7,2) , cr_reversed_charge decimal(7,2) , cr_store_credit decimal(7,2) , cr_net_loss decimal(7,2))SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/catalog_returns* | gsfs://192.168.0.90:5003/catalog_returns*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with catalog_returns_err;DROP FOREIGN TABLE IF EXISTS web_returns_ext;CREATE FOREIGN TABLE web_returns_ext( wr_returned_date_sk bigint , wr_returned_time_sk bigint , wr_item_sk bigint , wr_refunded_customer_sk bigint , wr_refunded_cdemo_sk bigint , wr_refunded_hdemo_sk bigint , wr_refunded_addr_sk bigint , wr_returning_customer_sk bigint , wr_returning_cdemo_sk bigint , wr_returning_hdemo_sk bigint , wr_returning_addr_sk bigint , wr_web_page_sk bigint , wr_reason_sk bigint , wr_order_number bigint , wr_return_quantity bigint , wr_return_amt decimal(7,2) , wr_return_tax decimal(7,2) , wr_return_amt_inc_tax decimal(7,2) , wr_fee decimal(7,2) , wr_return_ship_cost decimal(7,2) , wr_refunded_cash decimal(7,2) , wr_reversed_charge decimal(7,2) , wr_account_credit decimal(7,2) , wr_net_loss decimal(7,2) )SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/web_returns* | gsfs://192.168.0.90:5003/web_returns*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with web_returns_err;DROP FOREIGN TABLE IF EXISTS web_sales_ext;CREATE FOREIGN TABLE web_sales_ext( ws_sold_date_sk bigint , ws_sold_time_sk bigint , ws_ship_date_sk bigint , ws_item_sk bigint , ws_bill_customer_sk bigint , ws_bill_cdemo_sk bigint , ws_bill_hdemo_sk bigint , ws_bill_addr_sk bigint , ws_ship_customer_sk bigint , ws_ship_cdemo_sk bigint , ws_ship_hdemo_sk bigint , ws_ship_addr_sk bigint , ws_web_page_sk bigint , ws_web_site_sk bigint , ws_ship_mode_sk bigint , ws_warehouse_sk bigint , ws_promo_sk bigint , ws_order_number bigint , ws_quantity bigint , ws_wholesale_cost decimal(7,2) , ws_list_price decimal(7,2) , ws_sales_price decimal(7,2) , ws_ext_discount_amt decimal(7,2) , ws_ext_sales_price decimal(7,2) , ws_ext_wholesale_cost decimal(7,2) , ws_ext_list_price decimal(7,2) , ws_ext_tax decimal(7,2) , ws_coupon_amt decimal(7,2) , ws_ext_ship_cost decimal(7,2) , ws_net_paid decimal(7,2) , ws_net_paid_inc_tax decimal(7,2) , ws_net_paid_inc_ship decimal(7,2) , ws_net_paid_inc_ship_tax decimal(7,2) , ws_net_profit decimal(7,2) )SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/web_sales* | gsfs://192.168.0.90:5003/web_sales*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with web_sales_err;DROP FOREIGN TABLE IF EXISTS catalog_sales_ext;CREATE FOREIGN TABLE catalog_sales_ext( cs_sold_date_sk bigint , cs_sold_time_sk bigint , cs_ship_date_sk bigint , cs_bill_customer_sk bigint , cs_bill_cdemo_sk bigint , cs_bill_hdemo_sk bigint , cs_bill_addr_sk bigint , cs_ship_customer_sk bigint , cs_ship_cdemo_sk bigint , cs_ship_hdemo_sk bigint , cs_ship_addr_sk bigint , cs_call_center_sk bigint , cs_catalog_page_sk bigint , cs_ship_mode_sk bigint , cs_warehouse_sk bigint , cs_item_sk bigint , cs_promo_sk bigint , cs_order_number bigint , cs_quantity bigint , cs_wholesale_cost decimal(7,2) , cs_list_price decimal(7,2) , cs_sales_price decimal(7,2) , cs_ext_discount_amt decimal(7,2) , cs_ext_sales_price decimal(7,2) , cs_ext_wholesale_cost decimal(7,2) , cs_ext_list_price decimal(7,2) , cs_ext_tax decimal(7,2) , cs_coupon_amt decimal(7,2) , cs_ext_ship_cost decimal(7,2) , cs_net_paid decimal(7,2) , cs_net_paid_inc_tax decimal(7,2) , cs_net_paid_inc_ship decimal(7,2) , cs_net_paid_inc_ship_tax decimal(7,2) , cs_net_profit decimal(7,2) )SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/catalog_sales* | gsfs://192.168.0.90:5003/catalog_sales*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with catalog_sales_err;DROP FOREIGN TABLE IF EXISTS store_sales_ext;CREATE FOREIGN TABLE store_sales_ext( ss_sold_date_sk bigint , ss_sold_time_sk bigint , ss_item_sk bigint , ss_customer_sk bigint , ss_cdemo_sk bigint , ss_hdemo_sk bigint , ss_addr_sk bigint , ss_store_sk bigint , ss_promo_sk bigint , ss_ticket_number bigint , ss_quantity bigint , ss_wholesale_cost decimal(7,2) , ss_list_price decimal(7,2) , ss_sales_price decimal(7,2) , ss_ext_discount_amt decimal(7,2) , ss_ext_sales_price decimal(7,2) , ss_ext_wholesale_cost decimal(7,2) , ss_ext_list_price decimal(7,2) , ss_ext_tax decimal(7,2) , ss_coupon_amt decimal(7,2) , ss_net_paid decimal(7,2) , ss_net_paid_inc_tax decimal(7,2) , ss_net_profit decimal(7,2))SERVER gsmpp_serverOPTIONS(location 'gsfs://192.168.0.90:5002/store_sales* | gsfs://192.168.0.90:5003/store_sales*',FORMAT 'TEXT' ,DELIMITER '|',encoding 'utf8',FILL_MISSING_FIELDS 'true',IGNORE_EXTRA_DATA 'true',mode 'Normal')with store_sales_err; 执行以下SQL语句导入数据。 1 2 3 4 5 6 7 8 9101112131415161718192021222324 INSERT INTO customer_address SELECT * FROM customer_address_ext;INSERT INTO customer_demographics SELECT * FROM customer_demographics_ext;INSERT INTO date_dim SELECT * FROM date_dim_ext;INSERT INTO warehouse SELECT * FROM warehouse_ext;INSERT INTO ship_mode SELECT * FROM ship_mode_ext;INSERT INTO time_dim SELECT * FROM time_dim_ext;INSERT INTO reason SELECT * FROM reason_ext;INSERT INTO income_band SELECT * FROM income_band_ext;INSERT INTO item SELECT * FROM item_ext;INSERT INTO store SELECT * FROM store_ext;INSERT INTO call_center SELECT * FROM call_center_ext;INSERT INTO customer SELECT * FROM customer_ext;INSERT INTO web_site SELECT * FROM web_site_ext;INSERT INTO household_demographics SELECT * FROM household_demographics_ext;INSERT INTO web_page SELECT * FROM web_page_ext;INSERT INTO promotion SELECT * FROM promotion_ext;INSERT INTO catalog_page SELECT * FROM catalog_page_ext;INSERT INTO inventory SELECT * FROM inventory_ext;INSERT INTO catalog_returns SELECT * FROM catalog_returns_ext;INSERT INTO web_returns SELECT * FROM web_returns_ext;INSERT INTO store_returns SELECT * FROM store_returns_ext;INSERT INTO web_sales SELECT * FROM web_sales_ext;INSERT INTO catalog_sales SELECT * FROM catalog_sales_ext;INSERT INTO store_sales SELECT * FROM store_sales_ext;
  • 表数据行数 表1 TPC-DS 序号 表名 行数 1 customer_address 6,000,000 2 customer_demographics 1,920,800 3 date_dim 73,049 4 warehouse 20 5 ship_mode 20 6 time_dim 86,400 7 reason 65 8 income_band 20 9 item 300,000 10 store 1,002 11 call_center 42 12 customer 12,000,000 13 web_site 54 14 household_demographics 7,200 15 web_page 3,000 16 promotion 1,500 17 catalog_page 30,000 18 inventory 783,000,000 19 catalog_returns 143,996,756 20 web_returns 71,997,522 21 store_returns 287,999,764 22 web_sales 720,000,376 23 catalog_sales 1,439,980,416 24 store_sales 2,879,987,999
共99354条