Huawei Cloud User Manual

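  • Background
    To download streaming data, you first determine the position in the partition from which to start reading (that is, you obtain a cursor). Once the starting position is fixed, you fetch data in a loop.
    A cursor can be obtained in five ways:
      AT_SEQUENCE_NUMBER
      AFTER_SEQUENCE_NUMBER
      TRIM_HORIZON
      LATEST
      AT_TIMESTAMP
    To understand the cursor types, you need the following basic concepts.
    Sequence number (sequenceNumber): the unique identifier of each record. DIS assigns it automatically when a data producer calls PutRecord to add data to a DIS stream. Sequence numbers for the same partition key generally increase over time. The longer the interval between PutRecords requests, the larger the sequence number.
    In each partition, sequenceNumber starts at 0 and keeps growing. Every record has its own unique sequenceNumber, which expires and becomes unusable once the record exceeds its lifecycle. For example, the first record uploaded to a new partition gets sequenceNumber 0; after 100 records have been uploaded, the last one has sequenceNumber 99; once the lifecycle has elapsed, records 0 to 99 are no longer available.
    The valid data range of a partition can be obtained by calling describeStream (query stream details). Its sequenceNumberRange field gives the valid range: the first value is the sequenceNumber of the oldest record, and the last value is the sequenceNumber that the next uploaded record will receive (the newest record has this value minus 1).
    For example, [100, 200] means that 200 records have been uploaded to this partition in total, records 0 to 99 have expired, the oldest valid record is 100, the newest record is 199, and the next uploaded record will receive sequenceNumber 200.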
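The [100, 200] example above can be worked through in a few lines of Python. This is only a sketch: it assumes the two values of sequenceNumberRange have already been read from a describeStream response, and the names `interpret_sequence_range`, `is_available`, and `ValidRange` are hypothetical helpers, not part of the DIS SDK.

```python
from typing import NamedTuple, Optional, Sequence


class ValidRange(NamedTuple):
    oldest: int             # sequenceNumber of the oldest record still readable
    newest: Optional[int]   # sequenceNumber of the newest record, None if nothing is readable
    next_upload: int        # sequenceNumber the next uploaded record will receive
    expired: int            # number of records that have already expired


def interpret_sequence_range(sequence_number_range: Sequence) -> ValidRange:
    """Interpret a [first, last] pair as described in the text (hypothetical helper)."""
    first, last = (int(v) for v in sequence_number_range)
    if first < 0 or last < first:
        raise ValueError("expected 0 <= first <= last")
    # The last value is the *next* sequenceNumber, so the newest record is last - 1.
    newest = last - 1 if last > first else None
    # Sequence numbers start at 0, so everything below `first` has expired.
    return ValidRange(oldest=first, newest=newest, next_upload=last, expired=first)


def is_available(sequence_number: int, valid: ValidRange) -> bool:
    """True if a record with this sequenceNumber is still inside the valid range."""
    return valid.newest is not None and valid.oldest <= sequence_number <= valid.newest


if __name__ == "__main__":
    r = interpret_sequence_range([100, 200])
    print(r)
    print(is_available(99, r), is_available(199, r))
```

A check like `is_available` is one way to decide whether a stored sequenceNumber can still be used as a starting point or whether the consumer has to fall back to a cursor type that does not depend on it.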
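  • Content Navigation
    The SDK Developer Guide explains how to install and configure the development environment and how to call the interface functions provided by the DIS SDK for secondary development.
    Chapter | Content
    What Can the DIS SDK Do?; Content Navigation | Briefly introduces the concepts of DIS and the DIS SDK.
    SDK Download; Compatibility; How Do I Verify Software Package Integrity? | Describes the resources involved in secondary development with the DIS SDK.
    Enabling DIS | Describes how to enable the DIS service and DIS streams.
    Obtaining Authentication Information | Describes the initialization work required before secondary development with the DIS SDK.
    Python: Preparing the Environment to Obtaining a Data Cursor | Describes common operations performed with the DIS SDK (Python).
    Java: Preparing the Environment to Changing the Number of Partitions | Describes common operations performed with the DIS SDK (Java).
    DIS Server Error Codes | Describes the response information returned when an exception occurs while using the DIS SDK.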
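  • DIS Overview
    Data Ingestion Service (DIS) builds data stream pipelines for custom applications that process or analyze streaming data. Its main purpose is to transfer data from outside the cloud into cloud services in real time. Every hour, DIS can continuously capture, transfer, and store several terabytes of data from hundreds of thousands of data sources, such as logs and location-tracking events, website clickstreams, and social media feeds.
    The cloud service infrastructure is deployed in multiple regions and is highly scalable and reliable. You can choose the region in which to use DIS according to your needs, which gives you faster access and a more favorable price.
    DIS manages the infrastructure, storage, network, and configuration required for data transfer. You do not need to worry about provisioning, deployment, or ongoing hardware maintenance for your data streams. In addition, DIS replicates data synchronously within the cloud region, which provides high data availability and data durability.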
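  • Revision History
    Release Date | Description
    2019-12-11 | Twenty-fourth official release. Added dis-kafka-adapter; added uploading and downloading data with the Kafka Adapter.
    2019-10-08 | Twenty-third official release. Optimized the Java and Python SDKs.
    2019-07-08 | Twentieth official release. The small-file feature was taken offline; deleted "Creating a Stream Whose Source Data Type Is FILE".
    2019-07-03 | Nineteenth official release. The Java SDK is not compatible with the native Kafka client; deleted "Connecting a Kafka Consumer".
    2019-05-14 | Eighteenth official release. Added support for encrypted data upload and download through the SDK; modified "Initializing a DIS Client".
    2019-05-07 | Seventeenth official release. Added a description of pagination to the stream-list query SDK; modified "Querying the Stream List".
    2019-04-16 | Sixteenth official release. Added response parameter descriptions to the stream-list query SDK; modified "Querying the Stream List".
    2019-03-18 | Fifteenth official release. Added: "Adding a Dump Task" to "Querying Dump Details"; "Initializing a DIS Client".
    2019-02-23 | Fourteenth official release. Modified: "Obtaining Authentication Information".
    2019-01-17 | Thirteenth official release. Content optimization.
    2019-01-07 | Twelfth official release. Modified: "Downloading Streaming Data".
    2018-11-28 | Eleventh official release. Modified: "Initializing a DIS Client"; "Creating a Stream"; "Downloading Streaming Data".
    2018-11-07 | Tenth official release. Modified: "How Do I Verify Software Package Integrity?".
    2018-09-25 | Ninth official release. Added: "Preparing the Environment" to "Obtaining a Data Cursor".
    2018-08-19 | Eighth official release. Modified: "Configuring the Sample Project".
    2018-07-23 | Seventh official release. Modified the document structure and name.
    2018-07-10 | Sixth official release. Added: "Creating an APP"; "Deleting an APP"; "Querying a Checkpoint"; "Changing the Number of Partitions".
    2018-06-12 | Fifth official release. Modified: "Enabling DIS"; "DIS Server Error Codes".
    2018-05-11 | Fourth official release. Modified: "Enabling a DIS Stream"; "DIS Server Error Codes". Uquery was renamed Data Lake Insight (DLI).
    2018-02-08 | Third official release. Added: "Creating a Stream"; "Deleting a Stream"; "Querying the Stream List"; "Querying Stream Details"; "Obtaining a Data Cursor". Modified: "Uploading Streaming Data"; "Downloading Streaming Data".
    2017-11-18 | Second official release. Modified: "Enabling a DIS Stream".
    2017-10-28 | First official release.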
  • SQL20
    select i_item_id, i_item_desc, i_category, i_class, i_current_price,
           sum(cs_ext_sales_price) as itemrevenue,
           sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over (partition by i_class) as revenueratio
    from catalog_sales, item, date_dim
    where cs_item_sk = i_item_sk
      and i_category in ('Sports', 'Shoes', 'Women')
      and cs_sold_date_sk = d_date_sk
      and d_date between cast('2001-03-21' as date) and (cast('2001-03-21' as date) + 30)
    group by i_item_id, i_item_desc, i_category, i_class, i_current_price
    order by i_category, i_class, i_item_id, i_item_desc, revenueratio
    limit 100;
  • SQL17
    select i_item_id, i_item_desc, s_state,
           count(ss_quantity) as store_sales_quantitycount,
           avg(ss_quantity) as store_sales_quantityave,
           stddev_samp(ss_quantity) as store_sales_quantitystdev,
           stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov,
           count(sr_return_quantity) as store_returns_quantitycount,
           avg(sr_return_quantity) as store_returns_quantityave,
           stddev_samp(sr_return_quantity) as store_returns_quantitystdev,
           stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as store_returns_quantitycov,
           count(cs_quantity) as catalog_sales_quantitycount,
           avg(cs_quantity) as catalog_sales_quantityave,
           stddev_samp(cs_quantity) as catalog_sales_quantitystdev,
           stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
    from store_sales, store_returns, catalog_sales, date_dim d1, date_dim d2, date_dim d3, store, item
    where d1.d_quarter_name = '2000Q1'
      and d1.d_date_sk = ss_sold_date_sk
      and i_item_sk = ss_item_sk
      and s_store_sk = ss_store_sk
      and ss_customer_sk = sr_customer_sk
      and ss_item_sk = sr_item_sk
      and ss_ticket_number = sr_ticket_number
      and sr_returned_date_sk = d2.d_date_sk
      and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
      and sr_customer_sk = cs_bill_customer_sk
      and sr_item_sk = cs_item_sk
      and cs_sold_date_sk = d3.d_date_sk
      and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
    group by i_item_id, i_item_desc, s_state
    order by i_item_id, i_item_desc, s_state
    limit 100;
  • SQL18
    select i_item_id, ca_country, ca_state, ca_county,
           avg(cast(cs_quantity as decimal(12,2))) agg1,
           avg(cast(cs_list_price as decimal(12,2))) agg2,
           avg(cast(cs_coupon_amt as decimal(12,2))) agg3,
           avg(cast(cs_sales_price as decimal(12,2))) agg4,
           avg(cast(cs_net_profit as decimal(12,2))) agg5,
           avg(cast(c_birth_year as decimal(12,2))) agg6,
           avg(cast(cd1.cd_dep_count as decimal(12,2))) agg7
    from catalog_sales, customer_demographics cd1, customer_demographics cd2, customer, customer_address, date_dim, item
    where cs_sold_date_sk = d_date_sk
      and cs_item_sk = i_item_sk
      and cs_bill_cdemo_sk = cd1.cd_demo_sk
      and cs_bill_customer_sk = c_customer_sk
      and cd1.cd_gender = 'M'
      and cd1.cd_education_status = 'Primary'
      and c_current_cdemo_sk = cd2.cd_demo_sk
      and c_current_addr_sk = ca_address_sk
      and c_birth_month in (10,1,8,7,3,5)
      and d_year = 1998
      and ca_state in ('NE','OK','NC','CO','ID','AR','MO')
    group by rollup (i_item_id, ca_country, ca_state, ca_county)
    order by ca_country, ca_state, ca_county, i_item_id
    limit 100;
  • SQL13
    select avg(ss_quantity), avg(ss_ext_sales_price), avg(ss_ext_wholesale_cost), sum(ss_ext_wholesale_cost)
    from store_sales, store, customer_demographics, household_demographics, customer_address, date_dim
    where s_store_sk = ss_store_sk
      and ss_sold_date_sk = d_date_sk
      and d_year = 2001
      and ((ss_hdemo_sk = hd_demo_sk
            and cd_demo_sk = ss_cdemo_sk
            and cd_marital_status = 'U'
            and cd_education_status = '4 yr Degree'
            and ss_sales_price between 100.00 and 150.00
            and hd_dep_count = 3)
        or (ss_hdemo_sk = hd_demo_sk
            and cd_demo_sk = ss_cdemo_sk
            and cd_marital_status = 'D'
            and cd_education_status = '2 yr Degree'
            and ss_sales_price between 50.00 and 100.00
            and hd_dep_count = 1)
        or (ss_hdemo_sk = hd_demo_sk
            and cd_demo_sk = ss_cdemo_sk
            and cd_marital_status = 'S'
            and cd_education_status = 'Advanced Degree'
            and ss_sales_price between 150.00 and 200.00
            and hd_dep_count = 1))
      and ((ss_addr_sk = ca_address_sk
            and ca_country = 'United States'
            and ca_state in ('IL', 'WI', 'TN')
            and ss_net_profit between 100 and 200)
        or (ss_addr_sk = ca_address_sk
            and ca_country = 'United States'
            and ca_state in ('MO', 'OK', 'WA')
            and ss_net_profit between 150 and 300)
        or (ss_addr_sk = ca_address_sk
            and ca_country = 'United States'
            and ca_state in ('NE', 'VA', 'GA')
            and ss_net_profit between 50 and 250));
  • SQL12
    select i_item_id, i_item_desc, i_category, i_class, i_current_price,
           sum(ws_ext_sales_price) as itemrevenue,
           sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over (partition by i_class) as revenueratio
    from web_sales, item, date_dim
    where ws_item_sk = i_item_sk
      and i_category in ('Music', 'Shoes', 'Children')
      and ws_sold_date_sk = d_date_sk
      and d_date between cast('2000-05-14' as date) and (cast('2000-05-14' as date) + 30)
    group by i_item_id, i_item_desc, i_category, i_class, i_current_price
    order by i_category, i_class, i_item_id, i_item_desc, revenueratio
    limit 100;
  • SQL7
    select i_item_id,
           avg(ss_quantity) agg1,
           avg(ss_list_price) agg2,
           avg(ss_coupon_amt) agg3,
           avg(ss_sales_price) agg4
    from store_sales, customer_demographics, date_dim, item, promotion
    where ss_sold_date_sk = d_date_sk
      and ss_item_sk = i_item_sk
      and ss_cdemo_sk = cd_demo_sk
      and ss_promo_sk = p_promo_sk
      and cd_gender = 'M'
      and cd_marital_status = 'U'
      and cd_education_status = 'College'
      and (p_channel_email = 'N' or p_channel_event = 'N')
      and d_year = 1999
    group by i_item_id
    order by i_item_id
    limit 100;
  • SQL10
    select cd_gender, cd_marital_status, cd_education_status, count(*) cnt1,
           cd_purchase_estimate, count(*) cnt2,
           cd_credit_rating, count(*) cnt3,
           cd_dep_count, count(*) cnt4,
           cd_dep_employed_count, count(*) cnt5,
           cd_dep_college_count, count(*) cnt6
    from customer c, customer_address ca, customer_demographics
    where c.c_current_addr_sk = ca.ca_address_sk
      and ca_county in ('Clark County','Richardson County','Tom Green County','Sullivan County','Cass County')
      and cd_demo_sk = c.c_current_cdemo_sk
      and exists (select *
                  from store_sales, date_dim
                  where c.c_customer_sk = ss_customer_sk
                    and ss_sold_date_sk = d_date_sk
                    and d_year = 2000
                    and d_moy between 1 and 1+3)
      and (exists (select *
                   from web_sales, date_dim
                   where c.c_customer_sk = ws_bill_customer_sk
                     and ws_sold_date_sk = d_date_sk
                     and d_year = 2000
                     and d_moy between 1 and 1+3)
           or exists (select *
                      from catalog_sales, date_dim
                      where c.c_customer_sk = cs_ship_customer_sk
                        and cs_sold_date_sk = d_date_sk
                        and d_year = 2000
                        and d_moy between 1 and 1+3))
    group by cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate,
             cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_count
    order by cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate,
             cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_count
    limit 100;
  • SQL5
    with ssr as
     (select s_store_id,
             sum(sales_price) as sales,
             sum(profit) as profit,
             sum(return_amt) as returns,
             sum(net_loss) as profit_loss
      from (select ss_store_sk as store_sk,
                   ss_sold_date_sk as date_sk,
                   ss_ext_sales_price as sales_price,
                   ss_net_profit as profit,
                   cast(0 as decimal(7,2)) as return_amt,
                   cast(0 as decimal(7,2)) as net_loss
            from store_sales
            union all
            select sr_store_sk as store_sk,
                   sr_returned_date_sk as date_sk,
                   cast(0 as decimal(7,2)) as sales_price,
                   cast(0 as decimal(7,2)) as profit,
                   sr_return_amt as return_amt,
                   sr_net_loss as net_loss
            from store_returns) salesreturns,
           date_dim, store
      where date_sk = d_date_sk
        and d_date between cast('2002-08-05' as date) and (cast('2002-08-05' as date) + 14)
        and store_sk = s_store_sk
      group by s_store_id),
    csr as
     (select cp_catalog_page_id,
             sum(sales_price) as sales,
             sum(profit) as profit,
             sum(return_amt) as returns,
             sum(net_loss) as profit_loss
      from (select cs_catalog_page_sk as page_sk,
                   cs_sold_date_sk as date_sk,
                   cs_ext_sales_price as sales_price,
                   cs_net_profit as profit,
                   cast(0 as decimal(7,2)) as return_amt,
                   cast(0 as decimal(7,2)) as net_loss
            from catalog_sales
            union all
            select cr_catalog_page_sk as page_sk,
                   cr_returned_date_sk as date_sk,
                   cast(0 as decimal(7,2)) as sales_price,
                   cast(0 as decimal(7,2)) as profit,
                   cr_return_amount as return_amt,
                   cr_net_loss as net_loss
            from catalog_returns) salesreturns,
           date_dim, catalog_page
      where date_sk = d_date_sk
        and d_date between cast('2002-08-05' as date) and (cast('2002-08-05' as date) + 14)
        and page_sk = cp_catalog_page_sk
      group by cp_catalog_page_id),
    wsr as
     (select web_site_id,
             sum(sales_price) as sales,
             sum(profit) as profit,
             sum(return_amt) as returns,
             sum(net_loss) as profit_loss
      from (select ws_web_site_sk as wsr_web_site_sk,
                   ws_sold_date_sk as date_sk,
                   ws_ext_sales_price as sales_price,
                   ws_net_profit as profit,
                   cast(0 as decimal(7,2)) as return_amt,
                   cast(0 as decimal(7,2)) as net_loss
            from web_sales
            union all
            select ws_web_site_sk as wsr_web_site_sk,
                   wr_returned_date_sk as date_sk,
                   cast(0 as decimal(7,2)) as sales_price,
                   cast(0 as decimal(7,2)) as profit,
                   wr_return_amt as return_amt,
                   wr_net_loss as net_loss
            from web_returns
                 left outer join web_sales
                   on (wr_item_sk = ws_item_sk and wr_order_number = ws_order_number)) salesreturns,
           date_dim, web_site
      where date_sk = d_date_sk
        and d_date between cast('2002-08-05' as date) and (cast('2002-08-05' as date) + 14)
        and wsr_web_site_sk = web_site_sk
      group by web_site_id)
    select channel, id,
           sum(sales) as sales,
           sum(returns) as returns,
           sum(profit) as profit
    from (select 'store channel' as channel, 'store' || s_store_id as id,
                 sales, returns, (profit - profit_loss) as profit
          from ssr
          union all
          select 'catalog channel' as channel, 'catalog_page' || cp_catalog_page_id as id,
                 sales, returns, (profit - profit_loss) as profit
          from csr
          union all
          select 'web channel' as channel, 'web_site' || web_site_id as id,
                 sales, returns, (profit - profit_loss) as profit
          from wsr) x
    group by rollup (channel, id)
    order by channel, id
    limit 100;
  • SQL2
    with wscs as
     (select sold_date_sk, sales_price
      from (select ws_sold_date_sk sold_date_sk, ws_ext_sales_price sales_price
            from web_sales
            union all
            select cs_sold_date_sk sold_date_sk, cs_ext_sales_price sales_price
            from catalog_sales)),
    wswscs as
     (select d_week_seq,
             sum(case when (d_day_name='Sunday') then sales_price else null end) sun_sales,
             sum(case when (d_day_name='Monday') then sales_price else null end) mon_sales,
             sum(case when (d_day_name='Tuesday') then sales_price else null end) tue_sales,
             sum(case when (d_day_name='Wednesday') then sales_price else null end) wed_sales,
             sum(case when (d_day_name='Thursday') then sales_price else null end) thu_sales,
             sum(case when (d_day_name='Friday') then sales_price else null end) fri_sales,
             sum(case when (d_day_name='Saturday') then sales_price else null end) sat_sales
      from wscs, date_dim
      where d_date_sk = sold_date_sk
      group by d_week_seq)
    select d_week_seq1,
           round(sun_sales1/sun_sales2,2),
           round(mon_sales1/mon_sales2,2),
           round(tue_sales1/tue_sales2,2),
           round(wed_sales1/wed_sales2,2),
           round(thu_sales1/thu_sales2,2),
           round(fri_sales1/fri_sales2,2),
           round(sat_sales1/sat_sales2,2)
    from (select wswscs.d_week_seq d_week_seq1,
                 sun_sales sun_sales1, mon_sales mon_sales1, tue_sales tue_sales1, wed_sales wed_sales1,
                 thu_sales thu_sales1, fri_sales fri_sales1, sat_sales sat_sales1
          from wswscs, date_dim
          where date_dim.d_week_seq = wswscs.d_week_seq
            and d_year = 1999) y,
         (select wswscs.d_week_seq d_week_seq2,
                 sun_sales sun_sales2, mon_sales mon_sales2, tue_sales tue_sales2, wed_sales wed_sales2,
                 thu_sales thu_sales2, fri_sales fri_sales2, sat_sales sat_sales2
          from wswscs, date_dim
          where date_dim.d_week_seq = wswscs.d_week_seq
            and d_year = 1999+1) z
    where d_week_seq1 = d_week_seq2-53
    order by d_week_seq1;
  • SQL3
    select dt.d_year,
           item.i_brand_id brand_id,
           item.i_brand brand,
           sum(ss_ext_sales_price) sum_agg
    from date_dim dt, store_sales, item
    where dt.d_date_sk = store_sales.ss_sold_date_sk
      and store_sales.ss_item_sk = item.i_item_sk
      and item.i_manufact_id = 125
      and dt.d_moy = 11
    group by dt.d_year, item.i_brand, item.i_brand_id
    order by dt.d_year, sum_agg desc, brand_id
    limit 100;
  • Command Generation Method
    The 99 standard TPC-DS SQL queries can be generated as follows:
    1. Preparation. Before generating the TPC-DS queries, modify the files in the query_templates directory:
       Log in to the ECS requested for the test and go to the /data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/query_templates directory:
         cd /data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/query_templates
       Create a file named hwdws.tpl with the following content:
         define __LIMITA = "";
         define __LIMITB = "";
         define __LIMITC = "limit %d";
         define _BEGIN = "-- begin query " + [_QUERY] + " in stream " + [_STREAM] + " using template " + [_TEMPLATE];
         define _END = "-- end query " + [_QUERY] + " in stream " + [_STREAM] + " using template " + [_TEMPLATE];
       The SQL generation templates shipped with the TPC-DS tool contain a syntax error, so query77.tpl must be modified: on line 135, change ‘, coalesce(returns, 0) returns’ to ‘, coalesce(returns, 0) as returns’.
    2. Run the following commands to generate the queries:
         cd /data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/tools
         ./dsqgen -input ../query_templates/templates.lst -directory ../query_templates/ -scale 1000 -dialect hwdws
       This produces the file query_0.sql, which contains the 99 standard SQL statements. It must be split manually into 99 files.
    3. The following date function syntax in the generated queries is not yet supported by GaussDB(DWS) and must be modified manually:
         Q5:  and (cast('2001-08-19' as date) + 14 days)
              change to
              and (cast('2001-08-19' as date) + 14)
         Q12: and (cast('1999-02-28' as date) + 30 days)
              change to
              and (cast('1999-02-28' as date) + 30)
         Q16: (cast('1999-4-01' as date) + 60 days)
              change to
              (cast('1999-4-01' as date) + 60)
         Q20: and (cast('1998-05-05' as date) + 30 days)
              change to
              and (cast('1998-05-05' as date) + 30)
         Q21: and d_date between (cast ('2000-05-19' as date) - 30 days)
              change to
              and d_date between (cast ('2000-05-19' as date) - 30)
              and (cast ('2000-05-19' as date) + 30 days)
              change to
              and (cast ('2000-05-19' as date) + 30)
         Q32: (cast('1999-02-22' as date) + 90 days)
              change to
              (cast('1999-02-22' as date) + 90)
         Q37: and d_date between cast('1998-04-29' as date) and (cast('1998-04-29' as date) + 60 days)
              change to
              and d_date between cast('1998-04-29' as date) and (cast('1998-04-29' as date) + 60)
         Q40: and d_date between (cast ('2002-05-10' as date) - 30 days)
              change to
              and d_date between (cast ('2002-05-10' as date) - 30)
              and (cast ('2002-05-10' as date) + 30 days)
              change to
              and (cast ('2002-05-10' as date) + 30)
         Q77: and (cast('1999-08-29' as date) + 30 days)
              change to
              and (cast('1999-08-29' as date) + 30)
         Q80: and (cast('2002-08-04' as date) + 30 days)
              change to
              and (cast('2002-08-04' as date) + 30)
         Q82: and d_date between cast('1998-01-18' as date) and (cast('1998-01-18' as date) + 60 days)
              change to
              and d_date between cast('1998-01-18' as date) and (cast('1998-01-18' as date) + 60)
         Q92: (cast('2001-01-26' as date) + 90 days)
              change to
              (cast('2001-01-26' as date) + 90)
         Q94: (cast('1999-5-01' as date) + 60 days)
              change to
              (cast('1999-5-01' as date) + 60)
         Q95: (cast('1999-4-01' as date) + 60 days)
              change to
              (cast('1999-4-01' as date) + 60)
         Q98: and (cast('2002-04-01' as date) + 30 days)
              change to
              and (cast('2002-04-01' as date) + 30)
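The manual splitting and the date-syntax edits described in this section can be scripted. The following Python sketch is one possible approach, not part of the TPC-DS kit or of GaussDB(DWS). It assumes that dsqgen writes the `-- begin query` and `-- end query` marker lines exactly as defined by `_BEGIN` and `_END` in hwdws.tpl, and that the template field is a file name such as `query5.tpl`. The function names are hypothetical.

```python
import re
from pathlib import Path

# Marker lines as defined by _BEGIN / _END in hwdws.tpl (assumed output format).
BEGIN_RE = re.compile(r"^-- begin query \d+ in stream \d+ using template (\S+)\s*$")
END_RE = re.compile(r"^-- end query \d+ in stream \d+ using template \S+\s*$")
# Matches "+ 14 days" or "- 30 days" and keeps only the signed number.
DAYS_RE = re.compile(r"([+-]\s*\d+)\s+days\b", re.IGNORECASE)


def strip_days_suffix(sql: str) -> str:
    """Rewrite '<date> + N days' as '<date> + N', as listed in the table above."""
    return DAYS_RE.sub(r"\1", sql)


def split_queries(text: str) -> dict:
    """Return {template name: SQL text} for every begin/end pair found in text."""
    queries = {}
    name, body = None, []
    for line in text.splitlines():
        begin = BEGIN_RE.match(line)
        if begin:
            name, body = begin.group(1), []
        elif END_RE.match(line):
            if name is not None:
                queries[name] = strip_days_suffix("\n".join(body).strip()) + "\n"
            name = None
        elif name is not None:
            body.append(line)
    return queries


def write_queries(queries: dict, out_dir: str) -> None:
    """Write each query to <out_dir>/<template stem>.sql, e.g. query5.sql."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for template, sql in queries.items():
        (out / (Path(template).stem + ".sql")).write_text(sql, encoding="utf-8")


if __name__ == "__main__":
    source = Path("query_0.sql")
    if source.exists():
        write_queries(split_queries(source.read_text(encoding="utf-8")), "queries")
```

Naming the output files after the template rather than the position in the stream keeps the file names aligned with the Q5, Q12, and similar labels used above. It is worth reviewing the rewritten statements before running them, because the regular expression only covers the `+ N days` and `- N days` forms shown in this section.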
  • SQL9
    select nation, o_year, sum(amount) as sum_profit
    from (select n_name as nation,
                 extract(year from o_orderdate) as o_year,
                 l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
          from part, supplier, lineitem, partsupp, orders, nation
          where s_suppkey = l_suppkey
            and ps_suppkey = l_suppkey
            and ps_partkey = l_partkey
            and p_partkey = l_partkey
            and o_orderkey = l_orderkey
            and s_nationkey = n_nationkey
            and p_name like '%green%') as profit
    group by nation, o_year
    order by nation, o_year desc;
  • SQL13
    select c_count, count(*) as custdist
    from (select c_custkey, count(o_orderkey)
          from customer
               left outer join orders
                 on c_custkey = o_custkey
                and o_comment not like '%special%requests%'
          group by c_custkey) as c_orders (c_custkey, c_count)
    group by c_count
    order by custdist desc, c_count desc;
  • SQL7
    select supp_nation, cust_nation, l_year, sum(volume) as revenue
    from (select n1.n_name as supp_nation,
                 n2.n_name as cust_nation,
                 extract(year from l_shipdate) as l_year,
                 l_extendedprice * (1 - l_discount) as volume
          from supplier, lineitem, orders, customer, nation n1, nation n2
          where s_suppkey = l_suppkey
            and o_orderkey = l_orderkey
            and c_custkey = o_custkey
            and s_nationkey = n1.n_nationkey
            and c_nationkey = n2.n_nationkey
            and ((n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
              or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE'))
            and l_shipdate between date '1995-01-01' and date '1996-12-31') as shipping
    group by supp_nation, cust_nation, l_year
    order by supp_nation, cust_nation, l_year;
  • SQL2
    select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
    from part, supplier, partsupp, nation, region
    where p_partkey = ps_partkey
      and s_suppkey = ps_suppkey
      and p_size = 15
      and p_type like '%BRASS'
      and s_nationkey = n_nationkey
      and n_regionkey = r_regionkey
      and r_name = 'EUROPE'
      and ps_supplycost = (select min(ps_supplycost)
                           from partsupp, supplier, nation, region
                           where p_partkey = ps_partkey
                             and s_suppkey = ps_suppkey
                             and s_nationkey = n_nationkey
                             and n_regionkey = r_regionkey
                             and r_name = 'EUROPE')
    order by s_acctbal desc, n_name, s_name, p_partkey
    limit 100;
  • SQL8
    select o_year,
           sum(case when nation = 'BRAZIL' then volume else 0 end) / sum(volume) as mkt_share
    from (select extract(year from o_orderdate) as o_year,
                 l_extendedprice * (1 - l_discount) as volume,
                 n2.n_name as nation
          from part, supplier, lineitem, orders, customer, nation n1, nation n2, region
          where p_partkey = l_partkey
            and s_suppkey = l_suppkey
            and l_orderkey = o_orderkey
            and o_custkey = c_custkey
            and c_nationkey = n1.n_nationkey
            and n1.n_regionkey = r_regionkey
            and r_name = 'AMERICA'
            and s_nationkey = n2.n_nationkey
            and o_orderdate between date '1995-01-01' and date '1996-12-31'
            and p_type = 'ECONOMY ANODIZED STEEL') as all_nations
    group by o_year
    order by o_year;
  • Procedure
    After connecting to DWS with gsql, run the following SQL statements to create the target tables (24 tables in total).
    CREATE TABLE customer_address ( ca_address_sk bigint not null, ca_address_id char(16) not null, ca_street_number char(10), ca_street_name varchar(60), ca_street_type char(15), ca_suite_number char(10), ca_city varchar(60), ca_county varchar(30), ca_state char(2), ca_zip char(10), ca_country varchar(20), ca_gmt_offset decimal(5,2), ca_location_type char(20) ) with (orientation = column) distribute by hash (ca_address_sk);
    CREATE TABLE customer_demographics ( cd_demo_sk bigint not null, cd_gender char(1), cd_marital_status char(1), cd_education_status char(20), cd_purchase_estimate bigint, cd_credit_rating char(10), cd_dep_count bigint, cd_dep_employed_count bigint, cd_dep_college_count bigint ) with (orientation = column) distribute by hash (cd_demo_sk);
    CREATE TABLE date_dim ( d_date_sk bigint not null, d_date_id char(16) not null, d_date date, d_month_seq bigint, d_week_seq bigint, d_quarter_seq bigint, d_year bigint, d_dow bigint, d_moy bigint, d_dom bigint, d_qoy bigint, d_fy_year bigint, d_fy_quarter_seq bigint, d_fy_week_seq bigint, d_day_name char(9), d_quarter_name char(6), d_holiday char(1), d_weekend char(1), d_following_holiday char(1), d_first_dom bigint, d_last_dom bigint, d_same_day_ly bigint, d_same_day_lq bigint, d_current_day char(1), d_current_week char(1), d_current_month char(1),
d_current_quarter char(1) , d_current_year char(1) ) with (orientation = column) DISTRIBUTE by hash(d_date_sk) PARTITION BY Range(d_year) ( partition p1 values less than(1950), partition p2 values less than(2000), partition p3 values less than(2050), partition p4 values less than(2100), partition p5 values less than(3000), partition p6 values less than(maxvalue) ); CREATE TABLE warehouse ( w_warehouse_sk bigint not null, w_warehouse_id char(16) not null, w_warehouse_name varchar(20) , w_warehouse_sq_ft bigint , w_street_number char(10) , w_street_name varchar(60) , w_street_type char(15) , w_suite_number char(10) , w_city varchar(60) , w_county varchar(30) , w_state char(2) , w_zip char(10) , w_country varchar(20) , w_gmt_offset decimal(5,2) ) with (orientation = column) distribute by replication; CREATE TABLE ship_mode ( sm_ship_mode_sk bigint not null, sm_ship_mode_id char(16) not null, sm_type char(30) , sm_code char(10) , sm_carrier char(20) , sm_contract char(20) ) with (orientation = column) distribute by replication; CREATE TABLE time_dim ( t_time_sk bigint not null, t_time_id char(16) not null, t_time bigint , t_hour bigint , t_minute bigint , t_second bigint , t_am_pm char(2) , t_shift char(20) , t_sub_shift char(20) , t_meal_time char(20) ) with (orientation = column) distribute by hash (t_time_sk); CREATE TABLE reason ( r_reason_sk bigint not null, r_reason_id char(16) not null, r_reason_desc char(100) ) with (orientation = column) distribute by replication; CREATE TABLE income_band ( ib_income_band_sk bigint not null, ib_lower_bound bigint , ib_upper_bound bigint ) with (orientation = column) distribute by replication; CREATE TABLE item ( i_item_sk bigint not null, i_item_id char(16) not null, i_rec_start_date date , i_rec_end_date date , i_item_desc varchar(200) , i_current_price decimal(7,2) , i_wholesale_cost decimal(7,2) , i_brand_id bigint , i_brand char(50) , i_class_id bigint , i_class char(50) , i_category_id bigint , i_category char(50) , 
i_manufact_id bigint , i_manufact char(50) , i_size char(20) , i_formulation char(20) , i_color char(20) , i_units char(10) , i_container char(10) , i_manager_id bigint , i_product_name char(50) ) with (orientation = column) distribute by hash (i_item_sk); CREATE TABLE store ( s_store_sk bigint not null, s_store_id char(16) not null, s_rec_start_date date , s_rec_end_date date , s_closed_date_sk bigint , s_store_name varchar(50) , s_number_employees bigint , s_floor_space bigint , s_hours char(20) , s_manager varchar(40) , s_market_id bigint , s_geography_class varchar(100) , s_market_desc varchar(100) , s_market_manager varchar(40) , s_division_id bigint , s_division_name varchar(50) , s_company_id bigint , s_company_name varchar(50) , s_street_number varchar(10) , s_street_name varchar(60) , s_street_type char(15) , s_suite_number char(10) , s_city varchar(60) , s_county varchar(30) , s_state char(2) , s_zip char(10) , s_country varchar(20) , s_gmt_offset decimal(5,2) , s_tax_precentage decimal(5,2) ) with (orientation = column) distribute by replication; CREATE TABLE call_center ( cc_call_center_sk bigint not null, cc_call_center_id char(16) not null, cc_rec_start_date date , cc_rec_end_date date , cc_closed_date_sk bigint , cc_open_date_sk bigint , cc_name varchar(50) , cc_class varchar(50) , cc_employees bigint , cc_sq_ft bigint , cc_hours char(20) , cc_manager varchar(40) , cc_mkt_id bigint , cc_mkt_class char(50) , cc_mkt_desc varchar(100) , cc_market_manager varchar(40) , cc_division bigint , cc_division_name varchar(50) , cc_company bigint , cc_company_name char(50) , cc_street_number char(10) , cc_street_name varchar(60) , cc_street_type char(15) , cc_suite_number char(10) , cc_city varchar(60) , cc_county varchar(30) , cc_state char(2) , cc_zip char(10) , cc_country varchar(20) , cc_gmt_offset decimal(5,2) , cc_tax_percentage decimal(5,2) ) with (orientation = column) distribute by replication; drop table if exists customer; CREATE TABLE customer ( 
c_customer_sk bigint not null, c_customer_id char(16) not null, c_current_cdemo_sk bigint , c_current_hdemo_sk bigint , c_current_addr_sk bigint , c_first_shipto_date_sk bigint , c_first_sales_date_sk bigint , c_salutation char(10) , c_first_name char(20) , c_last_name char(30) , c_preferred_cust_flag char(1) , c_birth_day bigint , c_birth_month bigint , c_birth_year bigint , c_birth_country varchar(20) , c_login char(13) , c_email_address char(50) , c_last_review_date_sk char(10) ) with (orientation = column) distribute by hash (c_customer_sk); CREATE TABLE web_site ( web_site_sk bigint not null, web_site_id char(16) not null, web_rec_start_date date , web_rec_end_date date , web_name varchar(50) , web_open_date_sk bigint , web_close_date_sk bigint , web_class varchar(50) , web_manager varchar(40) , web_mkt_id bigint , web_mkt_class varchar(50) , web_mkt_desc varchar(100) , web_market_manager varchar(40) , web_company_id bigint , web_company_name char(50) , web_street_number char(10) , web_street_name varchar(60) , web_street_type char(15) , web_suite_number char(10) , web_city varchar(60) , web_county varchar(30) , web_state char(2) , web_zip char(10) , web_country varchar(20) , web_gmt_offset decimal(5,2) , web_tax_percentage decimal(5,2) ) with (orientation = column) distribute by replication; CREATE TABLE household_demographics ( hd_demo_sk bigint not null, hd_income_band_sk bigint , hd_buy_potential char(15) , hd_dep_count bigint , hd_vehicle_count bigint ) with (orientation = column) distribute by hash (hd_demo_sk); CREATE TABLE web_page ( wp_web_page_sk bigint not null, wp_web_page_id char(16) not null, wp_rec_start_date date , wp_rec_end_date date , wp_creation_date_sk bigint , wp_access_date_sk bigint , wp_autogen_flag char(1) , wp_customer_sk bigint , wp_url varchar(100) , wp_type char(50) , wp_char_count bigint , wp_link_count bigint , wp_image_count bigint , wp_max_ad_count bigint ) with (orientation = column) distribute by replication; CREATE TABLE 
promotion ( p_promo_sk bigint not null, p_promo_id char(16) not null, p_start_date_sk bigint , p_end_date_sk bigint , p_item_sk bigint , p_cost decimal(15,2) , p_response_target bigint , p_promo_name char(50) , p_channel_dmail char(1) , p_channel_email char(1) , p_channel_catalog char(1) , p_channel_tv char(1) , p_channel_radio char(1) , p_channel_press char(1) , p_channel_event char(1) , p_channel_demo char(1) , p_channel_details varchar(100) , p_purpose char(15) , p_discount_active char(1) ) with (orientation = column) DISTRIBUTE BY HASH(p_promo_sk); CREATE TABLE catalog_page ( cp_catalog_page_sk bigint not null, cp_catalog_page_id char(16) not null, cp_start_date_sk bigint , cp_end_date_sk bigint , cp_department varchar(50) , cp_catalog_number bigint , cp_catalog_page_number bigint , cp_description varchar(100) , cp_type varchar(100) ) with (orientation = column) distribute by hash (cp_catalog_page_sk); CREATE TABLE inventory ( inv_date_sk bigint not null, inv_item_sk bigint not null, inv_warehouse_sk bigint not null, inv_quantity_on_hand integer ) with (orientation = column) distribute by hash (inv_item_sk) partition by range(inv_date_sk) ( partition p1 values less than(2451180), partition p2 values less than(2451545), partition p3 values less than(2451911), partition p4 values less than(2452276), partition p5 values less than(2452641), partition p6 values less than(2453006), partition p7 values less than(maxvalue) ) ; CREATE TABLE catalog_returns ( cr_returned_date_sk bigint , cr_returned_time_sk bigint , cr_item_sk bigint not null, cr_refunded_customer_sk bigint , cr_refunded_cdemo_sk bigint , cr_refunded_hdemo_sk bigint , cr_refunded_addr_sk bigint , cr_returning_customer_sk bigint , cr_returning_cdemo_sk bigint , cr_returning_hdemo_sk bigint , cr_returning_addr_sk bigint , cr_call_center_sk bigint , cr_catalog_page_sk bigint , cr_ship_mode_sk bigint , cr_warehouse_sk bigint , cr_reason_sk bigint , cr_order_number bigint not null, cr_return_quantity bigint , 
cr_return_amount decimal(7,2) , cr_return_tax decimal(7,2) , cr_return_amt_inc_tax decimal(7,2) , cr_fee decimal(7,2) , cr_return_ship_cost decimal(7,2) , cr_refunded_cash decimal(7,2) , cr_reversed_charge decimal(7,2) , cr_store_credit decimal(7,2) , cr_net_loss decimal(7,2) ) with (orientation = column) distribute by hash (cr_item_sk) partition by range(cr_returned_date_sk) ( partition p1 values less than(2450815), partition p2 values less than(2451180), partition p3 values less than(2451545), partition p4 values less than(2451911), partition p5 values less than(2452276), partition p6 values less than(2452641), partition p7 values less than(2453006), partition p8 values less than(maxvalue) ) ; CREATE TABLE web_returns ( wr_returned_date_sk bigint , wr_returned_time_sk bigint , wr_item_sk bigint not null, wr_refunded_customer_sk bigint , wr_refunded_cdemo_sk bigint , wr_refunded_hdemo_sk bigint , wr_refunded_addr_sk bigint , wr_returning_customer_sk bigint , wr_returning_cdemo_sk bigint , wr_returning_hdemo_sk bigint , wr_returning_addr_sk bigint , wr_web_page_sk bigint , wr_reason_sk bigint , wr_order_number bigint not null, wr_return_quantity bigint , wr_return_amt decimal(7,2) , wr_return_tax decimal(7,2) , wr_return_amt_inc_tax decimal(7,2) , wr_fee decimal(7,2) , wr_return_ship_cost decimal(7,2) , wr_refunded_cash decimal(7,2) , wr_reversed_charge decimal(7,2) , wr_account_credit decimal(7,2) , wr_net_loss decimal(7,2) ) with (orientation = column) distribute by hash (wr_item_sk) partition by range(wr_returned_date_sk) ( partition p1 values less than(2450815), partition p2 values less than(2451180), partition p3 values less than(2451545), partition p4 values less than(2451911), partition p5 values less than(2452276), partition p6 values less than(2452641), partition p7 values less than(2453006), partition p8 values less than(maxvalue) ) ; CREATE TABLE store_returns ( sr_returned_date_sk bigint , sr_return_time_sk bigint , sr_item_sk bigint not null, 
sr_customer_sk bigint , sr_cdemo_sk bigint , sr_hdemo_sk bigint , sr_addr_sk bigint , sr_store_sk bigint , sr_reason_sk bigint , sr_ticket_number bigint not null, sr_return_quantity bigint , sr_return_amt decimal(7,2) , sr_return_tax decimal(7,2) , sr_return_amt_inc_tax decimal(7,2) , sr_fee decimal(7,2) , sr_return_ship_cost decimal(7,2) , sr_refunded_cash decimal(7,2) , sr_reversed_charge decimal(7,2) , sr_store_credit decimal(7,2) , sr_net_loss decimal(7,2) ) with (orientation = column) distribute by hash (sr_item_sk) partition by range(sr_returned_date_sk) ( partition p1 values less than (2451180) , partition p2 values less than (2451545) , partition p3 values less than (2451911) , partition p4 values less than (2452276) , partition p5 values less than (2452641) , partition p6 values less than (2453006) , partition p7 values less than (maxvalue) ) ; CREATE TABLE web_sales ( ws_sold_date_sk bigint , ws_sold_time_sk bigint , ws_ship_date_sk bigint , ws_item_sk bigint not null, ws_bill_customer_sk bigint , ws_bill_cdemo_sk bigint , ws_bill_hdemo_sk bigint , ws_bill_addr_sk bigint , ws_ship_customer_sk bigint , ws_ship_cdemo_sk bigint , ws_ship_hdemo_sk bigint , ws_ship_addr_sk bigint , ws_web_page_sk bigint , ws_web_site_sk bigint , ws_ship_mode_sk bigint , ws_warehouse_sk bigint , ws_promo_sk bigint , ws_order_number bigint not null, ws_quantity bigint , ws_wholesale_cost decimal(7,2) , ws_list_price decimal(7,2) , ws_sales_price decimal(7,2) , ws_ext_discount_amt decimal(7,2) , ws_ext_sales_price decimal(7,2) , ws_ext_wholesale_cost decimal(7,2) , ws_ext_list_price decimal(7,2) , ws_ext_tax decimal(7,2) , ws_coupon_amt decimal(7,2) , ws_ext_ship_cost decimal(7,2) , ws_net_paid decimal(7,2) , ws_net_paid_inc_tax decimal(7,2) , ws_net_paid_inc_ship decimal(7,2) , ws_net_paid_inc_ship_tax decimal(7,2) , ws_net_profit decimal(7,2) ) with (orientation = column) distribute by hash (ws_item_sk) partition by range(ws_sold_date_sk) ( partition p1 values less 
than(2451180), partition p2 values less than(2451545), partition p3 values less than(2451911), partition p4 values less than(2452276), partition p5 values less than(2452641), partition p6 values less than(2453006), partition p7 values less than(maxvalue) ) ; CREATE TABLE catalog_sales ( cs_sold_date_sk bigint , cs_sold_time_sk bigint , cs_ship_date_sk bigint , cs_bill_customer_sk bigint , cs_bill_cdemo_sk bigint , cs_bill_hdemo_sk bigint , cs_bill_addr_sk bigint , cs_ship_customer_sk bigint , cs_ship_cdemo_sk bigint , cs_ship_hdemo_sk bigint , cs_ship_addr_sk bigint , cs_call_center_sk bigint , cs_catalog_page_sk bigint , cs_ship_mode_sk bigint , cs_warehouse_sk bigint , cs_item_sk bigint not null, cs_promo_sk bigint , cs_order_number bigint not null, cs_quantity bigint , cs_wholesale_cost decimal(7,2) , cs_list_price decimal(7,2) , cs_sales_price decimal(7,2) , cs_ext_discount_amt decimal(7,2) , cs_ext_sales_price decimal(7,2) , cs_ext_wholesale_cost decimal(7,2) , cs_ext_list_price decimal(7,2) , cs_ext_tax decimal(7,2) , cs_coupon_amt decimal(7,2) , cs_ext_ship_cost decimal(7,2) , cs_net_paid decimal(7,2) , cs_net_paid_inc_tax decimal(7,2) , cs_net_paid_inc_ship decimal(7,2) , cs_net_paid_inc_ship_tax decimal(7,2) , cs_net_profit decimal(7,2) ) with (orientation = column) distribute by hash (cs_item_sk) partition by range(cs_sold_date_sk) ( partition p1 values less than(2451180), partition p2 values less than(2451545), partition p3 values less than(2451911), partition p4 values less than(2452276), partition p5 values less than(2452641), partition p6 values less than(2453006), partition p7 values less than(maxvalue) ) ; CREATE TABLE store_sales ( ss_sold_date_sk bigint , ss_sold_time_sk bigint , ss_item_sk bigint not null, ss_customer_sk bigint , ss_cdemo_sk bigint , ss_hdemo_sk bigint , ss_addr_sk bigint , ss_store_sk bigint , ss_promo_sk bigint , ss_ticket_number bigint not null, ss_quantity bigint , ss_wholesale_cost decimal(7,2) , ss_list_price decimal(7,2) , 
ss_sales_price decimal(7,2) , ss_ext_discount_amt decimal(7,2) , ss_ext_sales_price decimal(7,2) , ss_ext_wholesale_cost decimal(7,2) , ss_ext_list_price decimal(7,2) , ss_ext_tax decimal(7,2) , ss_coupon_amt decimal(7,2) , ss_net_paid decimal(7,2) , ss_net_paid_inc_tax decimal(7,2) , ss_net_profit decimal(7,2) ) with (orientation = column) distribute by hash (ss_item_sk) partition by range(ss_sold_date_sk) ( partition p1 values less than(2451180), partition p2 values less than(2451545), partition p3 values less than(2451911), partition p4 values less than(2452276), partition p5 values less than(2452641), partition p6 values less than(2453006), partition p7 values less than(maxvalue) ) ;
Run the following SQL statements to create the GDS foreign tables (24 tables).
In each foreign table below, replace the IP address and port in "gsfs://192.168.0.90:500x/xxx | gsfs://192.168.0.90:500x/xxx" with the listening IP address and port of the corresponding GDS, as set in "Installing and Starting GDS". If two GDS instances are started, separate their URLs with "|". If more GDS instances are started, the listening IP address and port of every GDS must be configured in the foreign table.
DROP FOREIGN TABLE IF EXISTS customer_address_ext;
CREATE FOREIGN TABLE customer_address_ext (
ca_address_sk bigint , ca_address_id char(16) , ca_street_number char(10) , ca_street_name varchar(60) , ca_street_type char(15) , ca_suite_number char(10) , ca_city varchar(60) , ca_county varchar(30) , ca_state char(2) , ca_zip char(10) , ca_country varchar(20) , ca_gmt_offset decimal(5,2) , ca_location_type char(20) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/customer_address* | gsfs://192.168.0.90:5003/customer_address*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with customer_address_err; DROP FOREIGN TABLE IF EXISTS customer_demographics_ext; CREATE FOREIGN TABLE customer_demographics_ext ( cd_demo_sk bigint , cd_gender char(1) , cd_marital_status char(1) , cd_education_status char(20) , cd_purchase_estimate bigint , cd_credit_rating char(10) , cd_dep_count bigint , cd_dep_employed_count bigint , cd_dep_college_count bigint ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/customer_demographics* | gsfs://192.168.0.90:5003/customer_demographics*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with customer_demographics_err; DROP FOREIGN TABLE IF EXISTS date_dim_ext; CREATE FOREIGN TABLE date_dim_ext ( d_date_sk bigint , d_date_id char(16) , d_date date , d_month_seq bigint , d_week_seq bigint , d_quarter_seq bigint , d_year bigint , d_dow bigint , d_moy bigint , d_dom bigint , d_qoy bigint , d_fy_year bigint , d_fy_quarter_seq bigint , d_fy_week_seq bigint , d_day_name char(9) , d_quarter_name char(6) , d_holiday char(1) , d_weekend char(1) , d_following_holiday char(1) , d_first_dom bigint , d_last_dom bigint , d_same_day_ly bigint , d_same_day_lq bigint , d_current_day char(1) , d_current_week char(1) , d_current_month char(1) , d_current_quarter char(1) , d_current_year char(1) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/date_dim* | 
gsfs://192.168.0.90:5003/date_dim*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with date_dim_err; DROP FOREIGN TABLE IF EXISTS warehouse_ext; CREATE FOREIGN TABLE warehouse_ext ( w_warehouse_sk bigint , w_warehouse_id char(16) , w_warehouse_name varchar(20) , w_warehouse_sq_ft bigint , w_street_number char(10) , w_street_name varchar(60) , w_street_type char(15) , w_suite_number char(10) , w_city varchar(60) , w_county varchar(30) , w_state char(2) , w_zip char(10) , w_country varchar(20) , w_gmt_offset decimal(5,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/warehouse* | gsfs://192.168.0.90:5003/warehouse*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with warehouse_err; DROP FOREIGN TABLE IF EXISTS ship_mode_ext; CREATE FOREIGN TABLE ship_mode_ext ( sm_ship_mode_sk bigint , sm_ship_mode_id char(16) , sm_type char(30) , sm_code char(10) , sm_carrier char(20) , sm_contract char(20) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/ship_mode* | gsfs://192.168.0.90:5003/ship_mode*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with ship_mode_err; DROP FOREIGN TABLE IF EXISTS time_dim_ext; CREATE FOREIGN TABLE time_dim_ext ( t_time_sk bigint , t_time_id char(16) , t_time bigint , t_hour bigint , t_minute bigint , t_second bigint , t_am_pm char(2) , t_shift char(20) , t_sub_shift char(20) , t_meal_time char(20) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/time_dim* | gsfs://192.168.0.90:5003/time_dim*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with time_dim_err; DROP FOREIGN TABLE IF EXISTS reason_ext; CREATE FOREIGN TABLE reason_ext ( r_reason_sk bigint , r_reason_id char(16) , r_reason_desc char(100) ) SERVER 
gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/reason* | gsfs://192.168.0.90:5003/reason*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with reason_err; DROP FOREIGN TABLE IF EXISTS income_band_ext; CREATE FOREIGN TABLE income_band_ext ( ib_income_band_sk bigint , ib_lower_bound bigint , ib_upper_bound bigint ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/income_band* | gsfs://192.168.0.90:5003/income_band*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with income_band_err; DROP FOREIGN TABLE IF EXISTS item_ext; CREATE FOREIGN TABLE item_ext ( i_item_sk bigint , i_item_id char(16) , i_rec_start_date date , i_rec_end_date date , i_item_desc varchar(200) , i_current_price decimal(7,2) , i_wholesale_cost decimal(7,2) , i_brand_id bigint , i_brand char(50) , i_class_id bigint , i_class char(50) , i_category_id bigint , i_category char(50) , i_manufact_id bigint , i_manufact char(50) , i_size char(20) , i_formulation char(20) , i_color char(20) , i_units char(10) , i_container char(10) , i_manager_id bigint , i_product_name char(50) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/item* | gsfs://192.168.0.90:5003/item*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with item_err; DROP FOREIGN TABLE IF EXISTS store_ext; CREATE FOREIGN TABLE store_ext ( s_store_sk bigint , s_store_id char(16) , s_rec_start_date date , s_rec_end_date date , s_closed_date_sk bigint , s_store_name varchar(50) , s_number_employees bigint , s_floor_space bigint , s_hours char(20) , s_manager varchar(40) , s_market_id bigint , s_geography_class varchar(100) , s_market_desc varchar(100) , s_market_manager varchar(40) , s_division_id bigint , s_division_name varchar(50) , s_company_id bigint , s_company_name varchar(50) , s_street_number 
varchar(10) , s_street_name varchar(60) , s_street_type char(15) , s_suite_number char(10) , s_city varchar(60) , s_county varchar(30) , s_state char(2) , s_zip char(10) , s_country varchar(20) , s_gmt_offset decimal(5,2) , s_tax_precentage decimal(5,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/store_[^rs]* | gsfs://192.168.0.90:5003/store_[^rs]*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with store_err; DROP FOREIGN TABLE IF EXISTS call_center_ext; CREATE FOREIGN TABLE call_center_ext ( cc_call_center_sk bigint , cc_call_center_id char(16) , cc_rec_start_date date , cc_rec_end_date date , cc_closed_date_sk bigint , cc_open_date_sk bigint , cc_name varchar(50) , cc_class varchar(50) , cc_employees bigint , cc_sq_ft bigint , cc_hours char(20) , cc_manager varchar(40) , cc_mkt_id bigint , cc_mkt_class char(50) , cc_mkt_desc varchar(100) , cc_market_manager varchar(40) , cc_division bigint , cc_division_name varchar(50) , cc_company bigint , cc_company_name char(50) , cc_street_number char(10) , cc_street_name varchar(60) , cc_street_type char(15) , cc_suite_number char(10) , cc_city varchar(60) , cc_county varchar(30) , cc_state char(2) , cc_zip char(10) , cc_country varchar(20) , cc_gmt_offset decimal(5,2) , cc_tax_percentage decimal(5,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/call_center* | gsfs://192.168.0.90:5003/call_center*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with call_center_err; DROP FOREIGN TABLE IF EXISTS customer_ext; CREATE FOREIGN TABLE customer_ext ( c_customer_sk bigint , c_customer_id char(16) , c_current_cdemo_sk bigint , c_current_hdemo_sk bigint , c_current_addr_sk bigint , c_first_shipto_date_sk bigint , c_first_sales_date_sk bigint , c_salutation char(10) , c_first_name char(20) , c_last_name char(30) , c_preferred_cust_flag char(1) , 
c_birth_day bigint , c_birth_month bigint , c_birth_year bigint , c_birth_country varchar(20) , c_login char(13) , c_email_address char(50) , c_last_review_date_sk char(10) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/customer_[^ad]* | gsfs://192.168.0.90:5003/customer_[^ad]*', FORMAT 'TEXT' , DELIMITER '|', encoding 'GBK', mode 'Normal' ) with customer_err; DROP FOREIGN TABLE IF EXISTS web_site_ext; CREATE FOREIGN TABLE web_site_ext ( web_site_sk bigint , web_site_id char(16) , web_rec_start_date date , web_rec_end_date date , web_name varchar(50) , web_open_date_sk bigint , web_close_date_sk bigint , web_class varchar(50) , web_manager varchar(40) , web_mkt_id bigint , web_mkt_class varchar(50) , web_mkt_desc varchar(100) , web_market_manager varchar(40) , web_company_id bigint , web_company_name char(50) , web_street_number char(10) , web_street_name varchar(60) , web_street_type char(15) , web_suite_number char(10) , web_city varchar(60) , web_county varchar(30) , web_state char(2) , web_zip char(10) , web_country varchar(20) , web_gmt_offset decimal(5,2) , web_tax_percentage decimal(5,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/web_site* | gsfs://192.168.0.90:5003/web_site*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with web_site_err; DROP FOREIGN TABLE IF EXISTS store_returns_ext; CREATE FOREIGN TABLE store_returns_ext ( sr_returned_date_sk bigint , sr_return_time_sk bigint , sr_item_sk bigint , sr_customer_sk bigint , sr_cdemo_sk bigint , sr_hdemo_sk bigint , sr_addr_sk bigint , sr_store_sk bigint , sr_reason_sk bigint , sr_ticket_number bigint , sr_return_quantity bigint , sr_return_amt decimal(7,2) , sr_return_tax decimal(7,2) , sr_return_amt_inc_tax decimal(7,2) , sr_fee decimal(7,2) , sr_return_ship_cost decimal(7,2) , sr_refunded_cash decimal(7,2) , sr_reversed_charge decimal(7,2) , sr_store_credit decimal(7,2) , sr_net_loss 
decimal(7,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/store_returns* | gsfs://192.168.0.90:5003/store_returns*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with store_returns_err; DROP FOREIGN TABLE IF EXISTS household_demographics_ext; CREATE FOREIGN TABLE household_demographics_ext ( hd_demo_sk bigint , hd_income_band_sk bigint , hd_buy_potential char(15) , hd_dep_count bigint , hd_vehicle_count bigint ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/household_demographics* | gsfs://192.168.0.90:5003/household_demographics*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with household_demographics_err; DROP FOREIGN TABLE IF EXISTS web_page_ext; CREATE FOREIGN TABLE web_page_ext ( wp_web_page_sk bigint , wp_web_page_id char(16) , wp_rec_start_date date , wp_rec_end_date date , wp_creation_date_sk bigint , wp_access_date_sk bigint , wp_autogen_flag char(1) , wp_customer_sk bigint , wp_url varchar(100) , wp_type char(50) , wp_char_count bigint , wp_link_count bigint , wp_image_count bigint , wp_max_ad_count bigint ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/web_page* | gsfs://192.168.0.90:5003/web_page*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with web_page_err; DROP FOREIGN TABLE IF EXISTS promotion_ext; CREATE FOREIGN TABLE promotion_ext ( p_promo_sk bigint , p_promo_id char(16) , p_start_date_sk bigint , p_end_date_sk bigint , p_item_sk bigint , p_cost decimal(15,2) , p_response_target bigint , p_promo_name char(50) , p_channel_dmail char(1) , p_channel_email char(1) , p_channel_catalog char(1) , p_channel_tv char(1) , p_channel_radio char(1) , p_channel_press char(1) , p_channel_event char(1) , p_channel_demo char(1) , p_channel_details varchar(100) , p_purpose char(15) , 
p_discount_active char(1) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/promotion* | gsfs://192.168.0.90:5003/promotion*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with promotion_err; DROP FOREIGN TABLE IF EXISTS catalog_page_ext; CREATE FOREIGN TABLE catalog_page_ext ( cp_catalog_page_sk bigint , cp_catalog_page_id char(16) , cp_start_date_sk bigint , cp_end_date_sk bigint , cp_department varchar(50) , cp_catalog_number bigint , cp_catalog_page_number bigint , cp_description varchar(100) , cp_type varchar(100) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/catalog_page* | gsfs://192.168.0.90:5003/catalog_page*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with catalog_page_err; DROP FOREIGN TABLE IF EXISTS inventory_ext; CREATE FOREIGN TABLE inventory_ext ( inv_date_sk bigint , inv_item_sk bigint , inv_warehouse_sk bigint , inv_quantity_on_hand integer ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/inventory* | gsfs://192.168.0.90:5003/inventory*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with inventory_err; DROP FOREIGN TABLE IF EXISTS catalog_returns_ext; CREATE FOREIGN TABLE catalog_returns_ext ( cr_returned_date_sk bigint , cr_returned_time_sk bigint , cr_item_sk bigint , cr_refunded_customer_sk bigint , cr_refunded_cdemo_sk bigint , cr_refunded_hdemo_sk bigint , cr_refunded_addr_sk bigint , cr_returning_customer_sk bigint , cr_returning_cdemo_sk bigint , cr_returning_hdemo_sk bigint , cr_returning_addr_sk bigint , cr_call_center_sk bigint , cr_catalog_page_sk bigint , cr_ship_mode_sk bigint , cr_warehouse_sk bigint , cr_reason_sk bigint , cr_order_number bigint , cr_return_quantity bigint , cr_return_amount decimal(7,2) , cr_return_tax decimal(7,2) , cr_return_amt_inc_tax 
decimal(7,2) , cr_fee decimal(7,2) , cr_return_ship_cost decimal(7,2) , cr_refunded_cash decimal(7,2) , cr_reversed_charge decimal(7,2) , cr_store_credit decimal(7,2) , cr_net_loss decimal(7,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/catalog_returns* | gsfs://192.168.0.90:5003/catalog_returns*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with catalog_returns_err; DROP FOREIGN TABLE IF EXISTS web_returns_ext; CREATE FOREIGN TABLE web_returns_ext ( wr_returned_date_sk bigint , wr_returned_time_sk bigint , wr_item_sk bigint , wr_refunded_customer_sk bigint , wr_refunded_cdemo_sk bigint , wr_refunded_hdemo_sk bigint , wr_refunded_addr_sk bigint , wr_returning_customer_sk bigint , wr_returning_cdemo_sk bigint , wr_returning_hdemo_sk bigint , wr_returning_addr_sk bigint , wr_web_page_sk bigint , wr_reason_sk bigint , wr_order_number bigint , wr_return_quantity bigint , wr_return_amt decimal(7,2) , wr_return_tax decimal(7,2) , wr_return_amt_inc_tax decimal(7,2) , wr_fee decimal(7,2) , wr_return_ship_cost decimal(7,2) , wr_refunded_cash decimal(7,2) , wr_reversed_charge decimal(7,2) , wr_account_credit decimal(7,2) , wr_net_loss decimal(7,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/web_returns* | gsfs://192.168.0.90:5003/web_returns*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with web_returns_err; DROP FOREIGN TABLE IF EXISTS web_sales_ext; CREATE FOREIGN TABLE web_sales_ext ( ws_sold_date_sk bigint , ws_sold_time_sk bigint , ws_ship_date_sk bigint , ws_item_sk bigint , ws_bill_customer_sk bigint , ws_bill_cdemo_sk bigint , ws_bill_hdemo_sk bigint , ws_bill_addr_sk bigint , ws_ship_customer_sk bigint , ws_ship_cdemo_sk bigint , ws_ship_hdemo_sk bigint , ws_ship_addr_sk bigint , ws_web_page_sk bigint , ws_web_site_sk bigint , ws_ship_mode_sk bigint , ws_warehouse_sk bigint , 
ws_promo_sk bigint , ws_order_number bigint , ws_quantity bigint , ws_wholesale_cost decimal(7,2) , ws_list_price decimal(7,2) , ws_sales_price decimal(7,2) , ws_ext_discount_amt decimal(7,2) , ws_ext_sales_price decimal(7,2) , ws_ext_wholesale_cost decimal(7,2) , ws_ext_list_price decimal(7,2) , ws_ext_tax decimal(7,2) , ws_coupon_amt decimal(7,2) , ws_ext_ship_cost decimal(7,2) , ws_net_paid decimal(7,2) , ws_net_paid_inc_tax decimal(7,2) , ws_net_paid_inc_ship decimal(7,2) , ws_net_paid_inc_ship_tax decimal(7,2) , ws_net_profit decimal(7,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/web_sales* | gsfs://192.168.0.90:5003/web_sales*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with web_sales_err; DROP FOREIGN TABLE IF EXISTS catalog_sales_ext; CREATE FOREIGN TABLE catalog_sales_ext ( cs_sold_date_sk bigint , cs_sold_time_sk bigint , cs_ship_date_sk bigint , cs_bill_customer_sk bigint , cs_bill_cdemo_sk bigint , cs_bill_hdemo_sk bigint , cs_bill_addr_sk bigint , cs_ship_customer_sk bigint , cs_ship_cdemo_sk bigint , cs_ship_hdemo_sk bigint , cs_ship_addr_sk bigint , cs_call_center_sk bigint , cs_catalog_page_sk bigint , cs_ship_mode_sk bigint , cs_warehouse_sk bigint , cs_item_sk bigint , cs_promo_sk bigint , cs_order_number bigint , cs_quantity bigint , cs_wholesale_cost decimal(7,2) , cs_list_price decimal(7,2) , cs_sales_price decimal(7,2) , cs_ext_discount_amt decimal(7,2) , cs_ext_sales_price decimal(7,2) , cs_ext_wholesale_cost decimal(7,2) , cs_ext_list_price decimal(7,2) , cs_ext_tax decimal(7,2) , cs_coupon_amt decimal(7,2) , cs_ext_ship_cost decimal(7,2) , cs_net_paid decimal(7,2) , cs_net_paid_inc_tax decimal(7,2) , cs_net_paid_inc_ship decimal(7,2) , cs_net_paid_inc_ship_tax decimal(7,2) , cs_net_profit decimal(7,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/catalog_sales* | gsfs://192.168.0.90:5003/catalog_sales*', FORMAT 'TEXT' , 
DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with catalog_sales_err;
DROP FOREIGN TABLE IF EXISTS store_sales_ext;
CREATE FOREIGN TABLE store_sales_ext ( ss_sold_date_sk bigint , ss_sold_time_sk bigint , ss_item_sk bigint , ss_customer_sk bigint , ss_cdemo_sk bigint , ss_hdemo_sk bigint , ss_addr_sk bigint , ss_store_sk bigint , ss_promo_sk bigint , ss_ticket_number bigint , ss_quantity bigint , ss_wholesale_cost decimal(7,2) , ss_list_price decimal(7,2) , ss_sales_price decimal(7,2) , ss_ext_discount_amt decimal(7,2) , ss_ext_sales_price decimal(7,2) , ss_ext_wholesale_cost decimal(7,2) , ss_ext_list_price decimal(7,2) , ss_ext_tax decimal(7,2) , ss_coupon_amt decimal(7,2) , ss_net_paid decimal(7,2) , ss_net_paid_inc_tax decimal(7,2) , ss_net_profit decimal(7,2) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5002/store_sales* | gsfs://192.168.0.90:5003/store_sales*', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ) with store_sales_err;
Run the following SQL statements to import the data.
INSERT INTO customer_address SELECT * FROM customer_address_ext;
INSERT INTO customer_demographics SELECT * FROM customer_demographics_ext;
INSERT INTO date_dim SELECT * FROM date_dim_ext;
INSERT INTO warehouse SELECT * FROM warehouse_ext;
INSERT INTO ship_mode SELECT * FROM ship_mode_ext;
INSERT INTO time_dim SELECT * FROM time_dim_ext;
INSERT INTO reason SELECT * FROM reason_ext;
INSERT INTO income_band SELECT * FROM income_band_ext;
INSERT INTO item SELECT * FROM item_ext;
INSERT INTO store SELECT * FROM store_ext;
INSERT INTO call_center SELECT * FROM call_center_ext;
INSERT INTO customer SELECT * FROM customer_ext;
INSERT INTO web_site SELECT * FROM web_site_ext;
INSERT INTO household_demographics SELECT * FROM household_demographics_ext;
INSERT INTO web_page SELECT * FROM web_page_ext;
INSERT INTO promotion SELECT * FROM promotion_ext;
INSERT INTO catalog_page SELECT * FROM catalog_page_ext;
INSERT INTO inventory SELECT * FROM inventory_ext;
INSERT INTO catalog_returns SELECT * FROM catalog_returns_ext;
INSERT INTO web_returns SELECT * FROM web_returns_ext;
INSERT INTO store_returns SELECT * FROM store_returns_ext;
INSERT INTO web_sales SELECT * FROM web_sales_ext;
INSERT INTO catalog_sales SELECT * FROM catalog_sales_ext;
INSERT INTO store_sales SELECT * FROM store_sales_ext;
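The foreign tables above all follow one pattern: the location option lists one gsfs:// URL per GDS listener, joined with "|", and each target table is loaded from its "_ext" foreign table. The following is a minimal Python sketch of that pattern, intended only as a convenience for regenerating the strings when the GDS addresses change. The helper names gds_location and insert_statement are hypothetical and are not part of DWS or GDS; the IP address and ports are the placeholder values used in this section.

```python
def gds_location(endpoints, file_pattern):
    """Build the value of the foreign-table `location` option.

    endpoints    -- list of (ip, port) pairs, one per running GDS listener
    file_pattern -- file name pattern served by GDS, e.g. "store_sales*"
    """
    if not endpoints:
        raise ValueError("at least one GDS endpoint is required")
    # One gsfs:// URL per listener, separated by " | " as in the statements above.
    urls = [f"gsfs://{ip}:{port}/{file_pattern}" for ip, port in endpoints]
    return " | ".join(urls)


def insert_statement(table):
    """Build the import statement for a target table and its *_ext foreign table."""
    return f"INSERT INTO {table} SELECT * FROM {table}_ext;"


if __name__ == "__main__":
    # Placeholder listeners; replace with the real GDS listening IP and ports.
    listeners = [("192.168.0.90", 5002), ("192.168.0.90", 5003)]
    print(gds_location(listeners, "store_sales*"))
    print(insert_statement("store_sales"))
```

Adding a third listener only requires appending another (ip, port) pair to the list; the generated string then contains all three URLs, which matches the requirement that every started GDS be configured in the foreign table.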
  • Table row counts
Table 1 TPC-DS
No.  Table name              Rows
1    customer_address        6,000,000
2    customer_demographics   1,920,800
3    date_dim                73,049
4    warehouse               20
5    ship_mode               20
6    time_dim                86,400
7    reason                  65
8    income_band             20
9    item                    300,000
10   store                   1,002
11   call_center             42
12   customer                12,000,000
13   web_site                54
14   household_demographics  7,200
15   web_page                3,000
16   promotion               1,500
17   catalog_page            30,000
18   inventory               783,000,000
19   catalog_returns         143,996,756
20   web_returns             71,997,522
21   store_returns           287,999,764
22   web_sales               720,000,376
23   catalog_sales           1,439,980,416
24   store_sales             2,879,987,999
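The row counts in Table 1 can serve as a check that the import completed. The sketch below is one possible way to do this, under the assumption that the actual counts are collected separately (for example by running the generated query in gsql and copying the results into a dictionary). The names EXPECTED_ROWS, count_query, and mismatches are hypothetical and only illustrate the comparison.

```python
# Expected row counts, copied from Table 1 (TPC-DS).
EXPECTED_ROWS = {
    "customer_address": 6_000_000,
    "customer_demographics": 1_920_800,
    "date_dim": 73_049,
    "warehouse": 20,
    "ship_mode": 20,
    "time_dim": 86_400,
    "reason": 65,
    "income_band": 20,
    "item": 300_000,
    "store": 1_002,
    "call_center": 42,
    "customer": 12_000_000,
    "web_site": 54,
    "household_demographics": 7_200,
    "web_page": 3_000,
    "promotion": 1_500,
    "catalog_page": 30_000,
    "inventory": 783_000_000,
    "catalog_returns": 143_996_756,
    "web_returns": 71_997_522,
    "store_returns": 287_999_764,
    "web_sales": 720_000_376,
    "catalog_sales": 1_439_980_416,
    "store_sales": 2_879_987_999,
}


def count_query(tables):
    """Build one SQL query that returns (table_name, row_count) for each table."""
    parts = [
        f"SELECT '{t}' AS table_name, count(*) AS row_count FROM {t}"
        for t in tables
    ]
    return "\nUNION ALL\n".join(parts) + ";"


def mismatches(expected, actual):
    """Return {table: (expected, actual)} for tables whose count differs or is missing."""
    return {
        table: (rows, actual.get(table))
        for table, rows in expected.items()
        if actual.get(table) != rows
    }


if __name__ == "__main__":
    print(count_query(["warehouse", "ship_mode"]))
    # Hypothetical counts, as if read back from the database.
    print(mismatches({"warehouse": 20, "ship_mode": 20}, {"warehouse": 20, "ship_mode": 19}))
```

An empty result from mismatches means every listed table has the expected number of rows; a table missing from the actual counts is reported with None.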
  • Procedure
After connecting to DWS with gsql, run the following commands to create the target tables (8 tables).
DROP TABLE IF EXISTS region;
CREATE TABLE region ( R_REGIONKEY INT NOT NULL, R_NAME CHAR(25) NOT NULL, R_COMMENT VARCHAR(152) ) with (orientation = column) distribute by replication;
DROP TABLE IF EXISTS nation;
CREATE TABLE nation ( N_NATIONKEY INT NOT NULL, N_NAME CHAR(25) NOT NULL, N_REGIONKEY INT NOT NULL, N_COMMENT VARCHAR(152) ) with (orientation = column) distribute by replication;
DROP TABLE IF EXISTS supplier;
CREATE TABLE supplier ( S_SUPPKEY INT NOT NULL, S_NAME CHAR(25) NOT NULL, S_ADDRESS VARCHAR(40) NOT NULL, S_NATIONKEY INT NOT NULL, S_PHONE CHAR(15) NOT NULL, S_ACCTBAL DECIMAL(15,2) NOT NULL, S_COMMENT VARCHAR(101) NOT NULL ) with (orientation = column) distribute by hash(S_SUPPKEY);
DROP TABLE IF EXISTS customer;
CREATE TABLE customer ( C_CUSTKEY INT NOT NULL, C_NAME VARCHAR(25) NOT NULL, C_ADDRESS VARCHAR(40) NOT NULL, C_NATIONKEY INT NOT NULL, C_PHONE CHAR(15) NOT NULL, C_ACCTBAL DECIMAL(15,2) NOT NULL, C_MKTSEGMENT CHAR(10) NOT NULL, C_COMMENT VARCHAR(117) NOT NULL ) with (orientation = column) distribute by hash(C_CUSTKEY);
DROP TABLE IF EXISTS part;
CREATE TABLE part ( P_PARTKEY INT NOT NULL, P_NAME VARCHAR(55) NOT NULL, P_MFGR CHAR(25) NOT NULL, P_BRAND CHAR(10) NOT NULL, P_TYPE VARCHAR(25) NOT NULL, P_SIZE INT NOT NULL, P_CONTAINER CHAR(10) NOT NULL, P_RETAILPRICE DECIMAL(15,2) NOT NULL, P_COMMENT VARCHAR(23) NOT NULL ) with (orientation = column) distribute by hash(P_PARTKEY);
DROP TABLE IF EXISTS partsupp;
CREATE TABLE partsupp ( PS_PARTKEY INT NOT NULL, PS_SUPPKEY INT
NOT NULL, PS_AVAILQTY INT NOT NULL, PS_SUPPLYCOST DECIMAL(15,2) NOT NULL, PS_COMMENT VARCHAR(199) NOT NULL ) with (orientation = column) distribute by hash(PS_PARTKEY); DROP TABLE IF EXISTS orders; CREATE TABLE orders ( O_ORDERKEY BIGINT NOT NULL, O_CUSTKEY INT NOT NULL, O_ORDERSTATUS CHAR(1) NOT NULL, O_TOTALPRICE DECIMAL(15,2) NOT NULL, O_ORDERDATE DATE NOT NULL, O_ORDERPRIORITY CHAR(15) NOT NULL, O_CLERK CHAR(15) NOT NULL, O_SHIPPRIORITY INT NOT NULL, O_COMMENT VARCHAR(79) NOT NULL ) with (orientation = column) distribute by hash(O_ORDERKEY) PARTITION BY RANGE(O_ORDERDATE) ( PARTITION O_ORDERDATE_1 VALUES LESS THAN('1993-01-01 00:00:00'), PARTITION O_ORDERDATE_2 VALUES LESS THAN('1994-01-01 00:00:00'), PARTITION O_ORDERDATE_3 VALUES LESS THAN('1995-01-01 00:00:00'), PARTITION O_ORDERDATE_4 VALUES LESS THAN('1996-01-01 00:00:00'), PARTITION O_ORDERDATE_5 VALUES LESS THAN('1997-01-01 00:00:00'), PARTITION O_ORDERDATE_6 VALUES LESS THAN('1998-01-01 00:00:00'), PARTITION O_ORDERDATE_7 VALUES LESS THAN('1999-01-01 00:00:00') ); DROP TABLE IF EXISTS lineitem; CREATE TABLE lineitem ( L_ORDERKEY BIGINT NOT NULL, L_PARTKEY INT NOT NULL, L_SUPPKEY INT NOT NULL, L_LINENUMBER INT NOT NULL, L_QUANTITY DECIMAL(15,2) NOT NULL, L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL, L_DISCOUNT DECIMAL(15,2) NOT NULL, L_TAX DECIMAL(15,2) NOT NULL, L_RETURNFLAG CHAR(1) NOT NULL, L_LINESTATUS CHAR(1) NOT NULL, L_SHIPDATE DATE NOT NULL, L_COMMITDATE DATE NOT NULL, L_RECEIPTDATE DATE NOT NULL, L_SHIPINSTRUCT CHAR(25) NOT NULL, L_SHIPMODE CHAR(10) NOT NULL, L_COMMENT VARCHAR(44) NOT NULL ) with (orientation = column) distribute by hash(L_ORDERKEY) PARTITION BY RANGE(L_SHIPDATE) ( PARTITION L_SHIPDATE_1 VALUES LESS THAN('1993-01-01 00:00:00'), PARTITION L_SHIPDATE_2 VALUES LESS THAN('1994-01-01 00:00:00'), PARTITION L_SHIPDATE_3 VALUES LESS THAN('1995-01-01 00:00:00'), PARTITION L_SHIPDATE_4 VALUES LESS THAN('1996-01-01 00:00:00'), PARTITION L_SHIPDATE_5 VALUES LESS THAN('1997-01-01 00:00:00'), 
PARTITION L_SHIPDATE_6 VALUES LESS THAN('1998-01-01 00:00:00'), PARTITION L_SHIPDATE_7 VALUES LESS THAN('1999-01-01 00:00:00') ); 执行以下命令创建GDS外表(8张表)。 以下每个外表的“gsfs://192.168.0.90:500x/xxx | gsfs://192.168.0.90:500x/xxx”中的IP地址和端口,请替换成安装和启动GDS中的对应的GDS的监听IP和端口。如启动两个GDS,则使用“|”区分。如果启动多个GDS,需要将所有GDS的监听IP和端口配置到外表中。 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 DROP FOREIGN TABLE IF EXISTS region_load; CREATE FOREIGN TABLE region_load ( R_REGIONKEY INT, R_NAME CHAR(25), R_COMMENT VARCHAR(152) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5000/region* | gsfs://192.168.0.90:5001/region*', format 'text', delimiter '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ); DROP FOREIGN TABLE IF EXISTS nation_load; CREATE FOREIGN TABLE nation_load ( N_NATIONKEY INT, N_NAME CHAR(25), N_REGIONKEY INT, N_COMMENT VARCHAR(152) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5000/nation* | gsfs://192.168.0.90:5001/nation*', format 'text', delimiter '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ); DROP FOREIGN TABLE IF EXISTS supplier_load; CREATE FOREIGN TABLE supplier_load ( S_SUPPKEY INT, S_NAME CHAR(25), S_ADDRESS VARCHAR(40), S_NATIONKEY INT, S_PHONE CHAR(15), S_ACCTBAL DECIMAL(15,2), S_COMMENT VARCHAR(101) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5000/supplier* | gsfs://192.168.0.90:5001/supplier*', format 'text', delimiter '|', encoding 
'utf8', mode 'Normal' ); DROP FOREIGN TABLE IF EXISTS customer_load; CREATE FOREIGN TABLE customer_load ( C_CUSTKEY INT, C_NAME VARCHAR(25), C_ADDRESS VARCHAR(40), C_NATIONKEY INT, C_PHONE CHAR(15), C_ACCTBAL DECIMAL(15,2), C_MKTSEGMENT CHAR(10), C_COMMENT VARCHAR(117) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5000/customer* | gsfs://192.168.0.90:5001/customer*', format 'text', delimiter '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ); DROP FOREIGN TABLE IF EXISTS part_load; CREATE FOREIGN TABLE part_load ( P_PARTKEY INT, P_NAME VARCHAR(55), P_MFGR CHAR(25), P_BRAND CHAR(10), P_TYPE VARCHAR(25), P_SIZE INT, P_CONTAINER CHAR(10), P_RETAILPRICE DECIMAL(15,2), P_COMMENT VARCHAR(23) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5000/part.* | gsfs://192.168.0.90:5001/part.*', format 'text', delimiter '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ); DROP FOREIGN TABLE IF EXISTS partsupp_load; CREATE FOREIGN TABLE partsupp_load ( PS_PARTKEY INT, PS_SUPPKEY INT, PS_AVAILQTY INT, PS_SUPPLYCOST DECIMAL(15,2), PS_COMMENT VARCHAR(199) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5000/partsupp* | gsfs://192.168.0.90:5001/partsupp*', format 'text', delimiter '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ); DROP FOREIGN TABLE IF EXISTS orders_load; CREATE FOREIGN TABLE orders_load ( O_ORDERKEY BIGINT, O_CUSTKEY INT, O_ORDERSTATUS CHAR(1), O_TOTALPRICE DECIMAL(15,2), O_ORDERDATE DATE, O_ORDERPRIORITY CHAR(15), O_CLERK CHAR(15), O_SHIPPRIORITY INT, O_COMMENT VARCHAR(79) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5000/orders* | gsfs://192.168.0.90:5001/orders*', format 'text', delimiter '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ); DROP FOREIGN TABLE IF EXISTS lineitem_load; CREATE FOREIGN TABLE lineitem_load ( L_ORDERKEY BIGINT, L_PARTKEY INT, 
L_SUPPKEY INT, L_LINENUMBER INT, L_QUANTITY DECIMAL(15,2), L_EXTENDEDPRICE DECIMAL(15,2), L_DISCOUNT DECIMAL(15,2), L_TAX DECIMAL(15,2), L_RETURNFLAG CHAR(1), L_LINESTATUS CHAR(1), L_SHIPDATE DATE, L_COMMITDATE DATE, L_RECEIPTDATE DATE, L_SHIPINSTRUCT CHAR(25), L_SHIPMODE CHAR(10), L_COMMENT VARCHAR(44) ) SERVER gsmpp_server OPTIONS(location 'gsfs://192.168.0.90:5000/lineitem* | gsfs://192.168.0.90:5001/lineitem*', format 'text', delimiter '|', encoding 'utf8', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true', mode 'Normal' ); 执行以下命令导入数据。 1 2 3 4 5 6 7 8 INSERT INTO region SELECT * FROM region_load; INSERT INTO nation SELECT * FROM nation_load; INSERT INTO supplier SELECT * FROM supplier_load; INSERT INTO customer SELECT * FROM customer_load; INSERT INTO part SELECT * FROM part_load; INSERT INTO partsupp SELECT * FROM partsupp_load; INSERT INTO orders SELECT * FROM orders_load; INSERT INTO lineitem SELECT * FROM lineitem_load;
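The location option of each foreign table lists one gsfs:// URL per GDS instance, separated by "|". When the number of GDS instances changes, it may help to generate that string rather than edit eight definitions by hand. The following is a minimal sketch; the helper name gsfs_location and the TPCH_GDS list are hypothetical and not part of any GaussDB(DWS) tool.

```python
def gsfs_location(endpoints, pattern):
    """Build a foreign-table location string: one gsfs:// URL per GDS
    endpoint, joined with ' | ' as in the definitions above."""
    if not endpoints:
        raise ValueError("at least one GDS endpoint is required")
    return " | ".join(
        f"gsfs://{host}:{port}/{pattern}" for host, port in endpoints
    )


# Hypothetical example: the two TPC-H GDS instances used in this guide.
TPCH_GDS = [("192.168.0.90", 5000), ("192.168.0.90", 5001)]

if __name__ == "__main__":
    for table in ("region", "nation", "lineitem"):
        print(gsfs_location(TPCH_GDS, f"{table}*"))
```

The returned string is meant to be pasted between the single quotes after location in the OPTIONS clause.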
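  • Installing and starting GDS

    1. Log in to the GaussDB(DWS) management console.
    2. In the navigation pane on the left, click "Client Connections".
    3. In the "gsql command-line client" drop-down list, select the GaussDB(DWS) client version that matches your cluster version and the operating system on which the client will be installed. The client CPU architecture must match the cluster: if the cluster uses x86 nodes, choose the x86 client.
    4. Click "Download".
    5. Upload the GDS tool package to the /opt directory on the ECS. This example uses the Euler Kunpeng package.
    6. In the directory that contains the package, decompress it.

        cd /opt/
        unzip dws_client_8.1.x_euler_kunpeng_x64.zip

    7. Create the user gds_user and its user group gdsgrp. This user starts GDS and must have permission to read the data source file directory.

        groupadd gdsgrp
        useradd -g gdsgrp gds_user

    8. Change the owner of the tool package and of the data source file directories to gds_user and its group gdsgrp.

        chown -R gds_user:gdsgrp /opt/
        chown -R gds_user:gdsgrp /data1
        chown -R gds_user:gdsgrp /data2

    9. Switch to the gds_user user.

        su - gds_user

    10. Run the environment dependency script (applies to version 8.1.x only).

        cd /opt/gds/bin
        source gds_env

    11. Start GDS.

        /opt/gds/bin/gds -d /data1/script/tpch-kit/tpch1000X -p 192.168.0.90:5000 -H 192.168.0.0/24 -l /opt/gds/gds01_log.txt -D    # for TPC-H
        /opt/gds/bin/gds -d /data2/script/tpch-kit/tpch1000X -p 192.168.0.90:5001 -H 192.168.0.0/24 -l /opt/gds/gds02_log.txt -D    # for TPC-H
        /opt/gds/bin/gds -d /data1/script/tpcds-kit/tpcds1000X/ -p 192.168.0.90:5002 -H 192.168.0.0/24 -l /opt/gds/gds03_log.txt -D    # for TPC-DS
        /opt/gds/bin/gds -d /data2/script/tpcds-kit/tpcds1000X/ -p 192.168.0.90:5003 -H 192.168.0.0/24 -l /opt/gds/gds04_log.txt -D    # for TPC-DS

        Set the variable parts of the commands according to your environment. If the data shards are stored in several data disk directories, start one GDS instance per directory.
        To test TPC-H and TPC-DS together, start all four GDS instances above. To test only TPC-DS or only TPC-H, start the instances marked with the matching "# for ..." comment.

        -d dir: directory that contains the data files to be imported.
        -p ip:port: listening IP address and port of GDS. Use the private IP address of the ECS and make sure GaussDB(DWS) can reach GDS through it. Use ports 5000 and 5001 for TPC-H, and 5002 and 5003 for TPC-DS.
        -H address_string: hosts that are allowed to connect to and use GDS, in CIDR format. Set it to the private network segment of the GaussDB(DWS) cluster, for example 192.168.0.0/24. The ECS that runs GDS and GaussDB(DWS) are in the same VPC, so private network communication is sufficient.
        -l log_file: path and file name of the GDS log file.
        -D: runs GDS in the background. Supported on Linux only.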
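The start commands differ only in data directory, port, and log file, so they can be assembled from a small list of settings. The sketch below builds the command string from the -d, -p, -H, -l, and -D parameters described above and uses Python's standard ipaddress module to catch a malformed CIDR or IP address before GDS is launched. The function names gds_command and host_allowed are hypothetical helpers, and the cluster IP address in the usage example is a made-up value for illustration.

```python
import ipaddress
import shlex


def gds_command(data_dir, listen_ip, port, allowed_cidr, log_file,
                gds_bin="/opt/gds/bin/gds"):
    """Return a GDS start command as a shell-quoted string."""
    ipaddress.ip_address(listen_ip)                # ValueError if not an IP
    network = ipaddress.ip_network(allowed_cidr)   # ValueError if not CIDR
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    args = [gds_bin, "-d", data_dir, "-p", f"{listen_ip}:{port}",
            "-H", str(network), "-l", log_file, "-D"]
    return shlex.join(args)


def host_allowed(allowed_cidr, host_ip):
    """Check whether host_ip falls inside the network passed to -H."""
    return ipaddress.ip_address(host_ip) in ipaddress.ip_network(allowed_cidr)


if __name__ == "__main__":
    print(gds_command("/data1/script/tpch-kit/tpch1000X", "192.168.0.90",
                      5000, "192.168.0.0/24", "/opt/gds/gds01_log.txt"))
    # 192.168.0.15 is a hypothetical cluster node address.
    print(host_allowed("192.168.0.0/24", "192.168.0.15"))
```

Checking the cluster's private IP address with host_allowed before loading is a cheap way to rule out a -H setting that would reject the cluster's connections.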
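  • Creating a GaussDB(DWS) data warehouse

    Create a GaussDB(DWS) data warehouse by following the "Creating a Cluster" section. After the cluster is created, record its private IP address.
    To ensure that the ECS and GaussDB(DWS) can reach each other, the data warehouse must be in the same region, VPC, and subnet as the ECS.

    Table 1 DWS specifications

    Parameter                    Value
    Region                       CN North-Beijing4
    AZ                           AZ1
    Product type                 Standard data warehouse
    Node flavor                  8xlarge | 32 vCPUs | 256 GB
    Available storage per node   500 GB
    Number of nodes              3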
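  • Creating an Elastic Cloud Server (ECS)

    Create an ECS by following the Elastic Cloud Server User Guide, using the specifications in the table below.
    The TPC-DS and TPC-H data sets are large: TPC-DS 1000X occupies about 930 GB and TPC-H 1000X about 1100 GB. Attach data disks to the ECS according to what you plan to test, for example:
      - Testing TPC-DS or TPC-H alone: attach two 600 GB ultra-high I/O data disks.
      - Testing TPC-DS and TPC-H together: attach two 1200 GB ultra-high I/O data disks.

    Table 1 ECS specifications

    Parameter          Value
    Billing mode       Pay-per-use
    Region             CN North-Beijing4
    AZ                 AZ1
    CPU architecture   Kunpeng
    Flavor             Kunpeng general computing-plus | kc1.8xlarge.2 | 32 vCPUs | 64 GiB
    Image              EulerOS 2.8 64bit with ARM (40 GB)
    Disks              System disk: general-purpose SSD, 40 GB.
                       Data disks: two 600 GB ultra-high I/O disks when testing TPC-DS or TPC-H alone;
                       two 1200 GB ultra-high I/O disks when testing both.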
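The disk recommendations follow from simple arithmetic on the data set sizes. The sketch below reproduces that check using the sizes quoted above (930 GB and 1100 GB); the function name disk_plan is hypothetical, and the comparison assumes raw capacity only, ignoring file system overhead and any working space you may want to reserve.

```python
# Data set sizes for scale 1000, in GB, as quoted in this section.
DATASET_GB = {"TPC-DS": 930, "TPC-H": 1100}


def disk_plan(datasets, disk_count, disk_gb):
    """Return (required_gb, provisioned_gb, fits) for the chosen data sets.

    Assumption: only raw capacity is compared; file system overhead
    and scratch space are not modelled.
    """
    required = sum(DATASET_GB[name] for name in datasets)
    provisioned = disk_count * disk_gb
    return required, provisioned, provisioned >= required


if __name__ == "__main__":
    print(disk_plan(["TPC-H"], 2, 600))             # single benchmark
    print(disk_plan(["TPC-DS", "TPC-H"], 2, 1200))  # both benchmarks
    print(disk_plan(["TPC-DS", "TPC-H"], 2, 600))   # too small for both
```

Under these assumptions, two 600 GB disks leave roughly 100 GB of headroom for TPC-H alone, and two 1200 GB disks leave roughly 370 GB when both data sets are stored.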
  • Test results

    The following are the TPC-DS performance test results at scale=1000. Query execution times are in seconds (s).

    Table 2 TPC-DS 1000X test results

    No.  TPC-DS query      Result (s)
    1    Q1                1.46
    2    Q2                9.33
    3    Q3                2.40
    4    Q4                142.24
    5    Q5                6.36
    6    Q6                0.82
    7    Q7                3.20
    8    Q8                1.02
    9    Q9                10.92
    10   Q10               1.68
    11   Q11               78.92
    12   Q12               0.53
    13   Q13               3.95
    14   Q14               113.09
    15   Q15               2.99
    16   Q16               8.08
    17   Q17               6.94
    18   Q18               3.58
    19   Q19               1.20
    20   Q20               0.51
    21   Q21               0.51
    22   Q22               11.37
    23   Q23               188.13
    24   Q24               14.51
    25   Q25               6.90
    26   Q26               1.30
    27   Q27               3.42
    28   Q28               18.44
    29   Q29               5.27
    30   Q30               1.09
    31   Q31               7.13
    32   Q32               2.21
    33   Q33               2.35
    34   Q34               4.00
    35   Q35               3.60
    36   Q36               10.27
    37   Q37               0.56
    38   Q38               52.97
    39   Q39               7.00
    40   Q40               1.29
    41   Q41               0.08
    42   Q42               0.66
    43   Q43               2.99
    44   Q44               3.05
    45   Q45               1.38
    46   Q46               5.86
    47   Q47               10.98
    48   Q48               2.32
    49   Q49               3.54
    50   Q50               10.30
    51   Q51               13.90
    52   Q52               0.63
    53   Q53               1.68
    54   Q54               6.27
    55   Q55               0.57
    56   Q56               1.60
    57   Q57               4.76
    58   Q58               1.89
    59   Q59               20.85
    60   Q60               3.21
    61   Q61               1.89
    62   Q62               1.83
    63   Q63               2.10
    64   Q64               35.18
    65   Q65               14.18
    66   Q66               2.71
    67   Q67               231.01
    68   Q68               5.41
    69   Q69               1.48
    70   Q70               10.46
    71   Q71               5.34
    72   Q72               7.81
    73   Q73               2.19
    74   Q74               49.95
    75   Q75               19.64
    76   Q76               6.32
    77   Q77               2.55
    78   Q78               136.59
    79   Q79               8.08
    80   Q80               4.90
    81   Q81               1.77
    82   Q82               1.32
    83   Q83               0.46
    84   Q84               0.52
    85   Q85               3.43
    86   Q86               5.01
    87   Q87               53.19
    88   Q88               8.94
    89   Q89               2.23
    90   Q90               0.77
    91   Q91               0.42
    92   Q92               1.06
    93   Q93               13.28
    94   Q94               4.90
    95   Q95               41.46
    96   Q96               2.16
    97   Q97               12.01
    98   Q98               2.04
    99   Q99               3.35
    100  Total time (s)    1545.93
  • Test results

    The following are the TPC-H performance test results at scale=1000. Query execution times are in seconds (s).

    Table 2 TPC-H 1000X test results

    No.  TPC-H query       Result (s)
    1    Q1                22.15
    2    Q2                2.65
    3    Q3                28.29
    4    Q4                49.07
    5    Q5                44.45
    6    Q6                1.24
    7    Q7                33.26
    8    Q8                32.10
    9    Q9                112.67
    10   Q10               20.63
    11   Q11               6.49
    12   Q12               13.73
    13   Q13               20.61
    14   Q14               3.10
    15   Q15               3.45
    16   Q16               7.77
    17   Q17               16.97
    18   Q18               81.23
    19   Q19               16.74
    20   Q20               16.99
    21   Q21               53.11
    22   Q22               10.20
    23   Total time (s)    596.90
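The total in the last row is the sum of the 22 per-query times. When comparing your own run against this table, a short script can recompute the total and point out the slowest query. The sketch below uses the values from the TPC-H table above and Decimal arithmetic so that two-decimal values add up exactly; the variable and function names are illustrative only.

```python
from decimal import Decimal

# Per-query times in seconds for Q1..Q22, copied from the TPC-H table above.
TPCH_SECONDS = [
    "22.15", "2.65", "28.29", "49.07", "44.45", "1.24", "33.26", "32.10",
    "112.67", "20.63", "6.49", "13.73", "20.61", "3.10", "3.45", "7.77",
    "16.97", "81.23", "16.74", "16.99", "53.11", "10.20",
]


def summarize(times):
    """Return (total, (slowest_query_name, slowest_time)) for Q1..Qn."""
    values = [Decimal(t) for t in times]
    total = sum(values, Decimal("0.00"))
    slowest_index = max(range(len(values)), key=values.__getitem__)
    return total, (f"Q{slowest_index + 1}", values[slowest_index])


if __name__ == "__main__":
    total, slowest = summarize(TPCH_SECONDS)
    print("total:", total)
    print("slowest:", slowest[0], slowest[1])
```

For the values in the table, the recomputed total matches the published 596.90 s, and Q9 is the slowest query at 112.67 s.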
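  • How CDM migration works

    When a user uses the CDM service, the CDM management system provisions a fully managed CDM instance in the user's VPC. The instance can be accessed only through the console and the REST API; users cannot reach it through other interfaces such as SSH. This isolates CDM users from one another, prevents data leakage, and secures data in transit when it is migrated between Huawei Cloud services inside the VPC. Users can also migrate data from an on-premises data center to Huawei Cloud services over a VPN for a high level of security.

    CDM migrates data in extract-write mode: it first extracts data from the source and then writes it to the destination. All data access operations are initiated by CDM. When a data source (for example, an RDS data source) supports SSL, the transfer is encrypted with SSL. The migration requires the user to provide the user names and passwords of the source and destination data sources, which are stored in the database of the CDM instance. Protecting this information is critical to CDM security.

    Figure 1 How CDM migration works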
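  • Introduction to CDM

    Cloud Data Migration (CDM) provides batch data migration between homogeneous and heterogeneous data sources, helping customers move data freely. It supports many common data sources, whether self-hosted or on a public cloud, including file systems, relational databases, data warehouses, NoSQL databases, big data cloud services, and object storage.

    CDM is suited to the following scenarios:
      - Migrating data to the cloud: when using Huawei public cloud services, users can migrate historical or incremental data from a private cloud, an on-premises data center, or a third-party public cloud to Huawei Cloud.
      - Exchanging data between cloud services: users can migrate data among Huawei Cloud big data services, database services, and Object Storage Service. For example, data processed by MapReduce Service (MRS) can be imported into Data Warehouse Service (DWS) for interactive analysis and report statistics.
      - Moving cloud data back on premises: after processing massive data with public cloud computing resources, users can send the result data back to on-premises business systems, mainly relational databases and file systems.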