云服务器内容精选

  • 示例 收集表fruit的统计信息: ANALYZE fruit; 统计catalog hive、schema default下的表存储: ANALYZE hive.default.orders; 从hive分区表中统计分区'2020-07-17' , '2020-07-18'信息: ANALYZE hive.web.page_views WITH (partitions = ARRAY[ARRAY['2020-07-17','US'], ARRAY['2020-07-18','US']]);
  • 示例 CREATE SCHEMA web; DESCRIBE SCHEMA web; Describe Schema ------------------------------------------------------------------------- web hdfs://hacluster/user/hive/warehouse/web.db admintest USER (1 row)
  • 示例 --PREPARE my_select1 FROM SELECT * FROM fruit; DESCRIBE OUTPUT my_select1; --PREPARE my_select2 FROM SELECT count(*) as my_count, 1+2 FROM fruit; DESCRIBE OUTPUT my_select2; --PREPARE my_create FROM CREATE TABLE foo AS SELECT * FROM fruit; DESCRIBE OUTPUT my_create;
  • 描述 显示一条语句的逻辑的或者分布式的执行计划,也可以用于校验一条SQL语句,或者是分析IO。 参数TYPE DISTRIBUTED用于显示分片后的计划(fragmented plan)。每一个fragment都会被一个或者多个节点执行。Fragments separation表示数据在两个节点之间进行交换。Fragment type表示一个fragment如何被执行以及数据在不同fragment之间怎样分布。 SINGLE Fragment会在单个节点上执行。 HASH Fragment会在固定数量的节点上执行,输入数据通过哈希函数进行分布。 ROUND_ROBIN Fragment会在固定数量的节点上执行,片段在固定数量的节点上执行,输入数据以轮循方式进行分布。 BROADCAST Fragment会在固定数量的节点上执行,输入数据被广播到所有的节点。 SOURCE Fragment在访问输入分段的节点上执行。
  • 示例 LOG ICAL: CREATE TABLE testTable (regionkey int, name varchar); EXPLAIN SELECT regionkey, count(*) FROM testTable GROUP BY 1; Query Plan ------------------------------------------------------------------------------------------------------------------------------------- Output[regionkey, _col1] │ Layout: [regionkey:integer, count:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} │ _col1 := count └─ RemoteExchange[GATHER] │ Layout: [regionkey:integer, count:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} └─ Project[] │ Layout: [regionkey:integer, count:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} └─ Aggregate(FINAL)[regionkey][$hashvalue] │ Layout: [regionkey:integer, $hashvalue:bigint, count:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} │ count := count("count_8") └─ LocalExchange[HASH][$hashvalue] ("regionkey") │ Layout: [regionkey:integer, count_8:bigint, $hashvalue:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} └─ RemoteExchange[REPARTITION][$hashvalue_9] │ Layout: [regionkey:integer, count_8:bigint, $hashvalue_9:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} └─ Aggregate(PARTIAL)[regionkey][$hashvalue_10] │ Layout: [regionkey:integer, $hashvalue_10:bigint, count_8:bigint] │ count_8 := count(*) └─ ScanProject[table = hive:default:testtable] Layout: [regionkey:integer, $hashvalue_10:bigint] Estimates: {rows: 0 (0B), cpu: 0, memory: 0B, network: 0B}/{rows: 0 (0B), cpu: 0, memory: 0B, network: 0B} $hashvalue_10 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("regionkey"), 0)) regionkey := regionkey:int:0:REGULAR DISTRIBUTED: EXPLAIN (type DISTRIBUTED) SELECT regionkey, count(*) FROM testTable GROUP BY 1; Query Plan ----------------------------------------------------------------------------------------------------------------------- Fragment 0 [SINGLE] Output layout: [regionkey, count] Output partitioning: SINGLE [] Stage Execution Strategy: UNGROUPED_EXECUTION Output[regionkey, _col1] │ Layout: [regionkey:integer, count:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} │ _col1 := count └─ RemoteSource[1] Layout: [regionkey:integer, count:bigint] Fragment 1 [HASH] Output layout: [regionkey, count] Output partitioning: SINGLE [] Stage Execution Strategy: UNGROUPED_EXECUTION Project[] │ Layout: [regionkey:integer, count:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} └─ Aggregate(FINAL)[regionkey][$hashvalue] │ Layout: [regionkey:integer, $hashvalue:bigint, count:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} │ count := count("count_8") └─ LocalExchange[HASH][$hashvalue] ("regionkey") │ Layout: [regionkey:integer, count_8:bigint, $hashvalue:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} └─ RemoteSource[2] Layout: [regionkey:integer, count_8:bigint, $hashvalue_9:bigint] Fragment 2 [SOURCE] Output layout: [regionkey, count_8, $hashvalue_10] Output partitioning: HASH [regionkey][$hashvalue_10] Stage Execution Strategy: UNGROUPED_EXECUTION Aggregate(PARTIAL)[regionkey][$hashvalue_10] │ Layout: [regionkey:integer, $hashvalue_10:bigint, count_8:bigint] │ count_8 := count(*) └─ ScanProject[table = hive:default:testtable, grouped = false] Layout: [regionkey:integer, $hashvalue_10:bigint] Estimates: {rows: 0 (0B), cpu: 0, memory: 0B, network: 0B}/{rows: 0 (0B), cpu: 0, memory: 0B, network: 0B} $hashvalue_10 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("regionkey"), 0)) regionkey := regionkey:int:0:REGULAR VALIDATE: EXPLAIN (TYPE VALIDATE) SELECT id, count(*) FROM testTable GROUP BY 1; Valid ------- true IO: EXPLAIN (TYPE IO, FORMAT JSON) SELECT regionkey , count(*) FROM testTable GROUP BY 1; Query Plan --------------------------------- { "inputTableColumnInfos" : [ { "table" : { "catalog" : "hive", "schemaTable" : { "schema" : "default", "table" : "testtable" } }, "columnConstraints" : [ ] } ] }