Data Warehouse Service GAUSSDB(DWS) - Searching a Table

Time: 2024-12-06 15:12:40

Searching a Table

Full-text search can be performed without using an index.

  • A simple query: print every row whose body field contains science.
    DROP SCHEMA IF EXISTS tsearch CASCADE;
    
    CREATE SCHEMA tsearch;
    
    CREATE TABLE tsearch.pgweb(id int, body text, title text, last_mod_date date);
    
    INSERT INTO tsearch.pgweb VALUES(1, 'Philology is the study of words, especially the history and development of the words in a particular language or group of languages.', 'Philology', '2010-1-1');
    
    INSERT INTO tsearch.pgweb VALUES(2, 'Mathematics is the science that deals with the logic of shape, quantity and arrangement.', 'Mathematics', '2010-1-1');
    
    INSERT INTO tsearch.pgweb VALUES(3, 'Computer science is the study of processes that interact with data and that can be represented as data in the form of programs.', 'Computer science', '2010-1-1');
    
    INSERT INTO tsearch.pgweb VALUES(4, 'Chemistry is the scientific discipline involved with elements and compounds composed of atoms, molecules and ions.', 'Chemistry', '2010-1-1');
    
    INSERT INTO tsearch.pgweb VALUES(5, 'Geography is a field of science devoted to the study of the lands, features, inhabitants, and phenomena of the Earth and planets.', 'Geography', '2010-1-1');
    
    INSERT INTO tsearch.pgweb VALUES(6, 'History is a subject studied in schools, colleges, and universities that deals with events that have happened in the past.', 'History', '2010-1-1');
    
    INSERT INTO tsearch.pgweb VALUES(7, 'Medical science is the science of dealing with the maintenance of health and the prevention and treatment of disease.', 'Medical science', '2010-1-1');
    
    INSERT INTO tsearch.pgweb VALUES(8, 'Physics is one of the most fundamental scientific disciplines, and its main goal is to understand how the universe behaves.', 'Physics', '2010-1-1');
    
    
    SELECT id, body, title FROM tsearch.pgweb WHERE to_tsvector('english', body) @@ to_tsquery('english', 'science');
     id |                                                               body                                                                |      title
    ----+-----------------------------------------------------------------------------------------------------------------------------------+------------------
      2 | Mathematics is the science that deals with the logic of shape, quantity and arrangement.                                          | Mathematics
      3 | Computer science is the study of processes that interact with data and that can be represented as data in the form of programs.   | Computer science
      5 | Geography is a field of science devoted to the study of the lands, features, inhabitants, and phenomena of the Earth and planets. | Geography
      7 | Medical science is the science of dealing with the maintenance of health and the prevention and treatment of disease.             | Medical science
    (4 rows)
    

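    Other forms of science, such as sciences, are also found, because they are all reduced to the same normalized lexeme. Rows 4 and 8 are not returned: their text contains scientific, which the english configuration reduces to a different lexeme.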
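
    The normalization can be inspected directly. The following sketch reuses only the to_tsvector and to_tsquery functions from the query above and assumes the tsearch.pgweb table created earlier. No expected output is shown, so run it to see the lexemes produced by your configuration.

    ```sql
    -- Normalize a document fragment: stop words such as "is" and "the" are dropped
    -- and the remaining words are stemmed.
    SELECT to_tsvector('english', 'Medical science is the science of dealing with disease.');

    -- The query term "sciences" should be reduced to the same lexeme as "science".
    SELECT to_tsquery('english', 'sciences');

    -- Rows 4 and 8 contain "scientific" instead of "science"; this shows,
    -- per row, whether the body still matches the query "science".
    SELECT id, to_tsvector('english', body) @@ to_tsquery('english', 'science') AS matches
    FROM tsearch.pgweb
    WHERE id IN (4, 8);
    ```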

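    The query above specifies the english configuration for parsing and normalizing the strings. The configuration argument can also be omitted, in which case the configuration set by default_text_search_config is used: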

    SHOW default_text_search_config;
     default_text_search_config 
    ----------------------------
     pg_catalog.english
    (1 row)
    
    SELECT id, body, title FROM tsearch.pgweb WHERE to_tsvector(body) @@ to_tsquery('science');
     id |                                                               body                                                                |      title
    ----+-----------------------------------------------------------------------------------------------------------------------------------+------------------
      2 | Mathematics is the science that deals with the logic of shape, quantity and arrangement.                                          | Mathematics
      3 | Computer science is the study of processes that interact with data and that can be represented as data in the form of programs.   | Computer science
      5 | Geography is a field of science devoted to the study of the lands, features, inhabitants, and phenomena of the Earth and planets. | Geography
      7 | Medical science is the science of dealing with the maintenance of health and the prevention and treatment of disease.             | Medical science
    (4 rows)
    
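
    The one-argument forms depend on default_text_search_config, so their results change if that setting changes. The following is a minimal sketch for pinning the setting in the current session. It assumes the session is permitted to set this parameter, and it uses the value pg_catalog.english shown above.

    ```sql
    -- Set the default text search configuration for this session only.
    SET default_text_search_config = 'pg_catalog.english';

    -- Confirm the active value before relying on to_tsvector(body) and to_tsquery('science').
    SHOW default_text_search_config;
    ```
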
  • A complex query: retrieve the titles of the 10 most recent documents that contain both treatment and science in the title or body field:
    SELECT title FROM tsearch.pgweb WHERE to_tsvector(title || ' ' || body) @@ to_tsquery('treatment & science') ORDER BY last_mod_date DESC LIMIT 10;
          title
    -----------------
     Medical science
    (1 row)
    

    For clarity, the example omits the coalesce function calls that would be needed to also find rows in which one of the two fields is NULL.
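
    The following sketches the same query with coalesce added. Each column is wrapped so that a NULL title or body is treated as an empty string and does not make the whole concatenation NULL. Under PostgreSQL-style semantics, concatenating a NULL yields NULL. How || handles NULL may depend on the database's compatibility mode, and the wrapper is harmless where it is not strictly needed.

    ```sql
    -- coalesce(x, '') turns a NULL column into an empty string, so the other
    -- column's text is still parsed by to_tsvector and can match the query.
    SELECT title
    FROM tsearch.pgweb
    WHERE to_tsvector(coalesce(title, '') || ' ' || coalesce(body, ''))
          @@ to_tsquery('treatment & science')
    ORDER BY last_mod_date DESC
    LIMIT 10;
    ```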

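    All of the examples above run without an index. For most applications this approach is too slow, so apart from occasional ad hoc searches, practical text search usually requires an index.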
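
    The following is a minimal sketch of such an index. It assumes GIN indexes are available for the table, and the index name pgweb_idx_1 is hypothetical. It uses the two-argument form of to_tsvector because, following PostgreSQL rules, an index expression must not depend on a session setting such as default_text_search_config. A query can use the index only if it repeats the same expression.

    ```sql
    -- GIN index on the normalized body text; pgweb_idx_1 is a hypothetical name.
    CREATE INDEX pgweb_idx_1 ON tsearch.pgweb USING gin(to_tsvector('english', body));

    -- The WHERE clause repeats the indexed expression exactly, including 'english',
    -- so the planner can consider the index instead of scanning every row.
    SELECT id, title
    FROM tsearch.pgweb
    WHERE to_tsvector('english', body) @@ to_tsquery('english', 'science');
    ```

    On a table with only eight rows, the planner may still prefer a sequential scan. EXPLAIN shows which plan is chosen.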

support.huaweicloud.com/sqlreference-910-dws/dws_06_0088.html