广义倒排索引

广义倒排索引或 GIN索引存储从容器列(例如JSONB文档)中的值到包含该值的行的映射。 GIN索引通常用于文档检索系统。

Hubble在 GIN索引中存储以下数据类型的内容：

JSONB
数组

GIN索引如何工作

标准索引适用于基于排序数据前缀的搜索。但是，如果不进行全表扫描，就无法查询像JSONB数组这样的数据类型。JSONB数组特别是需要以比标准索引提供的更详细的方式进行索引，这就是GIN索引被证明有用的地方。

GIN索引过滤可标记数据的组成部分。JSONB数据类型建立在两个可以标记化的结构上：

对象：键值对的集合，其中每个键值对都是一个标记。
数组：有序的值列表，其中数组中的每个值都是一个标记。

例如，JSONB在列中取以下值

{
  "firstName": "mike",
  "lastName": "jackson",
  "number": 10,
  "address": {
    "state": "USA",
    "postCode": "10000"
  },
  "sports": [
    "byd",
    "aodi"
  ]
}

该对象的GIN索引每个组件都有一个条目，将其映射回原始对象：


"firstName": "mike"
"lastName": "jackson"
"number": 10
"address": "state": "USA"
"address": "postCode": "10000"
"sports" : "byd"
"sports" : "aodi"

创建

可以使用GIN索引来提高使用JSONB或ARRAY列的查询的性能。

可以使用以下语法指定jsonb_ops或array_ops操作类（分别用于JSONB和ARRAY列）：

CREATE INDEX {optional name} ON {table} USING GIN ({column} {opclass});

创建表时，使用语法CREATE INVERTED INDEX：

CREATE INVERTED INDEX {optional name} ON {table} ({column});

选择

在大多数情况下，Hubble选择它计算出的索引将扫描最少的行(即最快的)。Hubble将使用多个索引的情况包括某些谓词的查询。可以使用EXPLAIN查询语句来查看正在使用哪个索引。

存储

Hubble将索引直接存储在键值存储中。

表现

索引产生了一种权衡：它们极大地提高了查询速度，但略微减慢了写入速度（因为必须复制和排序新值）。

比较

本节介绍如何对JSONB和ARRAY列进行比较。

JSONB

列上的GIN索引支持以下比较运算符：

包含在：<@
包含：@>
等于：=。要使用=，还必须与->一起使用。例如：

SELECT * FROM s WHERE a ->'foo' = '"1"';

示例如下：

1.创建一个包含计算列的表：

create table td (
    id int,
    data json,
    t int as ((data->>'t')::int) stored
    );

2.在计算列上创建索引：

create index td_idx on td (t);

3.执行查询：

select * from td where t > 1;

数组

列上的GIN索引支持以下比较运算符：

包含在：<@ 包含：@>

部分GIN索引

就像使用非容器数据类型的部分索引一样，可以通过包含一个子句的方式来创建部GIN索引：

create table  t1 (
  id INT,
  da JSONB,
  INVERTED INDEX idx_da(da) WHERE id > 1
);

多列GIN索引

创建具有多个列的GIN索引：

create table cust_info (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  c_type STRING,
  c_profile JSONB,
  INVERTED INDEX (c_type, c_profile)
);

示例

在JSONB列上创建一个带有GIN索引的表

create table cust_info (
    profile_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    login_time TIMESTAMP DEFAULT now(),
    c_profile JSONB,
    INVERTED INDEX c_ids (c_profile)
  );
  
insert into cust_info (c_profile) values
    ('{"first_name": "Lola", "last_name": "Dog", "location": "NYC", "online" : true, "friends" : 547}'),
    ('{"first_name": "Ernie", "status": "Looking for treats", "location" : "Brooklyn"}'),
    ('{"first_name": "Carl", "last_name": "Kimball", "location": "NYC", "breed": "Boston Terrier"}'
  );

现在，运行对JSONB列进行过滤的查询：

select  * from cust_info where c_profile @> '{"location":"NYC"}';

               profile_id              |         login_time         |                                           c_profile
---------------------------------------+----------------------------+------------------------------------------------------------------------------------------------
  572e728c-55f2-4645-9d52-574dae4f5745 | 2022-12-23 15:12:39.329944 | {"first_name": "Lola", "friends": 547, "last_name": "Dog", "location": "NYC", "online": true}
  e9ef1f94-2d2d-4311-b281-d86520f5deeb | 2022-12-23 15:12:39.329944 | {"breed": "Boston Terrier", "first_name": "Carl", "last_name": "Kimball", "location": "NYC"}

查看执行计划

explain select  * from cust_info where c_profile @> '{"location":"NYC"}';

              info
--------------------------------
  distribution: local
  vectorized: false

  • index join
  │ table: cust_info@primary
  │
  └── • scan
        missing stats
        table: cust_info@c_ids
        spans: 1 span

将GIN索引添加到具有数组列的表

在这个例子中，先创建一个包含ARRAY列的表，然后再添加GIN索引:

create table stu (
    id  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    bins INT ARRAY
  );
  
insert into stu (bins) values
    (ARRAY[10,20,50]),
    (ARRAY[20,30,60]),
    (ARRAY[30,40,70]
  );

未添加索引前，查看执行计划:

explain select * from stu where bins @> ARRAY[10];

             info
-------------------------------
  distribution: full
  vectorized: false

  • filter
  │ filter: bins @> ARRAY[10]
  │
  └── • scan
        missing stats
        table: stu@primary
        spans: FULL SCAN

向表中添加一个 GIN索引并运行一个过滤查询:

create inverted index stu_bins on stu (bins);

select * from stu where bins @> ARRAY[10];

                   id                  |    bins
---------------------------------------+-------------
  d864a089-d310-408e-bcca-19084333f771 | {10,20,50}

再次查看执行计划:

 explain select * from stu where bins @> ARRAY[10];

            info
-----------------------------
  distribution: local
  vectorized: false

  • index join
  │ table: stu@primary
  │
  └── • scan
        missing stats
        table: stu@stu_bins
        spans: 1 span

在JSONB列上创建具有 GIN索引的表

创建一个部分GIN索引：

create inverted index idx_online on cust_info(c_profile) where c_profile -> 'online' = 'true';

select * from cust_info where c_profile -> 'online' = 'true';

               profile_id              |         login_time         |                                           c_profile
---------------------------------------+----------------------------+------------------------------------------------------------------------------------------------
  572e728c-55f2-4645-9d52-574dae4f5745 | 2022-12-23 15:12:39.329944 | {"first_name": "Lola", "friends": 547, "last_name": "Dog", "location": "NYC", "online": true}

查看执行计划：

explain select * from cust_info where c_profile -> 'online' = 'true';

              info
--------------------------------
  distribution: local
  vectorized: false

  • index join
  │ table: cust_info@primary
  │
  └── • scan
        missing stats
        table: cust_info@c_ids
        spans: 1 span

下载PDF

本页导航

GIN索引如何工作
部分GIN索引
多列GIN索引
示例