为了获得良好的性能,需要通过几个镜头来查看您是如何访问数据库的:
要获得良好的SQL语句性能,请遵循以下规则:
WHERE
语句应该在子句中的索引列上使用。您希望避免全表扫描对性能的影响。SELECT * FROM users WHERE name = 'Cheyenne Smith';
id | city | name | address | credit_card
---------------------------------------+-----------+----------------+------------------------------+--------------
326d3a95-fe03-4600-8000-00000003c1d0 | boston | Cheyenne Smith | 84252 Bradley Coves Suite 38 | 2369363335
c6b45c93-7ecd-4800-8000-0000000ecdfd | amsterdam | Cheyenne Smith | 29149 Jane Lake | 6072991876
5edbdde1-c806-4400-8000-00000007114a | seattle | Cheyenne Smith | 90016 Anthony Groves | 3618456173
60edde95-4c2d-4000-8000-0000000738c7 | seattle | Cheyenne Smith | 70310 Knight Roads Suite 36 | 9909070365
00024e8e-d94c-4710-8000-00000000002c | new york | Cheyenne Smith | 8550 Kelsey Flats | 4374468739
0c777227-64c8-4780-8000-00000000edc8 | new york | Cheyenne Smith | 47925 Cox Ways | 7070681549
(1 row)
Time: 981ms total (execution 981ms / network 0ms)
EXPLAIN
:EXPLAIN SELECT * FROM users WHERE name = 'Cheyenne Smith';
info
-----------------------------------------------------------------------------------------
distribution: full
vectorized: true
• filter
│ estimated row count: 3
│ filter: name = 'Cheyenne Smith'
│
└── • scan
estimated row count: 1,259,634 (100% of the table; stats collected 3 minutes ago)
table: users@primary
spans: FULL SCAN
情况一:二级索引过滤
CREATE INDEX on users (name);
查询将更快的返回
SELECT * FROM users WHERE name = 'Cheyenne Smith';
id | city | name | address | credit_card
---------------------------------------+-----------+----------------+------------------------------+--------------
c6b45c93-7ecd-4800-8000-0000000ecdfd | amsterdam | Cheyenne Smith | 29149 Jane Lake | 6072991876
326d3a95-fe03-4600-8000-00000003c1d0 | boston | Cheyenne Smith | 84252 Bradley Coves Suite 38 | 2369363335
00024e8e-d94c-4710-8000-00000000002c | new york | Cheyenne Smith | 8550 Kelsey Flats | 4374468739
0c777227-64c8-4780-8000-00000000edc8 | new york | Cheyenne Smith | 47925 Cox Ways | 7070681549
5edbdde1-c806-4400-8000-00000007114a | seattle | Cheyenne Smith | 90016 Anthony Groves | 3618456173
60edde95-4c2d-4000-8000-0000000738c7 | seattle | Cheyenne Smith | 70310 Knight Roads Suite 36 | 9909070365
(6 rows)
Time: 24ms total (execution 3ms / network 0ms)
users@users_name_idx
)开始。因为是按 排序的name,所以查询可以直接跳转到相关值(/'Cheyenne Smith' - /'Cheyenne Smith'
)。但是,查询需要返回不在二级索引中的值,因此Hubble抓取与该值一起存储的主键(city/id)name(主键始终与二级索引中的条目一起存储),跳转到主索引中的该值,然后返回整行。情况二:按存储列的二级索引进行过滤
当有一个按特定列过滤但检索表总列的子集的查询时,可以通过将这些附加列存储在二级索引中来提高性能,以防止查询也需要扫描主索引。例如:假设您经常检索用户的姓名和信用卡号:
SELECT name,credit_card FROM users WHERE name = 'Cheyenne Smith';
name | credit_card
-----------------+--------------
Cheyenne Smith | 6072991876
Cheyenne Smith | 2369363335
Cheyenne Smith | 4374468739
Cheyenne Smith | 7070681549
Cheyenne Smith | 3618456173
Cheyenne Smith | 9909070365
(1 row)
Time: 14ms total (execution 4ms / network 0ms)
EXPLAIN SELECT name, credit_card FROM users WHERE name = 'Cheyenne Smith';
info
-------------------------------------------------------------------------------------
distribution: local
vectorized: true
• index join
│ estimated row count: 3
│ table: users@primary
│
└── • scan
estimated row count: 3 (<0.01% of the table; stats collected 2 minutes ago)
table: users@users_name_idx
spans: [/'Cheyenne Smith' - /'Cheyenne Smith']
DROP INDEX users_name_idx;
CREATE INDEX ON users (name) STORING (credit_card);
EXPLAIN SELECT name, credit_card FROM users WHERE name = 'Cheyenne Smith';
info
---------------------------------------------------------------------------------
distribution: local
vectorized: true
scan
estimated row count: 3 (<0.01% of the table; stats collected 27 seconds ago)
table: users@users_name_idx
spans: [/'Cheyenne Smith' - /'Cheyenne Smith']
SELECT name, credit_card FROM users WHERE name = 'Cheyenne Smith';
name | credit_card
-----------------+--------------
Cheyenne Smith | 6072991876
Cheyenne Smith | 2369363335
Cheyenne Smith | 4374468739
Cheyenne Smith | 7070681549
Cheyenne Smith | 3618456173
Cheyenne Smith | 9909070365
(6 rows)
Time: 2ms total (execution 2ms / network 0ms)
谓词下推
将查询语句中的过滤表达式计算尽可能下推到距离数据源最近的地方,进而显著地减少数据传输或计算的开销。
create table a (id int primary key, b int);
explain select * from a where b < 1;
Min/Max函数消除
当一个 SQL满足以下条件时,就会应用这个规则:
max
或者min
函数。group by
语句。示例:
select max(n) from b;
这时max/min
消除优化规则会将其重写为:
select max(n) from (select n from b where a is not null order by n desc limit 1) b;
这个SQL语句在 n列存在索引时,能够利用索引只扫描一行数据来得到最大或者最小值,从而避免全表的扫描。