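Postgres optimize/replace DISTINCT

Time: 2017-07-17 05:00:59

Tags: postgresql

I'm trying to select the users with the highest "followed_by", joined to their posts and filtered by "tag". Both tables have millions of records. I use DISTINCT to select only unique users.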

select distinct u.*
from users u join posts p 
on u.id=p.user_id 
where p.tags @> ARRAY['love'] 
order by u.followed_by desc nulls last limit 21

It runs for more than 16 seconds, apparently because DISTINCT causes a Seq Scan over more than 6 million users. Here is the EXPLAIN ANALYZE output:

Limit  (cost=15509958.30..15509959.09 rows=21 width=292) (actual time=16882.861..16883.753 rows=21 loops=1)
  ->  Unique  (cost=15509958.30..15595560.30 rows=2282720 width=292) (actual time=16882.859..16883.749 rows=21 loops=1)
        ->  Sort  (cost=15509958.30..15515665.10 rows=2282720 width=292) (actual time=16882.857..16883.424 rows=525 loops=1)
              Sort Key: u.followed_by DESC NULLS LAST, u.id, u.username, u.fullname, u.follows, u.media, u.profile_pic_url_hd, u.is_private, u.is_verified, u.biography, u.external_url, u.updated, u.location_id, u.final_post
              Sort Method: external merge  Disk: 583064kB
              ->  Gather  (cost=1000.57..14956785.06 rows=2282720 width=292) (actual time=0.377..11506.001 rows=1680890 loops=1)
                    Workers Planned: 9
                    Workers Launched: 9
                    ->  Nested Loop  (cost=0.57..14727513.06 rows=253636 width=292) (actual time=1.013..12031.634 rows=168089 loops=10)
                          ->  Parallel Seq Scan on posts p  (cost=0.00..13187797.79 rows=253636 width=8) (actual time=0.940..10872.630 rows=168089 loops=10)
                                Filter: (tags @> '{love}'::text[])
                                Rows Removed by Filter: 6251355
                          ->  Index Scan using user_pk on users u  (cost=0.57..6.06 rows=1 width=292) (actual time=0.006..0.006 rows=1 loops=1680890)
                                Index Cond: (id = p.user_id)
Planning time: 1.276 ms
Execution time: 16964.271 ms
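
For reference, a plan like the one above can be produced by prefixing the query with EXPLAIN ANALYZE. This is only a sketch of that invocation; the exact options used to produce the output above are not stated.

```sql
-- EXPLAIN ANALYZE actually executes the query and reports real timings,
-- so expect it to take as long as the query itself.
EXPLAIN ANALYZE
SELECT DISTINCT u.*
FROM users u
JOIN posts p ON u.id = p.user_id
WHERE p.tags @> ARRAY['love']
ORDER BY u.followed_by DESC NULLS LAST
LIMIT 21;
```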

Any help on how to make this fast would be much appreciated.

Update

Thanks to @a_horse_with_no_name, it became very fast for the "love" tag:

Limit  (cost=1.14..4293986.91 rows=21 width=292) (actual time=1.735..31.613 rows=21 loops=1)
  ->  Nested Loop Semi Join  (cost=1.14..10959887484.70 rows=53600 width=292) (actual time=1.733..31.607 rows=21 loops=1)
        ->  Index Scan using idx_followed_by on users u  (cost=0.57..322693786.19 rows=232404560 width=292) (actual time=0.011..0.103 rows=32 loops=1)
        ->  Index Scan using fki_user_fk1 on posts p  (cost=0.57..1943.85 rows=43 width=8) (actual time=0.983..0.983 rows=1 loops=32)
              Index Cond: (user_id = u.id)
              Filter: (tags @> '{love}'::text[])
              Rows Removed by Filter: 1699
Planning time: 1.322 ms
Execution time: 31.656 ms

However, for other tags such as "beautiful" it is better but still somewhat slow. It also takes a different execution path:

Limit  (cost=3893365.84..3893365.89 rows=21 width=292) (actual time=2813.876..2813.892 rows=21 loops=1)
  ->  Sort  (cost=3893365.84..3893499.84 rows=53600 width=292) (actual time=2813.874..2813.887 rows=21 loops=1)
        Sort Key: u.followed_by DESC NULLS LAST
        Sort Method: top-N heapsort  Memory: 34kB
        ->  Nested Loop  (cost=3437011.27..3891920.70 rows=53600 width=292) (actual time=1130.847..2779.928 rows=35230 loops=1)
              ->  HashAggregate  (cost=3437010.70..3437546.70 rows=53600 width=8) (actual time=1130.809..1148.209 rows=35230 loops=1)
                    Group Key: p.user_id
                    ->  Bitmap Heap Scan on posts p  (cost=10484.20..3434173.21 rows=1134993 width=8) (actual time=268.602..972.390 rows=814919 loops=1)
                          Recheck Cond: (tags @> '{beautiful}'::text[])
                          Heap Blocks: exact=347002
                          ->  Bitmap Index Scan on idx_tags  (cost=0.00..10200.45 rows=1134993 width=0) (actual time=168.453..168.453 rows=814919 loops=1)
                                Index Cond: (tags @> '{beautiful}'::text[])
              ->  Index Scan using user_pk on users u  (cost=0.57..8.47 rows=1 width=292) (actual time=0.045..0.046 rows=1 loops=35230)
                    Index Cond: (id = p.user_id)
Planning time: 1.388 ms
Execution time: 2814.132 ms

I do already have a GIN index on "tags".

1 Answer:

Answer 0: (score: 1)

This should be faster:

select *
from users u
where exists (select *
              from posts p
              where u.id=p.user_id
                and p.tags @> ARRAY['love'])
order by u.followed_by desc nulls last
limit 21;

If only a few (<10%) of the posts contain that tag, an index on posts.tags will help as well.
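
A minimal sketch of the indexes this approach relies on, assuming they do not exist yet. The index names here are hypothetical; the plans in the question suggest that equivalent indexes (idx_tags and idx_followed_by) are already in place, in which case these statements should be skipped.

```sql
-- GIN index so that the array containment test (tags @> ARRAY['love'])
-- can use an index instead of scanning every post.
-- Hypothetical name; skip if an equivalent index such as idx_tags exists.
CREATE INDEX IF NOT EXISTS idx_posts_tags_gin
    ON posts USING gin (tags);

-- B-tree index matching the ORDER BY, so the planner can walk users in
-- followed_by order and stop after the first 21 matches.
-- Hypothetical name; skip if an equivalent index such as idx_followed_by exists.
CREATE INDEX IF NOT EXISTS idx_users_followed_by_desc
    ON users (followed_by DESC NULLS LAST);
```

Which of the two the planner prefers depends on how selective the tag is: for a common tag it can walk users in followed_by order and stop early, while for a rarer tag it may collect the matching posts through the GIN index first and then sort.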