PostgreSQL查询加入视图的性能很差

时间:2017-11-13 08:35:56

标签: postgresql

我正在尝试创建一个API,用于返回用户拥有的文件/文件夹列表,类似于Google云端硬盘。有关用户文件/文件夹/其他的信息存储在PostgreSQL表中。

我创建了一个名为gc_drive的视图,它从这些表中获取文件列表。是的,我将它们称为表,因为文件,文件夹和虚拟文件有不同的表,因此我使用视图将所有这些记录连接在一起,因此它们实际上显示为文件系统,即使它们不是100%的文件系统。

视图性能看起来很好,我已经索引了必要的列,即使视图有超过270万条记录并且所有数据都已合并,它会按ID过滤,并以毫秒为单位返回父项的过滤记录。但是,如果我尝试从gc_drive中选择count(*),则需要69秒,因为包含文件,文件夹,资源和其他列表的基表每个都有超过700万条记录。

我想加入gc_drive视图和fn_resource_permissions表,该表包含用户是否有权访问该资源的信息。

当我使用以下查询获取当前登录用户的权限时,记录几乎立即返回:

select*from 
fn_resource_permissions
WHERE ((fn_resource_permissions.permission_id IN (23,
                                                  24,
                                                  25,
                                                  37,
                                                  36)
AND (fn_resource_permissions.user_id = 2
     OR fn_resource_permissions.user_id = 1)))

--59 records returned

上面的查询返回了对用户拥有的资源的所有权限。我将使用这些记录来过滤gc_drive视图。

当我尝试在以下查询中使用这59条记录来过滤掉gc_drive视图时,记录也会立即返回:

SELECT gc_drive.id,
       "fn_resources"."owner_id" AS "fn_resources.owner_id",
       "gc_drive".*
FROM "gc_drive"
JOIN "fn_resources" ON ("gc_drive"."resource_id" = "fn_resources"."id")
WHERE "fn_resources"."archived" = 0
AND   gc_drive.id IN(
    1234,
    1235,
    1236,
    ...
);

当我尝试将上述2个查询加在一起时,问题就出现了:

SELECT gc_drive.id,
       "fn_resources"."owner_id" AS "fn_resources.owner_id",
       "gc_drive".*
FROM "gc_drive"
JOIN "fn_resources" ON ("gc_drive"."resource_id" = "fn_resources"."id")
JOIN "fn_resource_permissions" ON ("fn_resource_permissions"."resource_id" = fn_resources.id)
WHERE "fn_resources"."archived" = 0
AND ((fn_resource_permissions.permission_id IN (23,
                                                24,
                                                25,
                                                37,
                                                36)
          AND (fn_resource_permissions.user_id = 2
               OR fn_resource_permissions.user_id = 1)))
LIMIT 101;

当我查看Explain计划时,我可以看到PostgreSQL实现了整个gc_drive视图,然后才尝试按资源权限进行过滤。这使得查询在几分钟内运行,而不是以毫秒为单位。我也尝试将资源权限放在with子句中,它也是如此。我知道其中一个解决方案可能是将每个用户的内容分离到自己的架构中但是我想知道是否有更好的方法来加入gc_drive视图和其他表而没有PostgreSQL首先实现整个视图。有没有办法修改查询,以便PostgreSQL只能过滤整个记录集。

解释上述查询的分析计划:

    ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      QUERY PLAN                                                                                       │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=4845.56..57547586.51 rows=101 width=722) (actual time=201371.236..201371.236 rows=0 loops=1)                                                                             │
│   ->  Hash Join  (cost=4845.56..681971785.33 rows=1197 width=722) (actual time=201371.234..201371.234 rows=0 loops=1)                                                                 │
│         Hash Cond: ("*SELECT* 1".resource_id = fn_resources.id)                                                                                                                       │
│         ->  Append  (cost=36.10..681748786.17 rows=15867640 width=247) (actual time=17521.979..200235.975 rows=2737742 loops=1)                                                       │
│               ->  Subquery Scan on "*SELECT* 1"  (cost=36.10..533636.75 rows=11252 width=150) (actual time=0.004..0.004 rows=0 loops=1)                                               │
│                     ->  Hash Join  (cost=36.10..533524.23 rows=11252 width=150) (actual time=0.003..0.003 rows=0 loops=1)                                                             │
│                           Hash Cond: (fnr_child.type = fnrt.type)                                                                                                                     │
│                           ->  Nested Loop  (cost=0.00..66613.04 rows=1940 width=122) (actual time=0.003..0.003 rows=0 loops=1)                                                        │
│                                 ->  Nested Loop  (cost=0.00..49607.76 rows=1940 width=100) (actual time=0.002..0.002 rows=0 loops=1)                                                  │
│                                       ->  Nested Loop  (cost=0.00..32602.48 rows=1940 width=104) (actual time=0.002..0.002 rows=0 loops=1)                                            │
│                                             ->  Nested Loop  (cost=0.00..16315.94 rows=1940 width=12) (actual time=0.002..0.002 rows=0 loops=1)                                       │
│                                                   ->  Seq Scan on gc_virtual_file gvf  (cost=0.00..29.40 rows=1940 width=8) (actual time=0.001..0.001 rows=0 loops=1)                 │
│                                                   ->  Index Scan using gc_file_resource_id_key on gc_file gf_child  (cost=0.00..8.38 rows=1 width=4) (never executed)                 │
│                                                         Index Cond: (resource_id = gvf.resource_id)                                                                                   │
│                                             ->  Index Scan using gc_file_resource_id_key on gc_file gf_parent  (cost=0.00..8.38 rows=1 width=92) (never executed)                     │
│                                                   Index Cond: (resource_id = gvf.parent_id)                                                                                           │
│                                       ->  Index Scan using fn_resources_pkey on fn_resources fnr_parent  (cost=0.00..8.75 rows=1 width=4) (never executed)                            │
│                                             Index Cond: (id = gvf.parent_id)                                                                                                          │
│                                 ->  Index Scan using fn_resources_pkey on fn_resources fnr_child  (cost=0.00..8.75 rows=1 width=30) (never executed)                                  │
│                                       Index Cond: (id = gvf.resource_id)                                                                                                              │
│                           ->  Hash  (cost=21.60..21.60 rows=1160 width=36) (never executed)                                                                                           │
│                                 ->  Seq Scan on fn_resource_types fnrt  (cost=0.00..21.60 rows=1160 width=36) (never executed)                                                        │
│                           SubPlan 6                                                                                                                                                   │
│                             ->  Index Scan using users_pkey on fn_users  (cost=0.00..8.28 rows=1 width=13) (never executed)                                                           │
│                                   Index Cond: (id = fnr_child.owner_id)                                                                                                               │
│                           SubPlan 7                                                                                                                                                   │
│                             ->  Index Scan using gc_tables_resource_id_key on gc_tables gt  (cost=0.00..8.28 rows=1 width=18) (never executed)                                        │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│                           SubPlan 8                                                                                                                                                   │
│                             ->  Index Scan using gc_webmapservices_resource_id_key on gc_webmapservices gwms  (cost=0.00..8.27 rows=1 width=418) (never executed)                     │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│                           SubPlan 9                                                                                                                                                   │
│                             ->  Index Scan using gc_tables_resource_id_key on gc_tables gt  (cost=0.00..8.31 rows=1 width=14) (never executed)                                        │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│                           SubPlan 10                                                                                                                                                  │
│                             ->  Index Scan using gc_webmapservices_resource_id_key on gc_webmapservices gwms  (cost=0.00..8.30 rows=1 width=267) (never executed)                     │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│               ->  Subquery Scan on "*SELECT* 2"  (cost=580092.62..681215149.42 rows=15856388 width=247) (actual time=17521.724..199482.039 rows=2737742 loops=1)                      │
│                     ->  Hash Join  (cost=580092.62..681056585.54 rows=15856388 width=247) (actual time=17521.668..198482.521 rows=2737742 loops=1)                                    │
│                           Hash Cond: (fnr_child.type = fnrt.type)                                                                                                                     │
│                           ->  Hash Join  (cost=580056.53..1189954.39 rows=2733860 width=219) (actual time=16768.431..88474.618 rows=2737742 loops=1)                                  │
│                                 Hash Cond: (gf_parent.resource_id = fnr_parent.id)                                                                                                    │
│                                 ->  Hash Join  (cost=388738.70..943959.36 rows=2733860 width=219) (actual time=13619.604..72625.523 rows=2737742 loops=1)                             │
│                                       Hash Cond: (gf_child.path = gf_parent.pathname)                                                                                                 │
│                                       ->  Hash Join  (cost=202378.85..538142.64 rows=2733860 width=168) (actual time=6429.263..53255.517 rows=2737743 loops=1)                        │
│                                             Hash Cond: (fnr_child.id = gf_child.resource_id)                                                                                          │
│                                             ->  Seq Scan on fn_resources fnr_child  (cost=0.00..114415.70 rows=6152170 width=30) (actual time=21.243..19364.289 rows=6119914 loops=1) │
│                                             ->  Hash  (cost=112139.60..112139.60 rows=2733860 width=142) (actual time=5666.626..5666.626 rows=2737743 loops=1)                        │
│                                                   Buckets: 262144  Batches: 2  Memory Usage: 240441kB                                                                                 │
│                                                   ->  Seq Scan on gc_file gf_child  (cost=0.00..112139.60 rows=2733860 width=142) (actual time=0.015..3442.788 rows=2737751 loops=1)  │
│                                       ->  Hash  (cost=112139.60..112139.60 rows=2733860 width=92) (actual time=7173.094..7173.094 rows=2737751 loops=1)                               │
│                                             Buckets: 262144  Batches: 2  Memory Usage: 169423kB                                                                                       │
│                                             ->  Seq Scan on gc_file gf_parent  (cost=0.00..112139.60 rows=2733860 width=92) (actual time=0.007..5057.412 rows=2737751 loops=1)        │
│                                 ->  Hash  (cost=114415.70..114415.70 rows=6152170 width=4) (actual time=3145.424..3145.424 rows=6119914 loops=1)                                      │
│                                       Buckets: 1048576  Batches: 1  Memory Usage: 215154kB                                                                                            │
│                                       ->  Seq Scan on fn_resources fnr_parent  (cost=0.00..114415.70 rows=6152170 width=4) (actual time=0.008..1186.819 rows=6119914 loops=1)         │
│                           ->  Hash  (cost=21.60..21.60 rows=1160 width=36) (actual time=0.021..0.021 rows=21 loops=1)                                                                 │
│                                 Buckets: 1024  Batches: 1  Memory Usage: 1kB                                                                                                          │
│                                 ->  Seq Scan on fn_resource_types fnrt  (cost=0.00..21.60 rows=1160 width=36) (actual time=0.005..0.012 rows=21 loops=1)                              │
│                           SubPlan 1                                                                                                                                                   │
│                             ->  Index Scan using users_pkey on fn_users  (cost=0.00..8.28 rows=1 width=13) (actual time=0.009..0.009 rows=1 loops=2737742)                            │
│                                   Index Cond: (id = fnr_child.owner_id)                                                                                                               │
│                           SubPlan 2                                                                                                                                                   │
│                             ->  Index Scan using gc_file_resource_id_key on gc_file gf  (cost=0.00..8.62 rows=1 width=47) (never executed)                                            │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│                           SubPlan 3                                                                                                                                                   │
│                             ->  Index Scan using gc_file_resource_id_key on gc_file gf  (cost=0.00..8.62 rows=1 width=47) (actual time=0.025..0.025 rows=1 loops=2665337)             │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│                           SubPlan 4                                                                                                                                                   │
│                             ->  Index Scan using gc_file_resource_id_key on gc_file gf  (cost=0.00..8.63 rows=1 width=129) (never executed)                                           │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│                           SubPlan 5                                                                                                                                                   │
│                             ->  Index Scan using gc_file_resource_id_key on gc_file gf  (cost=0.00..8.66 rows=1 width=145) (actual time=0.003..0.003 rows=1 loops=2665337)            │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│         ->  Hash  (cost=4804.79..4804.79 rows=374 width=12) (actual time=412.434..412.434 rows=59 loops=1)                                                                            │
│               Buckets: 1024  Batches: 1  Memory Usage: 3kB                                                                                                                            │
│               ->  Nested Loop  (cost=13.32..4804.79 rows=374 width=12) (actual time=156.510..412.313 rows=59 loops=1)                                                                 │
│                     ->  Bitmap Heap Scan on fn_resource_permissions  (cost=13.32..1454.38 rows=374 width=4) (actual time=155.407..402.734 rows=59 loops=1)                            │
│                           Recheck Cond: ((user_id = 2) OR (user_id = 1))                                                                                                              │
│                           Filter: (permission_id = ANY ('{23,24,25,37,36}'::integer[]))                                                                                               │
│                           ->  BitmapOr  (cost=13.32..13.32 rows=391 width=0) (actual time=88.402..88.402 rows=0 loops=1)                                                              │
│                                 ->  Bitmap Index Scan on resource_permissions_user  (cost=0.00..6.57 rows=196 width=0) (actual time=88.380..88.380 rows=61 loops=1)                   │
│                                       Index Cond: (user_id = 2)                                                                                                                       │
│                                 ->  Bitmap Index Scan on resource_permissions_user  (cost=0.00..6.57 rows=196 width=0) (actual time=0.016..0.016 rows=0 loops=1)                      │
│                                       Index Cond: (user_id = 1)                                                                                                                       │
│                     ->  Index Scan using fn_resources_pkey on fn_resources  (cost=0.00..8.95 rows=1 width=8) (actual time=0.154..0.155 rows=1 loops=59)                               │
│                           Index Cond: (id = fn_resource_permissions.resource_id)                                                                                                      │
│                           Filter: (archived = 0)                                                                                                                                      │
│ Total runtime: 201373.236 ms                                                                                                                                                          │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

非常感谢。

1 个答案:

答案 0 :(得分:0)

您的查询的结构是fn_resourcesfn_resource_permissions过滤,然后加入gc_drive,尽管gc_drive中存在相同的数据。

这有点像预感,但请尽早使用gc_drive子句fn_resource_permissions过滤where,而不是加入联接。

select
    gc_drive.id,
    fn_resources.owner_id,
    gc_drive.*
from
    gc_drive
    join fn_resources on(
        gc_drive.resource_id = fn_resources.id
    )
where
    gc_drive.resource_id in(
        select
            fn_resource_permissions.resource_id
        from
            fn_resource_permissions
        where
            fn_resource_permissions.permission_id in (23, 24, 25, 37, 36)
            and fn_resource_permissions.user_id in(1, 2)
    )
    and fn_resources.archived = 0
limit 101;

Alternativley,尝试在加入

之前过滤fn_resources
select
    gc_drive.id,
    fn_resources.owner_id,
    gc_drive.*
from
    gc_drive
    join fn_resources on(
        gc_drive.resource_id = fn_resources.id
        and fn_resources.id in(
            select
                fn_resource_permissions.resource_id
            from
                fn_resource_permissions
            where
                fn_resource_permissions.permission_id in (23, 24, 25, 37, 36)
                and fn_resource_permissions.user_id in(1, 2)
        )
    )
where
    fn_resources.archived = 0
limit 101;
相关问题