Efficiently aggregating JSONB arrays

Time: 2015-09-24 01:18:31

Tags: postgresql

I have a table with a JSONB field (data) that holds Facebook-like data. The data structure is:

-
id         | 9403
kind       | 'likes'
data       | [{ id: "1", name: "Pluto", category: "Planet"}, { id: "2", name: "Saturn", category: "Planet" }]
-
id         | 9403
kind       | 'likes'
data       | [{ id: "2", name: "Neptune", category: "Planet"}, { id: "3", name: "Mars", category: "Planet" }]
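
To experiment with the two sample rows above, a minimal setup might look like the following. The column types are assumptions: the question does not show the schema, although the `(kind)::text` cast in the query plan suggests a varchar column. The unquoted keys in the listing are written as valid JSON here.

```sql
-- Hypothetical schema: the column types are assumed, not taken from the question.
CREATE TABLE metadata (
    id   integer,
    kind varchar,
    data jsonb
);

-- The two sample rows from the listing, with keys quoted so they parse as JSON.
INSERT INTO metadata (id, kind, data) VALUES
    (9403, 'likes', '[{"id": "1", "name": "Pluto",   "category": "Planet"},
                      {"id": "2", "name": "Saturn",  "category": "Planet"}]'),
    (9403, 'likes', '[{"id": "2", "name": "Neptune", "category": "Planet"},
                      {"id": "3", "name": "Mars",    "category": "Planet"}]');
```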

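The goal is to write a query that aggregates the top N (5) likes per category. I have the following subquery, which I'm not sure how to optimize (with an index or by rewriting it). The aim is to group by name and category so that the results can be ranked. I started with the simpler problem of efficiently selecting the N most popular likes overall: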

SELECT
likes.entry->>'name' AS name,
likes.entry->>'category' AS category, 
COUNT(*) AS count
FROM (SELECT json_array_elements(metadata.data::JSON) AS entry FROM metadata WHERE metadata.kind = 'likes') AS likes
GROUP BY name, category
ORDER BY count DESC
LIMIT 5

This query already takes 5 seconds to run (EXPLAIN ANALYZE output pasted below):

Limit  (cost=39971.07..39971.07 rows=5 width=32) (actual time=5468.952..5468.954 rows=5 loops=1)
  ->  Sort  (cost=39971.07..39971.17 rows=200 width=32) (actual time=5468.952..5468.954 rows=5 loops=1)
        Sort Key: (count(*))
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=39969.61..39970.41 rows=200 width=32) (actual time=5241.143..5376.502 rows=392515 loops=1)
              Group Key: (likes.entry ->> 'name'::text), (likes.entry ->> 'category'::text)
              ->  Subquery Scan on likes  (cost=0.00..34491.46 rows=3652100 width=32) (actual time=0.104..4552.531 rows=880073 loops=1)
                    ->  Seq Scan on metadata  (cost=0.00..19883.06 rows=3652100 width=703) (actual time=0.097..2146.678 rows=880073 loops=1)
                          Filter: ((kind)::text = 'likes'::text)
                          Rows Removed by Filter: 90145

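Can I somehow restructure this to make it faster, or add some indexes, without resorting to a materialized view? I tried adding the following indexes, which did not help: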

CREATE INDEX index_metadata_on_likes_raw ON metadata USING gin(data) WHERE (kind = 'likes');
CREATE INDEX index_metadata_on_likes_targeted ON metadata ((data ->> 'name'), (data ->> 'category')) WHERE (kind = 'likes');

1 answer:

Answer 0: (score: 0)

You could try:

select name, category, COUNT(*) AS count from(
   SELECT jsonb_array_elements(test.data::JSONB)->>'name' as name, jsonb_array_elements(test.data::JSONB)->>'category' as category FROM test WHERE test.kind = 'likes') a 
GROUP BY name, category
ORDER BY count DESC
LIMIT 5;
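
The query above returns the five most popular likes overall. For the stated goal of the top 5 per category, one possible sketch is to count first and then rank within each category using a window function. This assumes the question's `metadata` table with `data` already stored as jsonb (so no cast is needed); the names `counted`, `ranked`, `like_count` and `rank_in_category` are made up for illustration. It expands each array only once, via a lateral join, rather than calling `jsonb_array_elements` twice.

```sql
WITH counted AS (
    -- Expand each likes array once and count occurrences of (name, category).
    SELECT likes.entry->>'name'     AS name,
           likes.entry->>'category' AS category,
           COUNT(*)                 AS like_count
    FROM metadata
    CROSS JOIN LATERAL jsonb_array_elements(metadata.data) AS likes(entry)
    WHERE metadata.kind = 'likes'
    GROUP BY 1, 2   -- group by the two output columns above
),
ranked AS (
    -- Number the rows inside each category, most liked first;
    -- ties are broken by name so the result is deterministic.
    SELECT name, category, like_count,
           ROW_NUMBER() OVER (PARTITION BY category
                              ORDER BY like_count DESC, name) AS rank_in_category
    FROM counted
)
SELECT name, category, like_count
FROM ranked
WHERE rank_in_category <= 5
ORDER BY category, like_count DESC, name;
```

This is not guaranteed to be faster: the plan in the question shows that almost every row has kind = 'likes', so the aggregate still has to read and expand nearly the whole table, and neither index from the question can avoid that. The second index also cannot help because `data` is an array at the top level, so `data ->> 'name'` yields NULL for every row. Dropping the `::JSON` cast from the original query may save some conversion work, but it is worth checking with EXPLAIN ANALYZE on real data.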