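Group by groups of values

Time: 2015-10-14 17:09:30

Tags: sql postgresql

I am working with a huge database in PostgreSQL. (Sorry if this is not formatted properly; I have been trying for hours and am still struggling.)

This is the part of the table structure relevant to my query (table user_activities), with some sample data: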

+---------------------+---------------------+---------------------+
| user_id             | activity            | operation           |
+---------------------+---------------------+---------------------+
| 1                   | 1                   | 1                   |
| 1                   | 1                   | 1                   |
| 1                   | 1                   | 1                   |
| 2                   | 1                   | 2                   |
| 2                   | 1                   | 3                   |
| 3                   | 1                   | 3                   |
| 4                   | 1                   | 4                   |
| 4                   | 1                   | 4                   |
| 5                   | 1                   | 4                   |
| 5                   | 1                   | 5                   |
| 6                   | 3                   | 1                   |
| 6                   | 3                   | 1                   |
| 6                   | 3                   | 2                   |
| 7                   | 3                   | 3                   |
| 8                   | 3                   | 4                   |
| 8                   | 3                   | 5                   |
+---------------------+---------------------+---------------------+

This is the output I want:

+---------------------+---------------------+---------------------+
| count(user_id)      | activity            | operation           |
+---------------------+---------------------+---------------------+
| 4                   | 1                   | 1,2                 |
| 6                   | 1                   | 3,4,5               |
| 6                   | 3                   | 1,2,3,4,5           |
+---------------------+---------------------+---------------------+

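I need to count user_id for each activity and each set of operation values. Only activities 1 and 3 matter, so I need to group by activity (the WHERE activity IN (1,3) filter is already done). But I also need to group by operation, and the problem is that each group of operations contains more than one value. Operation can be 1, 2, 3, 4 or 5, and I want to combine the values into the groups 1,2 and 3,4,5. But that is not all...

If I simply group by operation, I get 5 groups per activity. I need 2 groups for activity 1 (the ones specified above), and for activity 3 I need a single group containing all operation values.

Is this possible?

Edit: I can't check the answers right now; I hope to be able to tomorrow. I will vote and reply then. Thanks for your help.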

3 answers:

Answer 0: (score: 2)

Updated according to your detailed specification:

SELECT COUNT(*) as cnt, ua.activity, array_agg(distinct ua.operation)
FROM user_activities ua
JOIN (
  SELECT 1 AS activity, 1 as operation, 1 as GROUP_CODE
    UNION ALL
  SELECT 1 AS activity, 2 as operation, 1 as GROUP_CODE
    UNION ALL
  SELECT 1 AS activity, 3 as operation, 2 as GROUP_CODE
    UNION ALL
  SELECT 1 AS activity, 4 as operation, 2 as GROUP_CODE
    UNION ALL
  SELECT 1 AS activity, 5 as operation, 2 as GROUP_CODE
    UNION ALL
  SELECT 3 AS activity, 1 as operation, 3 as GROUP_CODE
    UNION ALL
  SELECT 3 AS activity, 2 as operation, 3 as GROUP_CODE
    UNION ALL
  SELECT 3 AS activity, 3 as operation, 3 as GROUP_CODE
    UNION ALL
  SELECT 3 AS activity, 4 as operation, 3 as GROUP_CODE
    UNION ALL
  SELECT 3 AS activity, 5 as operation, 3 as GROUP_CODE
) c
ON ua.activity = c.activity and ua.operation = c.operation
GROUP BY c.GROUP_CODE, ua.activity

http://sqlfiddle.com/#!15/46e1f/15
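
The chain of UNION ALL selects can also be written as a VALUES list, which PostgreSQL accepts as a derived table. The following is only a sketch, assuming the table is named user_activities as in the question. The aliases c, group_code and operations are illustrative names. It also swaps array_agg for string_agg so that the operations come back as a comma-separated string, as in the requested output.

```sql
-- Same join as above, with the mapping expressed as a VALUES list.
-- Columns of the derived table: (activity, operation, group_code).
SELECT COUNT(*) AS cnt,
       ua.activity,
       -- Sorted as text, which is fine for single-digit operation values.
       string_agg(DISTINCT ua.operation::text, ',' ORDER BY ua.operation::text) AS operations
FROM user_activities ua
JOIN (
  VALUES (1, 1, 1), (1, 2, 1),
         (1, 3, 2), (1, 4, 2), (1, 5, 2),
         (3, 1, 3), (3, 2, 3), (3, 3, 3), (3, 4, 3), (3, 5, 3)
) AS c (activity, operation, group_code)
  ON ua.activity = c.activity AND ua.operation = c.operation
GROUP BY c.group_code, ua.activity
ORDER BY ua.activity, c.group_code;
```

With the sample data from the question this should produce the three requested rows. Activities other than 1 and 3 have no matching mapping row, so the inner join drops them without a separate WHERE clause.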

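Original answer

This is how I would do it. Below I create the logic table on the fly, but you could also keep that table in the database and join to it.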

SELECT GROUP_CODE, COUNT(*) as cnt
FROM user_activities ua
JOIN (
  SELECT 1 AS activity, 1 as operation, 1 as GROUP_CODE
    UNION ALL
  SELECT 1 AS activity, 2 as operation, 1 as GROUP_CODE
    UNION ALL
  SELECT 1 AS activity, 3 as operation, 2 as GROUP_CODE
    UNION ALL
  SELECT 1 AS activity, 4 as operation, 2 as GROUP_CODE
    UNION ALL
  SELECT 1 AS activity, 5 as operation, 2 as GROUP_CODE
    UNION ALL
  SELECT 3 AS activity, 1 as operation, 3 as GROUP_CODE
    UNION ALL
  SELECT 3 AS activity, 2 as operation, 3 as GROUP_CODE
    UNION ALL
  SELECT 3 AS activity, 3 as operation, 3 as GROUP_CODE
    UNION ALL
  SELECT 3 AS activity, 4 as operation, 3 as GROUP_CODE
    UNION ALL
  SELECT 3 AS activity, 5 as operation, 3 as GROUP_CODE
) c
ON ua.activity = c.activity and ua.operation = c.operation
GROUP BY GROUP_CODE

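This should be very fast. Remember that SQL is designed to work with sets (tables) and joins, and this uses a join to carry out the logic. It is also nice because if you turn the mapping into a real table, you can change the logic by changing the table's rows. You can even store several "logics" in the same table by adding another column to select on, and then choose which one to use when the query runs.

I have used a similar approach for weighting and personalized sorting in a dynamic user interface.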
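
The idea of keeping several "logics" in one stored table could look roughly like this. It is a sketch under assumptions: the table name operation_groups, the scheme column and the value 'default' are all hypothetical and do not appear in the question.

```sql
-- Hypothetical mapping table: one row per (scheme, activity, operation).
CREATE TABLE operation_groups (
  scheme     text NOT NULL,  -- which "logic" this row belongs to
  activity   int  NOT NULL,
  operation  int  NOT NULL,
  group_code int  NOT NULL,
  PRIMARY KEY (scheme, activity, operation)
);

INSERT INTO operation_groups (scheme, activity, operation, group_code) VALUES
  ('default', 1, 1, 1), ('default', 1, 2, 1),
  ('default', 1, 3, 2), ('default', 1, 4, 2), ('default', 1, 5, 2),
  ('default', 3, 1, 3), ('default', 3, 2, 3), ('default', 3, 3, 3),
  ('default', 3, 4, 3), ('default', 3, 5, 3);

-- The scheme is chosen at query time; regrouping means editing rows, not SQL.
SELECT g.group_code, COUNT(*) AS cnt
FROM user_activities ua
JOIN operation_groups g
  ON ua.activity = g.activity AND ua.operation = g.operation
WHERE g.scheme = 'default'
GROUP BY g.group_code
ORDER BY g.group_code;
```

The primary key on (scheme, activity, operation) ensures that each operation maps to at most one group within a scheme, so the join cannot count a row twice.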

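Answer 1: (score: 2)

As far as I understand, a query like this may help you. The information in the question and the comments confused me, so I used my best judgment to provide a solution.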

create table test (user_id int, activity int, operation int);
insert into test values (1,1,1), (1,1,1), (1,1,2), (2,1,3), (2,1,4), (3,3,1), (4,3,3), (4,3,5);

select count(*), activity, array_agg(operation)
from test
group by activity, user_id

Result:
| count | activity | array_agg | 
| 3     | 1        | {1,1,2}   | 
| 2     | 1        | {3,4}     | 
| 1     | 3        | {1}       | 
| 2     | 3        | {3,5}     | 

Based on the edited question, I think this is how I would solve it:

Table:

create table test (user_id int, activity int, operation int);
insert into test values 
(1,1,1),(1,1,1),(1,1,1),
(2,1,2),(2,1,3),
(3,1,3),
(4,1,4),(4,1,4),
(5,1,4),(5,1,5),
(6,3,1),(6,3,1),(6,3,2),
(7,3,3),
(8,3,4),(8,3,5);

Query:

select count(*), activity, string_agg(distinct operation::VARCHAR, ',')
from test
where operation in (1,2) and activity = 1
group by activity

UNION ALL

select count(*), activity, string_agg(distinct operation::VARCHAR, ',')
from test
where operation in (3,4,5) and activity = 1
group by activity

UNION ALL

select count(*), activity, string_agg(distinct operation::VARCHAR, ',')
from test
where activity = 3
group by activity

Result:

count | activity | string_agg
4     | 1        | 1,2
6     | 1        | 3,4,5
6     | 3        | 1,2,3,4,5

Answer 2: (score: 1)

SQL Fiddle Demo

Just use CASE to put the groups you want together.

WITH cte as (
    SELECT "user_id", "activity", "operation",
        CASE 
             WHEN "activity" = 1 THEN 
                   CASE 
                       WHEN "operation" IN (1,2) THEN '1_first'        
                       ELSE '1_second'
                   END 
             WHEN "activity" = 3 THEN '3_first'
        END as "op_group"
    FROM user_activities
)
SELECT "activity", 
       "op_group", 
        count("user_id"), 
        array_agg(distinct "operation") as "operation"
FROM cte
GROUP BY "activity", "op_group"

Output

| activity | op_group | count | operation |
|----------|----------|-------|-----------|
|        1 |  1_first |     4 |       1,2 |
|        1 | 1_second |     6 |     3,4,5 |
|        3 |  3_first |     6 | 1,2,3,4,5 |
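
One thing the query above leaves implicit: rows with any other activity fall through the outer CASE, get a NULL op_group, and still show up as extra groups. The following sketch adds the WHERE activity IN (1,3) filter mentioned in the question and uses string_agg to return a comma-separated list. The aliases cnt and operations are illustrative, and the nested CASE is flattened into one without changing which rows land in which group.

```sql
WITH cte AS (
    SELECT user_id, activity, operation,
           CASE
               WHEN activity = 1 AND operation IN (1, 2) THEN '1_first'
               WHEN activity = 1 THEN '1_second'
               WHEN activity = 3 THEN '3_first'
           END AS op_group
    FROM user_activities
    WHERE activity IN (1, 3)  -- keep other activities out of the result
)
SELECT count(user_id) AS cnt,
       activity,
       -- Sorted as text, which is fine for single-digit operation values.
       string_agg(DISTINCT operation::text, ',' ORDER BY operation::text) AS operations
FROM cte
GROUP BY activity, op_group
ORDER BY activity, op_group;
```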