Select rows with more than n occurrences from joined tables

Time: 2018-07-05 08:51:43

Tags: postgresql join group-by count

My question is similar to "MySQL: Select rows with more than one occurrence", but I am using PostgreSQL. I have a query like this:

select d.user_id, d.recorded_at, d.glucose_value, d.unit
from diary as d
join (
    select u.id
    from health_user as u
    join (
        select distinct user_id
        from care_connect
        where clinic_id = 217
            and role = 'user'
            and status = 'active'
    ) as c
    on u.id = c.user_id
    where u.is_tester is false
) as cu
on d.user_id = cu.id
where d.created_at >= d.recorded_at
    and d.recorded_at < current_date and d.recorded_at >= current_date - interval '30 days'
    and d.glucose_value > 0
    and (d.state = 'wakeup' or (d.state = 'before_meal' and d.meal_type = 'breakfast'))

The result looks like this:

+---------+---------------------+---------------+--------+
| user_id |     recorded_at     | glucose_value |  unit  |
+---------+---------------------+---------------+--------+
|   12041 | 2018-06-26 01:10:12 |           100 | mg/dL  |
|   12041 | 2018-06-30 02:10:11 |            90 | mg/dL  |
|   12214 | 2018-06-25 12:40:13 |            10 | mmol/L |
|   12214 | 2018-06-26 12:41:13 |            12 | mmol/L |
|   12214 | 2018-06-29 00:21:14 |            11 | mmol/L |
|   12214 | 2018-06-29 12:59:32 |            10 | mmol/L |
+---------+---------------------+---------------+--------+

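As you can see, this is already a long query with many conditions. Now I only want the records of users who have at least four records (rows) in this result, so I tried: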

select d.user_id, d.recorded_at, d.glucose_value, d.unit, count(d.*)
from diary as d
join (
    select u.id
    from health_user as u
    join (
        select distinct user_id
        from care_connect
        where clinic_id = 217
            and role = 'user'
            and status = 'active'
    ) as c
    on u.id = c.user_id
    where u.is_tester is false
) as cu
on d.user_id = cu.id
where d.created_at >= d.recorded_at
    and d.recorded_at < current_date and d.recorded_at >= current_date - interval '30 days'
    and d.glucose_value > 0
    and (d.state = 'wakeup' or (d.state = 'before_meal' and d.meal_type = 'breakfast'))
group by d.user_id
having count(d.*) >= 4

My expected output is:

+---------+---------------------+---------------+--------+
| user_id |     recorded_at     | glucose_value |  unit  |
+---------+---------------------+---------------+--------+
|   12214 | 2018-06-25 12:40:13 |            10 | mmol/L |
|   12214 | 2018-06-26 12:41:13 |            12 | mmol/L |
|   12214 | 2018-06-29 00:21:14 |            11 | mmol/L |
|   12214 | 2018-06-29 12:59:32 |            10 | mmol/L |
+---------+---------------------+---------------+--------+

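However, it throws an error saying that d.recorded_at must also be added to the group by clause, which is not what I want. Besides, grouping by the raw timestamps makes no sense.

I know I could probably join against another derived table generated by the same query, with select d.user_id, count(d.*) as its first line, but then the whole query would look crazy.

Could someone help me achieve this in a better way? Sorry that I have not included the table structures here; I can edit and clarify if needed.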

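2 answers:

Answer 0 (score: 0)

Try this. It uses a window function to attach each user's row count to every row, then filters on that count in an outer query: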

select user_id, recorded_at, glucose_value, unit
from (
    select d.user_id, d.recorded_at, d.glucose_value, d.unit,
           count(*) over (partition by d.user_id) as rcnt
    from diary as d
    join (
        select u.id
        from health_user as u
        join (
            select distinct user_id
            from care_connect
            where clinic_id = 217
                and role = 'user'
                and status = 'active'
        ) as c
        on u.id = c.user_id
        where u.is_tester is false
    ) as cu
    on d.user_id = cu.id
    where d.created_at >= d.recorded_at
        and d.recorded_at < current_date and d.recorded_at >= current_date - interval '30 days'
        and d.glucose_value > 0
        and (d.state = 'wakeup' or (d.state = 'before_meal' and d.meal_type = 'breakfast'))
) as x
where rcnt >= 4
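
To see the idea in isolation, here is a minimal, self-contained sketch. It assumes a stand-in CTE named sample_rows (a hypothetical name, filled with the rows from the question's sample output) in place of the long query. count(*) over (partition by user_id) adds each user's row count to every row without collapsing the rows the way group by does. Because a window function cannot be used in the where clause of the same query level, the filter sits in an outer query.

```sql
-- sample_rows is a hypothetical stand-in for the result of the long query;
-- its rows are copied from the sample output in the question.
with sample_rows (user_id, recorded_at, glucose_value, unit) as (
    values
        (12041, timestamp '2018-06-26 01:10:12', 100, 'mg/dL'),
        (12041, timestamp '2018-06-30 02:10:11',  90, 'mg/dL'),
        (12214, timestamp '2018-06-25 12:40:13',  10, 'mmol/L'),
        (12214, timestamp '2018-06-26 12:41:13',  12, 'mmol/L'),
        (12214, timestamp '2018-06-29 00:21:14',  11, 'mmol/L'),
        (12214, timestamp '2018-06-29 12:59:32',  10, 'mmol/L')
)
select user_id, recorded_at, glucose_value, unit
from (
    -- every row keeps its own columns and gains the per-user row count
    select s.*, count(*) over (partition by s.user_id) as rcnt
    from sample_rows as s
) as counted
where rcnt >= 4
order by user_id, recorded_at;
```

With this data, only the four rows for user 12214 should come back, which matches the expected output in the question.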

Answer 1 (score: 0)

Try this:

Replace your_query with your actual query.

This uses a with clause and an exists clause:

with original_query as ( your_query )
select *
from original_query q1
where exists (
    select q2.user_id
    from original_query q2
    where q1.user_id = q2.user_id
    group by q2.user_id
    having count(q2.user_id) >= 4
)
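
As a rough illustration of the same pattern with your_query filled in, the sketch below substitutes a few literal rows (copied from the question's sample output) for the real query, so it runs without the diary, health_user, or care_connect tables. The literal rows are an assumption for demonstration only.

```sql
-- The values list is a hypothetical stand-in for your_query.
with original_query (user_id, recorded_at, glucose_value, unit) as (
    values
        (12041, timestamp '2018-06-26 01:10:12', 100, 'mg/dL'),
        (12041, timestamp '2018-06-30 02:10:11',  90, 'mg/dL'),
        (12214, timestamp '2018-06-25 12:40:13',  10, 'mmol/L'),
        (12214, timestamp '2018-06-26 12:41:13',  12, 'mmol/L'),
        (12214, timestamp '2018-06-29 00:21:14',  11, 'mmol/L'),
        (12214, timestamp '2018-06-29 12:59:32',  10, 'mmol/L')
)
select q1.*
from original_query q1
where exists (
    -- keep a row only if its user has at least four rows overall
    select q2.user_id
    from original_query q2
    where q2.user_id = q1.user_id
    group by q2.user_id
    having count(*) >= 4
)
order by q1.user_id, q1.recorded_at;
```

The long query is written only once inside the with clause and is referenced twice, which avoids the duplication the question was worried about.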