Consider a single record per id in GROUP BY

Date: 2016-09-20 05:33:57

Tags: sql group-by

Background

I have a SQL table with 4 columns:

  • id - varchar(50)
  • g1 - varchar(50)
  • g2 - varchar(50)
  • datetime - timestamp

I have this query:

SELECT g1,
       COUNT(DISTINCT id),
       SUM(COUNT(DISTINCT id)) OVER () AS total,
       (CAST(COUNT(DISTINCT id) AS float) / SUM(COUNT(DISTINCT id)) OVER ()) AS share
FROM my_table
WHERE g2 = 'start'
GROUP BY 1
ORDER BY share DESC

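This query is meant to answer: what is the distribution of g1 values among users?

Problem

Each id may have multiple records in the table. I want to consider only the earliest one, where earliest means the smallest datetime value.

Example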

id    g1    g2      datetime
x1    a     start   2016-01-19 21:01:22
x1    c     start   2016-01-19 21:01:21
x2    b     start   2016-01-19 09:03:42
x1    a     start   2016-01-18 13:56:45

Actual query result

g1  count   total   share
a   2       4       0.5
b   1       4       0.25
c   1       4       0.25

We have 4 records, but I want to consider only two of them:

x2    b     start   2016-01-19 09:03:42
x1    a     start   2016-01-18 13:56:45

These are the earliest records for each id.

Expected query result

g1  count   total   share
a   1       2       0.5
b   1       2       0.5

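Question

How can I consider only the earliest record of each id in the GROUP BY?

4 Answers:

Answer 0 (score: 2)

I don't know which DBMS you are using, so here is a standard ANSI approach: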

SELECT T1.g1,
       COUNT(DISTINCT T1.id),
       SUM(COUNT(DISTINCT T1.id)) OVER () AS total,
       (CAST(COUNT(DISTINCT T1.id) AS float) / SUM(COUNT(DISTINCT T1.id)) OVER ()) AS share
FROM my_table T1
INNER JOIN
    (SELECT id, MIN(datetime) AS mindt   -- earliest timestamp per id
     FROM my_table
     GROUP BY id
     ) T2 ON T1.datetime = T2.mindt AND T1.id = T2.id
WHERE T1.g2 = 'start'
GROUP BY T1.g1
ORDER BY share DESC

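If the table is large and datetime is not indexed, this may be slow.

Answer 1 (score: 2)

Here is a solution that should work in SQL Server, as well as in any database that supports CTEs: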
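
To check the join-on-MIN(datetime) idea against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption here (the question does not name a DBMS), and the sketch needs SQLite 3.25 or later for window functions. The per-group counts are computed in a derived table first (the alias per_group is hypothetical) instead of nesting COUNT inside SUM() OVER ().

```python
import sqlite3

# In-memory database holding the four sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT, g1 TEXT, g2 TEXT, datetime TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("x1", "a", "start", "2016-01-19 21:01:22"),
        ("x1", "c", "start", "2016-01-19 21:01:21"),
        ("x2", "b", "start", "2016-01-19 09:03:42"),
        ("x1", "a", "start", "2016-01-18 13:56:45"),
    ],
)

query = """
SELECT per_group.g1,
       per_group.cnt,
       SUM(per_group.cnt) OVER () AS total,
       CAST(per_group.cnt AS REAL) / SUM(per_group.cnt) OVER () AS share
FROM (
    -- keep only each id's earliest row, then count ids per g1
    SELECT t1.g1, COUNT(DISTINCT t1.id) AS cnt
    FROM my_table t1
    INNER JOIN (
        SELECT id, MIN(datetime) AS mindt
        FROM my_table
        GROUP BY id
    ) t2 ON t1.id = t2.id AND t1.datetime = t2.mindt
    WHERE t1.g2 = 'start'
    GROUP BY t1.g1
) per_group
ORDER BY share DESC, per_group.g1
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this gives one row for a and one for b, each with a share of 0.5, which matches the expected result in the question.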

WITH cte AS
(
    -- one row per g1, counting only each id's earliest record
    SELECT t1.g1,
           COUNT(*) AS cnt
    FROM yourTable t1
    INNER JOIN
    (
        SELECT id, MIN(datetime) AS datetime
        FROM yourTable
        GROUP BY id
    ) t2
        ON t1.id = t2.id AND
           t1.datetime = t2.datetime
    GROUP BY t1.g1
)

SELECT t.g1,
       t.cnt,
       (SELECT SUM(cnt) FROM cte) AS total,
       CAST(t.cnt AS float) / (SELECT SUM(cnt) FROM cte) AS share
FROM cte t
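
A variant of the CTE idea can be tried on the question's sample rows with Python's built-in sqlite3 module. This is only a sketch under some assumptions: SQLite stands in for the unnamed DBMS, the CTE (hypothetically named earliest) holds the earliest row of each id rather than per-group counts, and no id has two rows sharing the same minimum datetime. Under those assumptions the total is simply the number of rows in the CTE.

```python
import sqlite3

# In-memory database holding the four sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT, g1 TEXT, g2 TEXT, datetime TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("x1", "a", "start", "2016-01-19 21:01:22"),
        ("x1", "c", "start", "2016-01-19 21:01:21"),
        ("x2", "b", "start", "2016-01-19 09:03:42"),
        ("x1", "a", "start", "2016-01-18 13:56:45"),
    ],
)

query = """
WITH earliest AS (
    -- one row per id: the record with the smallest datetime
    SELECT t1.id, t1.g1
    FROM my_table t1
    INNER JOIN (
        SELECT id, MIN(datetime) AS first_seen
        FROM my_table
        GROUP BY id
    ) t2 ON t1.id = t2.id AND t1.datetime = t2.first_seen
)
SELECT g1,
       COUNT(*) AS cnt,
       (SELECT COUNT(*) FROM earliest) AS total,
       CAST(COUNT(*) AS REAL) / (SELECT COUNT(*) FROM earliest) AS share
FROM earliest
GROUP BY g1
ORDER BY share DESC, g1
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The explicit cast to REAL matters: dividing two integer counts would otherwise truncate the share to 0 in SQLite, and the same applies in SQL Server.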

Answer 2 (score: 2)

Try the following query.

;WITH cte_1 AS
(
    SELECT id, MIN(datetime) AS [Date]
    FROM YourTable
    GROUP BY id
)
SELECT yt.g1,
       COUNT(DISTINCT yt.id) AS [Count],
       SUM(COUNT(DISTINCT yt.id)) OVER () AS total,
       (CAST(COUNT(DISTINCT yt.id) AS float) / SUM(COUNT(DISTINCT yt.id)) OVER ()) AS share
FROM cte_1 c
JOIN YourTable yt
    ON yt.[datetime] = c.[Date] AND yt.id = c.id
   AND yt.g2 = 'start'
GROUP BY yt.g1
ORDER BY share DESC


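Answer 3 (score: 1)

You are querying all the data in my_table, although you only want the earliest date for each id. I assume id is the primary key of the table.

I suggest you define a view (or an inline view) that selects only the earliest date for each id, and run your query against that view instead of my_table.

The view can be defined like this, so that it contains only the earliest date for each id: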

select * from my_table a 
where a.datetime = (select min(z.datetime) from my_table z where a.id = z.id) and a.g2 = 'start'

You can define it as a view or use it directly, as follows:

SELECT g1,
       COUNT(DISTINCT id),
       SUM(COUNT(DISTINCT id)) OVER () AS total,
       (CAST(COUNT(DISTINCT id) AS float) / SUM(COUNT(DISTINCT id)) OVER ()) AS share
FROM (SELECT a.id, a.g1, a.g2, a.datetime
      FROM my_table a
      WHERE a.datetime = (SELECT MIN(z.datetime) FROM my_table z WHERE a.id = z.id)
        AND a.g2 = 'start') earliest   -- many DBMSs require an alias on a derived table
GROUP BY 1
ORDER BY share DESC
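
The view approach can be sketched on the question's sample rows with Python's built-in sqlite3 module. SQLite is an assumption (the thread does not name a DBMS), and the view name earliest_start is hypothetical. The correlated subquery picks the minimum datetime per id, and the outer filter then keeps only rows with g2 = 'start'.

```python
import sqlite3

# In-memory database holding the four sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT, g1 TEXT, g2 TEXT, datetime TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("x1", "a", "start", "2016-01-19 21:01:22"),
        ("x1", "c", "start", "2016-01-19 21:01:21"),
        ("x2", "b", "start", "2016-01-19 09:03:42"),
        ("x1", "a", "start", "2016-01-18 13:56:45"),
    ],
)

# View with only the earliest record of each id (hypothetical name).
conn.execute(
    """
    CREATE VIEW earliest_start AS
    SELECT a.id, a.g1, a.g2, a.datetime
    FROM my_table a
    WHERE a.datetime = (SELECT MIN(z.datetime) FROM my_table z WHERE z.id = a.id)
      AND a.g2 = 'start'
    """
)

# Rows that survive the filter, one per id.
view_rows = conn.execute(
    "SELECT id, g1 FROM earliest_start ORDER BY id"
).fetchall()

# Aggregating over the view instead of my_table.
counts = conn.execute(
    "SELECT g1, COUNT(DISTINCT id) AS cnt FROM earliest_start GROUP BY g1 ORDER BY g1"
).fetchall()

print(view_rows)
print(counts)
```

Note that the minimum is taken over all of an id's rows before the g2 filter is applied, so an id whose earliest record has a different g2 value would drop out entirely. Whether that is the intended behavior depends on the data.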