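Count rows for each unique combination of columns in SQL

Time: 2015-04-24 03:10:41

Tags: sql postgresql aggregate-functions greatest-n-per-group

I would like to return the set of unique records from a table, based on two columns, along with the most recent posting time for each combination and a total count of the number of times that combination of the two columns appeared before (in time) the record that is output.

So what I am trying to get is something along these lines: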

select T.col1, T.col2, h.max_posted, count  -- "count" is the value I am trying to compute
from T
join (
  select col1, col2, max(posted) as max_posted from T where groupid = 'XXX'
  group by col1, col2) h
on ( T.col1 = h.col1 and
  T.col2 = h.col2 and
  T.posted = h.max_posted)
where T.groupid = 'XXX'

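The count needs to be the number of times each combination of col1 and col2 occurred BEFORE the max_posted of each record in the output. (I hope I explained that correctly :)

Edit: Trying the suggestion below: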

select dx.*,
       count(*) over (partition by dx.cicd9, dx.cdesc order by dx.tposted) as cnt
from dx
join (
  select cicd9, cdesc, max(tposted) as tposted from dx where groupid = 'XXX'
  group by cicd9, cdesc) h
on ( dx.cicd9 = h.cicd9 and
  dx.cdesc = h.cdesc and
  dx.tposted = h.tposted)
where groupid = 'XXX';

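The count always returns '1'. Also, how would you count only the records that occurred BEFORE tposted?

This also failed, but I hope you can see where I am going: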

  WITH H AS (
    SELECT cicd9, cdesc, max(tposted) as tposted  from dx where groupid =  'XXX' 
    group by cicd9, cdesc), 
    J AS (
    SELECT  count(*) as cnt
    FROM dx, h
    WHERE dx.cicd9 = h.cicd9
      and dx.cdesc = h.cdesc
      and dx.tposted <= h.tposted
      and dx.groupid = 'XXX'
 )
SELECT H.*,J.cnt
FROM H,J 

Help, anyone?

4 Answers:

Answer 0: (score: 1)

How about this:

SELECT DISTINCT ON (cicd9, cdesc)
       cicd9, cdesc,
       max(posted) OVER w AS last_post,
       count(*)    OVER w AS num_posts
FROM   dx
WHERE  groupid = 'XXX'
WINDOW w AS (
    PARTITION BY cicd9, cdesc
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);

Given the lack of a PG version, table definition, data and desired output, this is just shooting from the hip, but the principle should work: make a partition on the two columns where groupid = 'XXX', then find the maximum value of the posted column and the total number of rows in the window frame (hence the RANGE clause in the window definition).
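To see what the DISTINCT ON query with the named window returns, here is a minimal sketch on made-up data. The table definition and all rows are assumptions for illustration, since the question gives neither; the column is called posted here only because this answer uses that name.

```sql
BEGIN;

-- Hypothetical sample table; the real dx definition is not given in the question.
CREATE TEMP TABLE dx (groupid text, cicd9 text, cdesc text, posted timestamp);

INSERT INTO dx (groupid, cicd9, cdesc, posted) VALUES
    ('XXX', '250.00', 'Diabetes',     '2015-01-01 10:00'),
    ('XXX', '250.00', 'Diabetes',     '2015-02-01 10:00'),
    ('XXX', '250.00', 'Diabetes',     '2015-03-01 10:00'),
    ('XXX', '401.9',  'Hypertension', '2015-01-15 09:00'),
    ('YYY', '250.00', 'Diabetes',     '2015-04-01 10:00');  -- other group, filtered out

-- Every row of a partition carries the same max and count, so DISTINCT ON
-- can keep any one of them per (cicd9, cdesc).
SELECT DISTINCT ON (cicd9, cdesc)
       cicd9, cdesc,
       max(posted) OVER w AS last_post,
       count(*)    OVER w AS num_posts
FROM   dx
WHERE  groupid = 'XXX'
WINDOW w AS (PARTITION BY cicd9, cdesc
             RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);

-- With the rows above this yields one row per combination
-- (row order is not guaranteed without ORDER BY):
--   250.00 | Diabetes     | 2015-03-01 10:00:00 | 3
--   401.9  | Hypertension | 2015-01-15 09:00:00 | 1

ROLLBACK;
```

The transaction is rolled back at the end, so the temporary table does not linger in the session.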

Answer 1: (score: 0)

Do you just want a cumulative count?

select t.*,
       count(*) over (partition by col1, col2 order by posted) as cnt
from T as t  -- T is the table from the question ("table" itself is a reserved word)
where groupid = 'xxx';
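As a minimal sketch of what the cumulative count does, here is the same idea on a throwaway table. The table definition and rows are made up for illustration; only the column names come from the question. Window functions are evaluated after the FROM, JOIN and WHERE steps, so if the rows are first reduced to the latest row per combination (as in the question's edit), each partition holds a single row and the count comes out as 1, which would explain the result reported there. Counting over all rows first and picking the latest row afterwards avoids that:

```sql
BEGIN;

-- Hypothetical stand-in for the question's table T; columns and rows are assumed.
CREATE TEMP TABLE T (groupid text, col1 text, col2 text, posted timestamp);

INSERT INTO T (groupid, col1, col2, posted) VALUES
    ('xxx', 'a', 'x', '2015-01-01 08:00'),
    ('xxx', 'a', 'x', '2015-01-02 08:00'),
    ('xxx', 'a', 'x', '2015-01-03 08:00'),
    ('xxx', 'b', 'y', '2015-01-01 08:00');

-- Running count per (col1, col2): cnt is 1, 2, 3 for ('a','x') and 1 for ('b','y').
-- Rows with the same posted value are peers and share the same cnt.
SELECT t.*,
       count(*) OVER (PARTITION BY col1, col2 ORDER BY posted) AS cnt
FROM   T AS t
WHERE  groupid = 'xxx';

-- Keep only the latest row per combination. The window count is computed
-- before DISTINCT ON discards rows, so it still sees every earlier row.
SELECT DISTINCT ON (col1, col2)
       col1, col2, posted,
       count(*) OVER (PARTITION BY col1, col2 ORDER BY posted) AS cnt
FROM   T
WHERE  groupid = 'xxx'
ORDER  BY col1, col2, posted DESC;
-- ('a','x') -> cnt 3, ('b','y') -> cnt 1

ROLLBACK;
```

If posted is unique within a combination, cnt - 1 is the number of rows strictly before the latest one.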

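Answer 2: (score: 0)

This is the best I could come up with; better suggestions are welcome!

This produces the results I need, with the understanding that the count will always be at least 1 (from the join):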

SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
FROM dx
JOIN (
    SELECT cicd9, cdesc, max(tposted) AS tposted
    FROM dx
    WHERE groupid = 'XXX'
    GROUP BY cicd9, cdesc) h
ON (dx.cicd9 = h.cicd9 AND dx.cdesc = h.cdesc AND dx.tposted <= h.tposted
    AND dx.groupid = 'XXX')
GROUP BY dx.cicd9, dx.cdesc
ORDER BY dx.cdesc;

WITH H AS (
    SELECT cicd9, cdesc, max(tposted) AS tposted
    FROM dx
    WHERE groupid = 'XXX'
    GROUP BY cicd9, cdesc)
SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
FROM dx, H
WHERE dx.cicd9 = h.cicd9 AND dx.cdesc = h.cdesc AND dx.tposted <= h.tposted
  AND dx.groupid = 'XXX'
GROUP BY dx.cicd9, dx.cdesc
ORDER BY cdesc;

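Answer 3: (score: 0)

This is confusing:

"The count needs to be the number of times each combination of col1 and col2 occurred BEFORE the max_posted of each record in the output."

By definition, every record is "before" (or simultaneous with) the latest post, so this effectively means the total count per combination (ignoring the off-by-one error implied in the sentence).

So this boils down to a simple GROUP BY: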

SELECT cicd9, cdesc
     , max(posted) AS last_posted
     , count(*)    AS ct
FROM   dx
WHERE  groupid = 'XXX'
GROUP  BY 1, 2
ORDER  BY 1, 2;

This produces exactly the same result as the currently accepted answer. It is just faster and simpler.
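One way to check the equivalence claim is to compare the plain GROUP BY with the self-join form from the earlier answer on this page using EXCEPT. This is only a sketch on made-up rows: the table definition and data are assumptions, and the timestamp column is named tposted here because that is the name used in the question's edit and in the self-join answer.

```sql
BEGIN;

-- Hypothetical sample data; the real dx table is not shown in the question.
CREATE TEMP TABLE dx (groupid text, cicd9 text, cdesc text, tposted timestamp);

INSERT INTO dx (groupid, cicd9, cdesc, tposted) VALUES
    ('XXX', '250.00', 'Diabetes',     '2015-01-01 10:00'),
    ('XXX', '250.00', 'Diabetes',     '2015-02-01 10:00'),
    ('XXX', '250.00', 'Diabetes',     '2015-03-01 10:00'),
    ('XXX', '401.9',  'Hypertension', '2015-01-15 09:00'),
    ('YYY', '250.00', 'Diabetes',     '2015-04-01 10:00');  -- other group

-- Rows returned by the GROUP BY form but not by the self-join form.
-- On this sample data the result is empty, i.e. both forms agree.
(SELECT cicd9, cdesc, max(tposted) AS last_posted, count(*) AS ct
 FROM   dx
 WHERE  groupid = 'XXX'
 GROUP  BY 1, 2)
EXCEPT
(SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
 FROM   dx
 JOIN  (SELECT cicd9, cdesc, max(tposted) AS tposted
        FROM   dx
        WHERE  groupid = 'XXX'
        GROUP  BY cicd9, cdesc) h
   ON  dx.cicd9 = h.cicd9
   AND dx.cdesc = h.cdesc
   AND dx.tposted <= h.tposted
   AND dx.groupid = 'XXX'
 GROUP  BY dx.cicd9, dx.cdesc);

ROLLBACK;
```

The self-join keeps every row at or before the per-combination maximum, which is every row of that combination, so the extra join adds work without changing the counts.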