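SQL function to return the "most common value" for multiple columns in a GROUP BY

Time: 2013-04-18 16:57:32

Tags: sql sql-server

I'm looking for the simplest way to return the most common value for each of several columns in a grouped SELECT statement. Everything I've found online points either to RANK for a single item or to processing each column separately, outside the GROUP BY.

Sample data: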

SELECT 100 as "auser", 
'A' as "instance1", 'M' as "instance2" 
union all select 100, 'B', 'M' 
union all select 100,'C', 'N' 
union all select 100, 'B', 'O'
union all select 200,'D', 'P' 
union all select 200, 'E', 'P' 
union all select 200,'F', 'P' 
union all select 200, 'F', 'Q'

Sample data results:

auser   instance1   instance2
100     A           M
100     B           M
100     C           N
100     B           O
200     D           P
200     E           P
200     F           P
200     F           Q
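
To try the queries in the answers against this data, the rows can be loaded into a table. This is a minimal setup sketch: the table name datasample comes from the query further down, while the column types (int and char(1)) are assumptions made for illustration.

```sql
-- Assumed schema: the column types are guesses based on the sample values.
create table datasample (
    auser     int     not null,
    instance1 char(1) not null,
    instance2 char(1) not null
);

-- Load the eight sample rows shown above.
insert into datasample (auser, instance1, instance2)
select 100, 'A', 'M'
union all select 100, 'B', 'M'
union all select 100, 'C', 'N'
union all select 100, 'B', 'O'
union all select 200, 'D', 'P'
union all select 200, 'E', 'P'
union all select 200, 'F', 'P'
union all select 200, 'F', 'Q';
```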

Query logic (as I picture it in my head):

SELECT auser, most_common(instance1), most_common(instance2)
FROM datasample
GROUP BY auser;

Desired result:

100     B           M
200     F           P

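3 answers:

Answer 0 (score: 3)

You can approach this problem with window functions in nested subqueries. The innermost subquery calculates the count for each column's value within each auser. The next subquery ranks those counts (using row_number()). The outer query then uses conditional aggregation to get the desired result: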

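-- Outer query: conditional aggregation keeps the top-ranked value of each column
select auser,
       max(case when seqnum1 = 1 then instance1 end) as instance1,
       max(case when seqnum2 = 1 then instance2 end) as instance2
from (-- Middle query: rank rows by value frequency within each auser;
      -- the second sort key makes ties deterministic
      select t.*,
             row_number() over (partition by auser order by cnt1 desc, instance1) as seqnum1,
             row_number() over (partition by auser order by cnt2 desc, instance2) as seqnum2
      from (-- Inner query: count how often each value occurs per auser
            select t.*,
                   count(*) over (partition by auser, instance1) as cnt1,
                   count(*) over (partition by auser, instance2) as cnt2
            from datasample t
           ) t
     ) t
group by auser;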

Answer 1 (score: 1)

I'm not sure I can come up with anything more elegant, but if you're on SQL Server 2005+ (since I'm using ranking functions and CTEs), this may help:

with instance1 as (
    select auser, instance1
        , row_number() over (partition by auser order by count(*) desc, instance1) as row_num
    from datasample
    group by auser, instance1
), instance2 as (
    select auser, instance2
        , row_number() over (partition by auser order by count(*) desc, instance2) as row_num
    from datasample
    group by auser, instance2
)
select a.auser, a.instance1, b.instance2
from instance1 as a 
    join instance2 as b on a.auser = b.auser
where a.row_num = 1
    and b.row_num = 1
order by a.auser;

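I'm not sure how you'd want to handle NULLs. Also, moving the row_num equality predicates into the join condition doesn't change the execution plan on my box.

If you're on SQL Server 2000, you could replace these CTEs with derived tables and fake row_number() by using count and a "triangular join".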
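
The SQL Server 2000 variant could look roughly like the sketch below. It is one possible reading of the "triangular join" idea, not a tested 2000-era implementation: each derived table is joined to a copy of itself, and counting the rows that sort at or before the current one stands in for row_number(). The aliases c, o, a and b are illustrative, and the sketch assumes instance1 and instance2 contain no NULLs.

```sql
-- Sketch only: derived tables replace the CTEs, and a self-join on the
-- per-value counts emulates
--   row_number() over (partition by auser order by count(*) desc, value).
select a.auser, a.instance1, b.instance2
from (
    select c.auser, c.instance1
    from (select auser, instance1, count(*) as cnt
          from datasample
          group by auser, instance1) as c
        join (select auser, instance1, count(*) as cnt
              from datasample
              group by auser, instance1) as o
            on o.auser = c.auser
           -- o sorts at or before c: higher count, or same count and name not later
           and (o.cnt > c.cnt
                or (o.cnt = c.cnt and o.instance1 <= c.instance1))
    group by c.auser, c.instance1
    having count(*) = 1   -- only the row itself matched, so its emulated row_num is 1
) as a
    join (
    select c.auser, c.instance2
    from (select auser, instance2, count(*) as cnt
          from datasample
          group by auser, instance2) as c
        join (select auser, instance2, count(*) as cnt
              from datasample
              group by auser, instance2) as o
            on o.auser = c.auser
           and (o.cnt > c.cnt
                or (o.cnt = c.cnt and o.instance2 <= c.instance2))
    group by c.auser, c.instance2
    having count(*) = 1
) as b on a.auser = b.auser
order by a.auser;
```

Against the sample data this should return B and M for auser 100, and F and P for auser 200. The self-join compares every value with every other value for the same auser, so the cost grows quadratically with the number of distinct values per group.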

Answer 2 (score: 0)

Keep it simple:

Select auser, instance1, instance2 FROM datasample GROUP BY auser,instance1, instance2 ;