Collapsing multiple rows into a single row

Asked: 2016-02-04 13:00:05

Tags: sql sql-server tsql sql-server-2008-r2

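In my table, each row has some data columns and a Priority column (for example, a timestamp or just an integer). I want to group the data by id and then, within each group, take the latest non-null value of each column. For example, I have the following table: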

id  A       B       C       Priority
1   NULL    3       4       1
1   5       6       NULL    2
1   8       NULL    NULL    3
2   634     346     359     1
2   34      NULL    734     2

The expected result is:

id  A   B   C   
1   8   6   4   
2   34  346 734 
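
To experiment with the queries in this thread, the sample table can be set up roughly as follows. The INT column types are an assumption: the question does not state them, and Priority may well be a timestamp in the real table.

```sql
-- Assumed schema: the question does not give the column types.
CREATE TABLE T (
    id       INT NOT NULL,
    A        INT NULL,
    B        INT NULL,
    C        INT NULL,
    Priority INT NOT NULL
);

-- The five sample rows from the question.
INSERT INTO T (id, A, B, C, Priority) VALUES
    (1, NULL, 3,    4,    1),
    (1, 5,    6,    NULL, 2),
    (1, 8,    NULL, NULL, 3),
    (2, 634,  346,  359,  1),
    (2, 34,   NULL, 734,  2);
```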

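In this example the table is small, with only 5 columns, but the real table will be much larger. I really want this script to run fast. I tried to write it myself, but my script only works on SQL Server 2012+, so I deleted it as not applicable.

Numbers: the table can have 150k rows, 20 columns, and 20-80k unique ids; SELECT COUNT(id) FROM T GROUP BY id averages 2 to 5 rows per id.

I now have working code (thanks to @ypercubeᵀᴹ), but it runs very slowly on large tables. In my case the script can take a minute or even longer (with indexes etc.).

How can I speed it up?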

SELECT 
    d.id,
    d1.A,
    d2.B,
    d3.C
FROM 
    ( SELECT id
      FROM T
      GROUP BY id
    ) AS d
  OUTER APPLY
    ( SELECT TOP (1) A
      FROM T 
      WHERE id = d.id
        AND A IS NOT NULL
      ORDER BY priority DESC
    ) AS d1 
  OUTER APPLY
    ( SELECT TOP (1) B
      FROM T 
      WHERE id = d.id
        AND B IS NOT NULL
      ORDER BY priority DESC
    ) AS d2 
  OUTER APPLY
    ( SELECT TOP (1) C
      FROM T 
      WHERE id = d.id
        AND C IS NOT NULL
      ORDER BY priority DESC
    ) AS d3 ;

On a test database with a realistic data volume, I get the following execution plan (the image is not reproduced here).

4 answers:

Answer 0 (score: 4)

This should do the trick. Anything raised to the power 0 returns 1, except null:

[code block missing from the source]

Result:

[result missing from the source]

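Since the answer's own code did not survive, here is a minimal sketch of one way the power-of-zero trick could be applied. It is not necessarily what the answer's author wrote. It assumes integer Priority and numeric data columns, and it uses a table variable @t holding the question's sample rows.

```sql
-- Sample rows from the question in a table variable (INT types are assumed).
DECLARE @t TABLE (id INT, A INT, B INT, C INT, Priority INT);

INSERT INTO @t (id, A, B, C, Priority) VALUES
    (1, NULL, 3,    4,    1),
    (1, 5,    6,    NULL, 2),
    (1, 8,    NULL, NULL, 3),
    (2, 634,  346,  359,  1),
    (2, 34,   NULL, 734,  2);

-- POWER(x, 0) is 1 for any non-null x and NULL for NULL, so
-- Priority * POWER(A, 0) equals Priority where A has a value and is NULL otherwise.
-- SQL Server sorts NULLs last under DESC, so row number 1 is the
-- highest-priority row whose column is non-null (if the group has one).
WITH ranked AS (
    SELECT id,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY id
                                        ORDER BY Priority * POWER(A, 0) DESC) = 1
                THEN A END AS A,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY id
                                        ORDER BY Priority * POWER(B, 0) DESC) = 1
                THEN B END AS B,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY id
                                        ORDER BY Priority * POWER(C, 0) DESC) = 1
                THEN C END AS C
    FROM @t
)
-- Each column is non-null in at most one row per id, so MAX just picks it up.
SELECT id, MAX(A) AS A, MAX(B) AS B, MAX(C) AS C
FROM ranked
GROUP BY id
ORDER BY id;
```

This form avoids SQL Server 2012+ window functions such as FIRST_VALUE, and on the sample rows it returns (1, 8, 6, 4) and (2, 34, 346, 734). If Priority is a timestamp, the multiplication would have to be replaced, for example with a CASE expression.
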
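Answer 1 (score: 2)

One alternative that may be faster is a multiple-join approach: get the highest priority at which each column is non-null, then join back to the original table. The first part: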

select id,
       max(case when a is not null then priority end) as pa,
       max(case when b is not null then priority end) as pb,
       max(case when c is not null then priority end) as pc
from t
group by id;

Then join this back to the original table:

with pabc as (
      select id,
             max(case when a is not null then priority end) as pa,
             max(case when b is not null then priority end) as pb,
             max(case when c is not null then priority end) as pc
      from t
      group by id
     )
select pabc.id, ta.a, tb.b, tc.c
from pabc left join
     t ta
     on pabc.id = ta.id and pabc.pa = ta.priority left join
     t tb
     on pabc.id = tb.id and pabc.pb = tb.priority left join
     t tc
     on pabc.id = tc.id and pabc.pc = tc.priority ;

This can also take advantage of an index on t(id, priority).
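
As a sketch of the index mentioned above: the index name IX_T_id_priority is made up, and the INCLUDE list is an assumption meant to let the joins read A, B, and C from the index without extra lookups. With 20 data columns, a covering index this wide costs space and write time, so it is worth measuring both variants.

```sql
-- Hypothetical index name; the key columns match the join predicates above.
CREATE NONCLUSTERED INDEX IX_T_id_priority
    ON T (id, priority)
    INCLUDE (A, B, C);  -- assumption: cover the selected data columns
```

Note that the join approach returns exactly one row per id only if (id, priority) is unique in T. If that holds in the real data, declaring the index as UNIQUE gives the optimizer that information as well.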

Answer 2 (score: 0)

The previous code works with the following syntax:

 with pabc as (
          select id,
                 max(case when a is not null then priority end) as pa,
                 max(case when b is not null then priority end) as pb,
                 max(case when c is not null then priority end) as pc
          from t
          group by id
         )
    select pabc.Id,ta.a, tb.b, tc.c
    from pabc 
         left join t ta on pabc.id = ta.id and  pabc.pa = ta.priority 
         left join t tb on pabc.id = tb.id and pabc.pb = tb.priority 
         left join t tc on pabc.id = tc.id and pabc.pc = tc.priority ;

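Answer 3 (score: -1)

This looks odd. You have a log table for all column changes, but no table holding the current data. Now you are looking for a query to collect the current values from the log table, which is naturally a laborious task.

The solution is simple: have an additional table containing the current data. You can even link the tables with a trigger (so that either every insert into the log table updates the current table, or every change written to the current table writes a log entry).

Then just query the current table: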

select id, a, b, c from currenttable order by id;
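
A minimal sketch of the trigger-maintained table described above, taking the first variant (inserts into the log table T update the current table). The trigger name trg_T_sync and the INT column types are assumptions. The trigger recomputes the current values only for the ids touched by the insert, reusing the OUTER APPLY query from the question.

```sql
-- Assumed shape of the current-data table: one row per id.
CREATE TABLE currenttable (
    id INT NOT NULL PRIMARY KEY,
    a  INT NULL,
    b  INT NULL,
    c  INT NULL
);
GO  -- batch separator (SSMS/sqlcmd); CREATE TRIGGER must start its own batch

-- Hypothetical trigger name.
CREATE TRIGGER trg_T_sync ON T
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Latest non-null value per column, only for ids in this insert.
    WITH latest AS (
        SELECT d.id, d1.A, d2.B, d3.C
        FROM (SELECT DISTINCT id FROM inserted) AS d
        OUTER APPLY (SELECT TOP (1) A FROM T
                     WHERE id = d.id AND A IS NOT NULL
                     ORDER BY priority DESC) AS d1
        OUTER APPLY (SELECT TOP (1) B FROM T
                     WHERE id = d.id AND B IS NOT NULL
                     ORDER BY priority DESC) AS d2
        OUTER APPLY (SELECT TOP (1) C FROM T
                     WHERE id = d.id AND C IS NOT NULL
                     ORDER BY priority DESC) AS d3
    )
    -- Upsert the recomputed rows into the current table.
    MERGE currenttable AS cur
    USING latest AS src
        ON cur.id = src.id
    WHEN MATCHED THEN
        UPDATE SET a = src.A, b = src.B, c = src.C
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (id, a, b, c) VALUES (src.id, src.A, src.B, src.C);
END;
```

This moves the cost from read time to write time: each insert pays for a few index seeks per affected id, and the reporting query becomes a plain scan of currenttable. Updates and deletes on T are not handled in this sketch.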