SUM OVER with GROUP BY

Time: 2017-10-30 18:15:55

Tags: sql ms-access group-by aggregate window-functions

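I am working with a large database containing millions of rows, and I am trying to make my queries more efficient. The database holds periodic snapshots of a loan portfolio, in which loans sometimes default (status changes from '1' to <>'1'). When they do, they appear only once in the corresponding snapshot and are not reported again afterwards. I am trying to get a cumulative count of these loans as it develops over time, split into many buckets by country of origin, vintage and so on. SUM(...) OVER looks like a very efficient way to get that result, but when I run the following query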

Select 
assetcountry, edcode, vintage, aa25 as inclusionYrMo, poolcutoffdate, aa74 as status, 
AA16 AS employment, AA36 AS product, AA48 AS newUsed, aa55 as customerType, 
count(1) as Loans, sum(aa26) as OrigBal, sum(aa27) as CurBal, 
SUM(count(1)) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as LoanCountCumul,
SUM(aa27) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as CurBalCumul,
SUM(aa26) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as OrigBalCumul
from myDatabase
where aa22>='2014-01' and aa22<='2014-12' and vintage='2015' and active=0 and aa74<>'1'
group by assetcountry, edcode, vintage, aa25, aa74, aa16, aa36, aa48, aa55, poolcutoffdate
order by poolcutoffdate

I get

SQL Error (8120): Column 'aa27' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Can anyone shed some light on this? Thanks.

2 Answers:

Answer 0: (score: 0)

I believe you want:

Select assetcountry, edcode, vintage, aa25 as inclusionYrMo, poolcutoffdate, aa74 as status, 
       AA16 AS employment, AA36 AS product, AA48 AS newUsed, aa55 as customerType, 
       count(1) as Loans, sum(aa26) as OrigBal, sum(aa27) as CurBal, 
       SUM(count(1)) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as LoanCountCumul,
       SUM(SUM(aa27)) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as CurBalCumul,
       SUM(SUM(aa26)) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as OrigBalCumul
from myDatabase
where aa22 >= '2014-01' and aa22 <= '2014-12' and vintage = '2015' and
      active = 0 and aa74 <> '1'
group by assetcountry, edcode, vintage, aa25, aa74, aa16, aa36, aa48, aa55, poolcutoffdate
order by poolcutoffdate;

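Note the SUM(SUM()) in the cumulative sum expressions. Window functions are evaluated after the GROUP BY, so their argument has to be an aggregate such as SUM(aa27) or a grouped column; the bare aa27 inside SUM(...) OVER is what triggers error 8120.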
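To see the nested aggregate at work on something small, here is a minimal sketch that runs the same pattern through Python's sqlite3 module. SQLite is swapped in for SQL Server purely so the example is self-contained (it needs SQLite 3.25 or later for window functions), and the `loans` table and its five rows are made up for illustration.

```python
import sqlite3

# Hypothetical miniature of the snapshot table: one row per loan per cutoff date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (poolcutoffdate TEXT, aa27 INTEGER)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [
        ("2015-01", 100), ("2015-01", 50),
        ("2015-02", 30),
        ("2015-03", 20), ("2015-03", 10),
    ],
)

# The inner aggregates are computed per group; the outer SUM ... OVER
# then accumulates those per-group results in cutoff-date order.
rows = conn.execute(
    """
    SELECT poolcutoffdate,
           COUNT(1)  AS Loans,
           SUM(aa27) AS CurBal,
           SUM(COUNT(1))  OVER (ORDER BY poolcutoffdate ROWS UNBOUNDED PRECEDING) AS LoanCountCumul,
           SUM(SUM(aa27)) OVER (ORDER BY poolcutoffdate ROWS UNBOUNDED PRECEDING) AS CurBalCumul
    FROM loans
    GROUP BY poolcutoffdate
    ORDER BY poolcutoffdate
    """
).fetchall()

for row in rows:
    print(row)
```

Each output row carries the per-date figures (Loans, CurBal) next to their running totals, which is the shape the original query is after.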

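Answer 1: (score: 0)

This is what I found to work, after comparing my results against some external research data. I have simplified the fields for readability: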

    select
      poolcutoffdate,
      count(1) as LoanCount,
      MAX(sum(case status when 'default' then 1 else 0 end))
        over (order by poolcutoffdate
              ROWS between unbounded preceding AND CURRENT ROW) as CumulDefaults
    from myDatabase
    group by poolcutoffdate
    order by poolcutoffdate asc

So I count all loans that were in 'default' status at least once between the start and the current cutoff date.

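Note the use of MAX(SUM()), so that the result is the largest of the per-date values from the first row up to the current row. Using SUM(SUM()) would instead add those per-date values together, giving an accumulation of accumulations.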
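The difference between the two wrappers is easiest to see side by side. The sketch below is only an illustration on invented data: it again uses Python's sqlite3 module in place of SQL Server (SQLite 3.25 or later), and the `snapshots` table, the 'performing' status value and the row counts are assumptions, not taken from the real database.

```python
import sqlite3

# Hypothetical snapshots: 1, 3 and 2 loans in 'default' on three cutoff dates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshots (poolcutoffdate TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO snapshots VALUES (?, ?)",
    [
        ("2015-01", "default"), ("2015-01", "performing"),
        ("2015-02", "default"), ("2015-02", "default"), ("2015-02", "default"),
        ("2015-03", "default"), ("2015-03", "default"), ("2015-03", "performing"),
    ],
)

# Same inner aggregate in both columns; only the outer window function differs.
rows = conn.execute(
    """
    SELECT poolcutoffdate,
           SUM(CASE status WHEN 'default' THEN 1 ELSE 0 END) AS Defaults,
           MAX(SUM(CASE status WHEN 'default' THEN 1 ELSE 0 END))
               OVER (ORDER BY poolcutoffdate
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningMax,
           SUM(SUM(CASE status WHEN 'default' THEN 1 ELSE 0 END))
               OVER (ORDER BY poolcutoffdate
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningSum
    FROM snapshots
    GROUP BY poolcutoffdate
    ORDER BY poolcutoffdate
    """
).fetchall()

for row in rows:
    print(row)
```

With per-date default counts of 1, 3 and 2, the MAX column never decreases and stays at the highest count seen so far, while the SUM column keeps adding every date's count to the total. Which of the two matches the intended "cumulative defaults" depends on whether each snapshot already reports a running figure or only that date's new defaults.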

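I considered using SUM(SUM()) together with "PARTITION BY poolcutoffdate", so that the count restarts from 0 and does not carry over from the previous cutoff date. But that only covers loans in the most recent cutoff, so a loan that defaulted and was then removed from the pool would be counted incorrectly.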

Note the CASE inside the expression that OVER is applied to.

Thanks for all the help.