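SQL: counting duplicate emails

Asked: 2014-03-29 21:22:22

Tags: sql sql-server sql-server-2008-r2

I'm having trouble writing a query that finds unique counts and duplicates under specific conditions. I'm trying to get all of the counts in one pass from a table like this one: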

|-P_key-|-----email-----|-act_no-|--Client--|
|   1   | joe@code.com  |    1   |   Jets   |
|   2   | bob@code.com  |    2   |   Jets   |
|   3   | sue@code.com  |  NULL  |   Jets   |
|   4   | joe@code.com  |    1   |   Bills  |
|   5   | bob@code.com  |    2   |   Bills  |
|   6   | bob@code.com  |    2   |   Giants |
|   7   | max@code.com  |    2   |   Giants |
|   8   | ben@code.com  |    5   |   Pats   |

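The counts I'm looking for, per client, are:

  1. Total records per client
  2. Total unique emails across clients
  3. Total unique accounts within a client
  4. Total unique accounts across clients
  5. Count of blank account numbers within a client

I know I can use a GROUP BY to get these counts one at a time, like this: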

    SELECT COUNT(email)
    FROM Table
    GROUP BY EMAIL
    HAVING COUNT(email) > 1;
    

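But I'm hoping to write a single query that returns everything at once. I'm using SQL Server 2008.

The output I'm hoping for looks like this (although the final data itself will have to be adjusted to fit):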

     |                                  |  Jets  |  Bills | Giants |  Pats |
     | Total emails                     |   3    |    2   |    2   |   1   |
     | unique emails across projects    |   5    |    5   |    3   |   0   |
     | unique account_no across projects|   6    |    6   |    4   |   0   |
     | unique account_no within project |   0    |    0   |    2   |   0   |
     | blank account_no within project  |   1    |    0   |    0   |   0   |
    
     OR
    
     |        |  tot unique emails |  duped account_no's | etc...
     | Jets   |   3                |    5                |   
     |Bills   |   2                |    5                |   
     | Giants |   2                |    3                |    
     | Pats   |   1                |    0                |   
    

Thanks in advance for any help!

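2 Answers:

Answer 0 (score: 2)

First, you can't get the result in the layout you describe. What you can get is one row per client with five columns.

Second, your criteria are quite unusual. If an email appears on more than one client, then the duplicate count for each client includes the total count of that email across all clients. That's fine, but it means you need both to count how many times each email occurs and to determine whether it appears on more than one client.

The solution is to compute a set of intermediate results with window functions. For example, the min() and max() window functions are used to determine whether an email or account number appears on more than one client.

Without a SQL Fiddle to test against, this is my best attempt: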

select client,
       count(email) as NumEmails,
       -- email seen on more than one client: add up all of its occurrences
       sum(case when email_minclient <> email_maxclient then email_cnt else 0
           end) as NumEmailsDuped,
       -- account number seen on more than one client: add up all of its occurrences
       sum(case when actno_minclient <> actno_maxclient then actno_cnt else 0
           end) as NumActnoDuped,
       -- account number repeated within the same client
       sum(case when clientactno_cnt > 1 then clientactno_cnt else 0
           end) as NumActnoDupedWithin,
       sum(case when act_no is null then 1 else 0 end) as NumActnoNull
from (select t.*,
             count(*) over (partition by email) as email_cnt,
             count(*) over (partition by act_no) as actno_cnt,
             count(*) over (partition by client, act_no) as clientactno_cnt,
             min(client) over (partition by email) as email_minclient,
             max(client) over (partition by email) as email_maxclient,
             min(client) over (partition by act_no) as actno_minclient,
             max(client) over (partition by act_no) as actno_maxclient
      from [table] t  -- placeholder table name; bracketed because TABLE is a reserved word
     ) t
group by client;
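
To see the min()/max() trick on its own, here is a minimal sketch that loads the sample rows from the question into a table variable and shows the intermediate window columns for emails only. The name `@emails` and the column `on_multiple_clients` are made up for illustration, and the column types are assumptions, since the question doesn't give the table definition.

```sql
-- Sample rows copied from the question; @emails is a hypothetical name
-- and the column types are assumed.
declare @emails table (
    p_key  int primary key,
    email  varchar(50),
    act_no int null,
    client varchar(20)
);

insert into @emails (p_key, email, act_no, client)
values (1, 'joe@code.com', 1,    'Jets'),
       (2, 'bob@code.com', 2,    'Jets'),
       (3, 'sue@code.com', null, 'Jets'),
       (4, 'joe@code.com', 1,    'Bills'),
       (5, 'bob@code.com', 2,    'Bills'),
       (6, 'bob@code.com', 2,    'Giants'),
       (7, 'max@code.com', 2,    'Giants'),
       (8, 'ben@code.com', 5,    'Pats');

-- For each row, look at every row sharing the same email.
-- If the smallest and largest client names in that group differ,
-- the email is used by at least two different clients.
select e.p_key,
       e.email,
       e.client,
       count(*)      over (partition by e.email) as email_cnt,
       min(e.client) over (partition by e.email) as email_minclient,
       max(e.client) over (partition by e.email) as email_maxclient,
       case when min(e.client) over (partition by e.email)
              <> max(e.client) over (partition by e.email)
            then 1 else 0 end as on_multiple_clients
from @emails e
order by e.p_key;
```

On this data the joe@code.com and bob@code.com rows are flagged (their groups run from 'Bills' to 'Jets'), while sue, max and ben each sit on a single client and are not. Comparing min() with max() stands in for a distinct count per partition, which SQL Server does not allow inside an OVER clause.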

Answer 1 (score: 0)

This should give you the result you want:

select client,
       count(email) as "Total emails",
       sum(case when email_minclient <> email_maxclient then email_cnt else 0
           end) as "unique emails across projects",
       sum(case when email_minclient <> email_maxclient then actno_cnt else 0
           end) as "unique account_no across projects",
       sum(case when clientactno_cnt > 1 then 1 else 0
           end) as "unique account_no within project",
       sum(case when act_no is null then 1 else 0 end) as "blank account_no within project"
from (select t.*,
             count(*) over (partition by email) as email_cnt,
             count(*) over (partition by act_no) as actno_cnt,
             count(*) over (partition by client, act_no) as clientactno_cnt,
             min(client) over (partition by email) as email_minclient,
             max(client) over (partition by email) as email_maxclient
      from [table] t  -- placeholder table name; bracketed because TABLE is a reserved word
     ) t
group by client

Credit to Gordon Linoff.
