MySQL: Getting the count of incrementing items by multiple conditions

Time: 2015-01-20 13:17:10

Tags: mysql sql database

Here is the dummy data; it is a table of phone call records.

Here is a glimpse of it:

|  call_id  |   customer   |   company   |     call_start      | 
|-----------|--------------|-------------|---------------------|
|1411482360 | 001143792042 | 08444599175 | 2014-07-31 13:55:03 |
|1476992122 | 001143792042 | 08441713191 | 2014-07-31 14:05:10 |

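The customer and company fields hold their phone numbers.

  • The requirement is to calculate the total 'gain' and total 'lost' values according to the following logic:

Edited:

- Customer A calls Company A.
- If Customer A then calls Company B, Company B gets +1 gain and Company A gets +1 lost.
- If Customer A then calls Company C, Company C gets +1 gain and Company B gets +1 lost.
- If Customer A calls Company C again, the gain/lost values are not affected.
- Gain/lost only comes into play after Customer A has made a second call.

  • If a customer calls companies in this order: A, B, B, C, A, A, C, B, D, the process should look like this: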

A ->  
B ->  B +1 gain,  A +1 lost
B ->  
C ->  C +1 gain,  B +1 lost
A ->  A +1 gain,  C +1 lost
A ->  
C ->  C +1 gain,  A +1 lost
B ->  B +1 gain,  C +1 lost
D ->  D +1 gain,  B +1 lost

After the above process, we should have the following totals:

Company    Total gain    Total lost
  A            1             2            
  B            2             2       
  C            2             2         
  D            1             0     
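The rule above can be sketched outside SQL as a small reference implementation, which is handy for checking any query against. This is only an illustration: the function name `tally` is hypothetical, and it assumes a single customer whose calls are already sorted by time.

```python
from collections import defaultdict

def tally(sequence):
    """Return {company: [gain, lost]} for one customer's ordered calls."""
    totals = defaultdict(lambda: [0, 0])
    previous = None
    for company in sequence:
        totals[company]  # register the company even if it never gains or loses
        if previous is not None and company != previous:
            totals[company][0] += 1   # the newly called company gains
            totals[previous][1] += 1  # the previously called company loses
        previous = company
    return dict(totals)

result = tally("ABBCAACBD")
for company in sorted(result):
    print(company, *result[company])
```

For the sequence A, B, B, C, A, A, C, B, D this reproduces the totals in the table above.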

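I started working on this, but it is wrong. It is only an idea, and it does not give me the separately incremented gain and lost values according to the conditions above: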

DROP TABLE IF EXISTS GetTotalGainAndLost;

CREATE TEMPORARY TABLE IF NOT EXISTS GetTotalGainAndLost
    AS 
        (
        SELECT SUM(count) as 'TotalGainAndLost', `date`, DAY(`date`) as 'DAY' 
        FROM (SELECT count(*) as 'count', customer, `date` 
            FROM (SELECT customer, company, count(*) AS 'count', DATE_FORMAT(`call_end`,'%Y-%m-%d') as 'date' 
                FROM calls 
                WHERE `call_end` LIKE CONCAT(2014, '-', RIGHT(CAST(concat('0', 01) AS CHAR),2),'-%')
                GROUP BY customer, company, DAY(`call_end`) ORDER BY `call_end` ASC)
            as tbl1 group by customer, `date` having count(*) > 1) 
        as tbl2 GROUP by `date`
        );

Select * from GetTotalGainAndLost;

DROP TABLE GetTotalGainAndLost;

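This query does not return any results.

  • The desired output looks like this:

There should be one row per company and date (the total calls gained and lost per day, e.g. for January)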

|  company    |  totalGain |  totalLost  |     date     |  DAY  | 
|-------------|------------|-------------|--------------|-------|
| 08444599175 |     17     |       6     | 2014-07-01   |  1    |
| 08444599175 |     12     |      10     | 2014-07-02   |  2    |
| 08444599175 |      3     |       6     | 2014-07-03   |  3    |
| 08444599175 |   ....     |      ...    |     ...      | ...   |
| 08444599175 |      7     |       6     | 2014-07-31   | 31    |

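4 Answers:

Answer 0: (score: 5)

Simplify

Let N be the number of times a company appears. Let's try to simplify the formula with three simple rules.

  1. The first company to appear gets N - 1 gains and N losses.
  2. Companies in the middle get N gains and N losses.
  3. The last company gets N gains and N - 1 losses.

Test

    In your example:

    • It starts with Company A, which appears 3 times.
    • Company B appears 3 times.
    • Company C appears 2 times.
    • It ends with Company D, which appears 1 time.

    Result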

    Company      Gain           Lost  
      A            2             3            
      B            3             3       
      C            2             2         
      D            1             0    
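As a quick check of the three rules, here is a minimal Python sketch (the names `simplified` and `collapse` are hypothetical). It reproduces the table above. Note that these totals count repeated consecutive calls (B, B and A, A) as gains and losses, so they differ from the totals in the question. Under the assumption that repeats should be ignored, collapsing consecutive duplicates before applying the same rules gives the question's totals.

```python
from collections import Counter
from itertools import groupby

def simplified(sequence):
    """Apply the three rules: N gains and N losses per company,
    minus one gain for the first company and one loss for the last."""
    counts = Counter(sequence)
    totals = {company: [n, n] for company, n in counts.items()}
    totals[sequence[0]][0] -= 1   # first company: N - 1 gains
    totals[sequence[-1]][1] -= 1  # last company: N - 1 losses
    return totals

def collapse(sequence):
    """Drop consecutive repeats, e.g. A, B, B, C becomes A, B, C."""
    return [company for company, _ in groupby(sequence)]

as_counted = simplified("ABBCAACBD")
without_repeats = simplified(collapse("ABBCAACBD"))
print(as_counted)
print(without_repeats)
```

The SQL below implements the first variant; the collapsing step is not part of it.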
    

    Convert to SQL

    First, we count the number of calls for each company.

    SELECT
        company, COUNT(*) AS gain, COUNT(*) AS lost, DATE(call_start) AS date
    FROM calls 
    GROUP BY DATE(call_start), company
    

    Then, for each company, we count how many times it was a customer's first call of the day.

    SELECT company, -COUNT(*) AS gain, 0 AS lost, DATE(call_start) AS `date`
    FROM calls INNER JOIN (
        SELECT MIN(call_id) AS call_id FROM calls GROUP BY DATE(call_start), customer
    ) AS t ON (calls.call_id = t.call_id)
    GROUP BY DATE(call_start), calls.company
    

    And, likewise, how many times each company was a customer's last call of the day.

    SELECT company, 0 AS gain, -COUNT(*) AS lost, DATE(call_start) AS `date`
    FROM calls INNER JOIN (
        SELECT MAX(call_id) AS call_id FROM calls GROUP BY DATE(call_start), customer
    ) AS t ON (calls.call_id = t.call_id)
    GROUP BY DATE(call_start), calls.company
    

    Combining the SQL

    Finally, we can combine all of the SQL using UNION ALL and then perform another GROUP BY.

    SELECT company, SUM(gain) AS gain, SUM(lost) AS lost, `date` FROM (
        (
            SELECT
                company, COUNT(*) AS gain, COUNT(*) AS lost, DATE(call_start) AS `date`
            FROM calls 
            GROUP BY DATE(call_start), company
        ) UNION ALL (
            SELECT company, -COUNT(*) AS gain, 0 AS lost, DATE(call_start) AS `date`
            FROM calls INNER JOIN (
                SELECT MIN(call_id) AS call_id FROM calls GROUP BY DATE(call_start), customer
            ) AS t ON (calls.call_id = t.call_id)
            GROUP BY DATE(call_start), calls.company
        ) UNION ALL (
            SELECT company, 0 AS gain, -COUNT(*) AS lost, DATE(call_start) AS `date`
            FROM calls INNER JOIN (
                SELECT MAX(call_id) AS call_id FROM calls GROUP BY DATE(call_start), customer
            ) AS t ON (calls.call_id = t.call_id)
            GROUP BY DATE(call_start), calls.company
        )
    ) AS t
    GROUP BY `date`, company
    

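    Clarification

    The query above assumes that each new day is independent. For example,

    • Customer A calls Company A (day 1)
    • Customer A calls Company B (day 1): B gains 1, A loses 1
    • Customer A calls Company C (day 1): C gains 1, B loses 1
    • Customer A calls Company D (day 2)
    • Customer A calls Company E (day 2): E gains 1, D loses 1

    The result would be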

    COM   G     L   DAY
     ----------------
    A     0     1    1
    B     1     1    1
    C     1     0    1
    D     0     1    2
    E     1     0    2
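The per-day independence can be illustrated with a short Python sketch. It is not a translation of the SQL: the function name `tally_per_day` and the tuple layout are assumptions made for this example, and the calls are assumed to arrive already sorted by time.

```python
from collections import defaultdict

def tally_per_day(calls):
    """calls: iterable of (customer, company, day) in call order.
    Returns {(company, day): [gain, lost]}, resetting every day."""
    totals = defaultdict(lambda: [0, 0])
    previous = {}  # (customer, day) -> company of that customer's previous call that day
    for customer, company, day in calls:
        totals[(company, day)]  # make sure the row exists even with 0 / 0
        last = previous.get((customer, day))
        if last is not None and last != company:
            totals[(company, day)][0] += 1  # called company gains
            totals[(last, day)][1] += 1     # previously called company loses
        previous[(customer, day)] = company
    return dict(totals)

calls = [
    ("customer A", "A", 1),
    ("customer A", "B", 1),
    ("customer A", "C", 1),
    ("customer A", "D", 2),
    ("customer A", "E", 2),
]
per_day = tally_per_day(calls)
for (company, day), (gain, lost) in sorted(per_day.items()):
    print(company, gain, lost, day)
```

Because the previous call is tracked per (customer, day), the call to D on day 2 does not count as a gain for D or a loss for C, which matches the table above.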
    

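Answer 1: (score: 3)

This should work -

CTEGains finds out how many times each company appears per customer per date.

CTEFirst finds out whether the company was the customer's first contact that day.

CTELast finds out whether the company was the customer's last contact that day.

The code should then follow the logic you laid out.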

CREATE TEMPORARY TABLE CTEGains (RNo int, customer varchar(14), company varchar(16), startdate date, gains int)
CREATE TEMPORARY TABLE CTEFirst (customer varchar(14), call_start date, company varchar(16))
CREATE TEMPORARY TABLE CTELast (customer varchar(14), call_start date, company varchar(16))
Insert into CTEGains
Select ROW_NUMBER() over (partition by customer order by Customer) Rno, customer, company, Convert(date,call_start) startdate, count(company) gains 
from calls
group by customer, company, Convert(date,call_start), call_start

Insert into CTEFirst
Select customer, min(Convert(date,call_start)) call_start, min(company) company
from calls
group by customer, Convert(date,call_start)

Insert into CTELast
Select customer, max(Convert(date,call_start)) call_start, max(company) company
from calls
group by customer, Convert(date,call_start)

Select c1.company, 
SUM(gains) - case when exists (Select * from CTEGains c2 where c2.customer = max(c1.customer) and max(c1.Rno) = c2.Rno - 1 and c1.company = c2.company and c1.startdate = c2.startdate) then 1 else 0 end --Didn't gain as same company called
           - case when exists (select * from CTEFirst c2 where c2.company = c1.company and c2.call_start = c1.startdate) then 1 else 0 end TotalGain -- Didn't gain as first company
, SUM(gains) - case when exists (Select * from CTEGains c2 where c2.customer = max(c1.customer) and max(c1.Rno) = c2.Rno - 1 and c1.company = c2.company and c1.startdate = c2.startdate) then 1 else 0 end --Didn't lose as same company as last called
             - case when exists (select * from CTELast c2 where c2.company = c1.company and c2.call_start = c1.startdate) then 1 else 0 end TotalLost -- didn't lose as last company
, startdate [date], DatePart(DAY, startdate) [Day]
from CTEGains c1
group by c1.company, c1.startdate

Drop Table CTEFirst
Drop Table CTEGains
Drop Table CTELast

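Answer 2: (score: 3)

I think the simplest way is to use two queries. First, we can get the total gains by counting every call a customer makes to a different company: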

select g.company company, count(g.call_id) gain
from calls c
join calls g on c.customer = g.customer and c.company <> g.company and c.call_start < g.call_start
left join calls m on g.customer = m.customer and g.company <> m.company and g.call_start > m.call_start and m.call_start > c.call_start
where m.call_id is null
group by g.company;

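The left join is needed so that extra gains are not counted when a customer calls several companies (i.e. if a customer calls companies a, b and c, company c gets only one gain, not two).

Total losses, using the same approach: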

select l.company company, count(l.call_id) lost
from calls c
join calls l on c.customer = l.customer and c.company <> l.company and c.call_start > l.call_start
left join calls m on l.customer = m.customer and l.company <> m.company and c.call_start > m.call_start and l.call_start < m.call_start
where m.call_id is null
group by l.company;

Here is a small demo of the solution: http://sqlfiddle.com/#!2/3236ab/7
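To see how the gain query treats repeated calls, the following sketch runs it against the question's sequence A, B, B, C, A, A, C, B, D. Several things here are assumptions for illustration only: SQLite (through Python's `sqlite3`) stands in for MySQL, the customer name and timestamps are made up, and the second variant, which drops the `g.company <> m.company` condition so that only directly adjacent calls count, is not part of the posted answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (call_id INTEGER, customer TEXT, company TEXT, call_start TEXT)"
)
# Hypothetical data: one customer, one call per minute, in the question's order.
rows = [
    (i, "cust1", company, "2014-07-31 13:%02d:00" % i)
    for i, company in enumerate("ABBCAACBD", start=1)
]
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", rows)

GAIN_SQL = """
select g.company, count(g.call_id)
from calls c
join calls g on c.customer = g.customer and c.company <> g.company
    and c.call_start < g.call_start
left join calls m on g.customer = m.customer {extra}
    and g.call_start > m.call_start and m.call_start > c.call_start
where m.call_id is null
group by g.company
"""

# The query as posted: calls to the same company in between are ignored by the left join.
as_posted = dict(conn.execute(GAIN_SQL.format(extra="and g.company <> m.company")).fetchall())
# Assumed variant: any call in between disqualifies the pair, so only adjacent calls count.
adjacent_only = dict(conn.execute(GAIN_SQL.format(extra="")).fetchall())

print(as_posted)
print(adjacent_only)
```

With this data the posted query credits the second call of each B, B and A, A pair as an extra gain, while the adjacent-only variant matches the totals in the question. Both variants assume that a customer's calls have distinct start times.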

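Answer 3: (score: 2)

Let's start with some definitions:

  • Non-first call: any call that is not the customer's first call.
  • Non-last call: any call that is not the customer's last call.

We have introduced the concepts of first and last, which means we need to define a total order on our set of calls. We can follow any rule we like, but for the purpose of this explanation I'll assume that calls are ordered by start time and, for equal start times, by id. In other words:

  • If callA.startTime < callB.startTime, then callA < callB
  • If callA.startTime = callB.startTime and callA.id < callB.id, then callA < callB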
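As a small illustration of this total order, sorting on the pair (start time, id) breaks ties between calls that start at the same moment. The ids and timestamps below are made up for the example.

```python
from datetime import datetime

# Hypothetical calls; two of them share the same start time.
calls = [
    {"id": 3, "start": datetime(2014, 7, 31, 14, 5, 10)},
    {"id": 2, "start": datetime(2014, 7, 31, 13, 55, 3)},
    {"id": 1, "start": datetime(2014, 7, 31, 13, 55, 3)},
]

# Tuples compare element by element: start time first, then id as the tie-breaker.
ordered = sorted(calls, key=lambda call: (call["start"], call["id"]))
ordered_ids = [call["id"] for call in ordered]
print(ordered_ids)
```

The two calls starting at 13:55:03 are ordered by id, and the later call comes last.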

Notice how we can get all the non-first calls of the set with the following query:

SELECT *
FROM calls AS non_first_calls
    RIGHT JOIN calls
    ON non_first_calls.customer = calls.customer
    AND non_first_calls.call_start >= calls.call_start
    AND non_first_calls.call_id > calls.call_id
WHERE non_first_calls.call_id IS NOT NULL

(The query output contains duplicates, i.e. a call can appear more than once.)

Likewise, we can get all the non-last calls as follows:

SELECT *
FROM calls AS non_last_calls
    RIGHT JOIN calls
    ON non_last_calls.customer = calls.customer
    AND non_last_calls.call_start <= calls.call_start
    AND non_last_calls.call_id < calls.call_id
WHERE non_last_calls.call_id IS NOT NULL

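Business logic

A company gets +1 gain every time a customer calls it after having made any other call. This means that, for any given company, its gain equals the number of non-first calls it has received. Likewise, a company's loss equals the number of non-last calls it has received.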
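For a single customer this definition reduces to two counts, which a few lines of Python can illustrate. The sequence is the one from the question and is assumed to be already in call order.

```python
from collections import Counter

sequence = "ABBCAACBD"           # one customer's calls, already in order
gains = Counter(sequence[1:])    # every call except the first is a non-first call
losses = Counter(sequence[:-1])  # every call except the last is a non-last call

for company in sorted(set(sequence)):
    print(company, gains[company], losses[company])
```

Under this definition a repeated call to the same company (B, B or A, A) still counts as both a gain and a loss, so the totals are higher than those listed in the question for A and B.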

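The mighty query

So, for each company, we just need to count the number of non-first calls and non-last calls it has received.

The "for each company" part means we need the full list of companies. We can get it with this query: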

SELECT DISTINCT company FROM calls

Putting it all together:

SELECT

    -- The company
    companies.company

    -- How many non-first calls (gains) it has received
    ,(SELECT COUNT(DISTINCT non_first_calls.call_id) gains
        FROM calls AS non_first_calls
        RIGHT JOIN calls
            ON non_first_calls.customer = calls.customer
            AND non_first_calls.call_start >= calls.call_start
            AND non_first_calls.call_id > calls.call_id
        WHERE non_first_calls.company = companies.company
    ) gains

    -- How many non-last calls (losses) it has received    
    ,(SELECT COUNT(DISTINCT non_last_calls.call_id) losses
        FROM calls AS non_last_calls
        RIGHT JOIN calls
            ON non_last_calls.customer = calls.customer
            AND non_last_calls.call_start <= calls.call_start
            AND non_last_calls.call_id < calls.call_id
        WHERE non_last_calls.company = companies.company
    ) losses

-- From the set of all companies
FROM (SELECT DISTINCT company FROM calls) companies

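About performance

I'm not sure whether the efficiency of this query will be acceptable when dealing with large amounts of data.

At the very least you will need a composite index on (customer, call_start) (in that order) and another index on (company). This is the output I got after running EXPLAIN on this query, with the mentioned indexes and the sample data you provided.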

[Image: Output of EXPLAIN]