Chained records - counting repeated records

Posted: 2017-05-19 11:47:19

Tags: database oracle plsql

I have records of customer calls, such as:

PHONENO        CALLTIME      REP
======== =================== ===
01555444 10.03.2017 10:30:00  N <- first occurrence of 01555444
02888999 12.03.2017 11:40:20  N
01555444 15.03.2017 18:22:33  Y <- repeated 1st time 01555444
03666777 18.03.2017 20:36:44  N
01555444 19.03.2017 08:15:47  Y <- repeated 2nd time 01555444
01555444 30.03.2017 22:18:30  N <- first occurrence of 01555444 (gap of more than 10 days)

A call is considered a repeat (marked 'Y' in the REP column) if it occurs within 10 days of the previous call from the same phone number.
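The rule can be sketched in Python as follows. This is only an illustration, assuming the calls are available as (phoneno, calltime) pairs already sorted by calltime; `flag_repeats` is a hypothetical helper name, and "within 10 days" is taken as a gap of at most 10 days, matching the `<= 10` used in the answer below.

```python
from datetime import datetime, timedelta


# Hypothetical helper: flags a call 'Y' when the same phone number
# called at most `window` earlier, otherwise 'N'.
def flag_repeats(calls, window=timedelta(days=10)):
    last_seen = {}  # phoneno -> time of that number's previous call
    flags = []
    for phoneno, calltime in calls:  # assumes calls are sorted by calltime
        prev = last_seen.get(phoneno)
        flags.append('Y' if prev is not None and calltime - prev <= window else 'N')
        last_seen[phoneno] = calltime
    return flags


fmt = '%d.%m.%Y %H:%M:%S'
calls = [(p, datetime.strptime(t, fmt)) for p, t in [
    ('01555444', '10.03.2017 10:30:00'),
    ('02888999', '12.03.2017 11:40:20'),
    ('01555444', '15.03.2017 18:22:33'),
    ('03666777', '18.03.2017 20:36:44'),
    ('01555444', '19.03.2017 08:15:47'),
    ('01555444', '30.03.2017 22:18:30'),
]]
print(flag_repeats(calls))  # ['N', 'N', 'Y', 'N', 'Y', 'N']
```

Note that each call is compared with the previous call of the same number, not with the first call of the chain, so a chain can last longer than 10 days in total.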

Now I want a table like this, with the repeat number added:

PHONENO        CALLTIME      REP REPNO
======== =================== === =====
01555444 10.03.2017 10:30:00  N    0
02888999 12.03.2017 11:40:20  N    0
01555444 15.03.2017 18:22:33  Y    1
03666777 18.03.2017 20:36:44  N    0
01555444 19.03.2017 08:15:47  Y    2
01555444 30.03.2017 22:18:30  N    0

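REPNO is the position of the call within a chain of repeated calls, where each call in the chain comes within 10 days of the previous one. Calls that are not repeats get 0.

How can I calculate it?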
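If the REP flags are already known, REPNO behaves like a per-number counter that goes up by one on every 'Y' and resets to 0 on every 'N'. A minimal Python sketch of that definition, assuming the rows arrive in calltime order (`number_repeats` is a hypothetical name):

```python
# Hypothetical helper: REPNO = position of the call within its chain of repeats.
def number_repeats(rows):
    counter = {}  # phoneno -> length of the current chain of repeats
    result = []
    for phoneno, rep in rows:  # rows are assumed to be in calltime order
        # a 'Y' extends the chain, an 'N' starts over at 0
        counter[phoneno] = counter.get(phoneno, 0) + 1 if rep == 'Y' else 0
        result.append(counter[phoneno])
    return result


rows = [('01555444', 'N'), ('02888999', 'N'), ('01555444', 'Y'),
        ('03666777', 'N'), ('01555444', 'Y'), ('01555444', 'N')]
print(number_repeats(rows))  # [0, 0, 1, 0, 2, 0]
```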

1 answer:

Answer 0 (score: 0)

Here is one way to do it, using the tabibitosan method to identify the groups of repeated rows:
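For readers new to tabibitosan, the classic form works on a sequence of integers: inside a run of consecutive values, the value and its row number increase in lockstep, so their difference is constant for the run and changes when the run breaks. A small Python illustration on made-up input, assuming distinct integers (`islands` is a hypothetical name):

```python
from itertools import groupby


# Classic tabibitosan: value minus its row number is constant inside
# a run of consecutive integers, so it can serve as a group key.
def islands(values):
    keyed = [(v - rn, v) for rn, v in enumerate(sorted(values), start=1)]
    return [[v for _, v in grp] for _, grp in groupby(keyed, key=lambda kv: kv[0])]


print(islands([1, 2, 3, 7, 8, 10]))  # [[1, 2, 3], [7, 8], [10]]
```

The answer uses a variant of the same idea, subtracting two row numbers instead of a value and a row number.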

WITH cust_calls AS (SELECT '01555444' phoneno, to_date('10/03/2017 10:30:00', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
                    SELECT '02888999' phoneno, to_date('12/03/2017 11:40:20', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
                    SELECT '01555444' phoneno, to_date('15/03/2017 18:22:33', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
                    SELECT '03666777' phoneno, to_date('18/03/2017 20:36:44', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
                    SELECT '01555444' phoneno, to_date('19/03/2017 08:15:47', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
                    SELECT '01555444' phoneno, to_date('30/03/2017 22:18:30', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
                    SELECT '01555444' phoneno, to_date('30/04/2017 23:42:31', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
                    SELECT '01555444' phoneno, to_date('05/05/2017 16:35:41', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
                    SELECT '01555444' phoneno, to_date('20/05/2017 21:20:52', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual UNION ALL
                    SELECT '02888999' phoneno, to_date('12/03/2017 11:45:20', 'dd/mm/yyyy hh24:mi:ss') calltime FROM dual),
  -- end of mimicking a table with your sample data in it. You do not need the above subquery, since you already have the table.
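   -- Step 1: 'Y' if the call is within 10 days of the same number's previous call (DATE - DATE gives days)
   initial_info AS (SELECT phoneno,
                           calltime,
                           CASE WHEN calltime - LAG(calltime) OVER (PARTITION BY phoneno ORDER BY calltime) <= 10 THEN 'Y' ELSE 'N' END rep_row
                    FROM   cust_calls),
    -- Step 2 (tabibitosan): for 'Y' rows, the difference of the two row numbers stays constant along an unbroken chain of repeats
    middle_info AS (SELECT phoneno,
                           calltime,
                           rep_row rep,
                           CASE WHEN rep_row = 'Y' THEN
                                     row_number() OVER (PARTITION BY phoneno ORDER BY calltime)
                                       - row_number() OVER (PARTITION BY phoneno, rep_row ORDER BY calltime)
                           END rep_grp
                    FROM   initial_info)
-- Step 3: number the calls within each chain; non-repeat rows have a NULL rep_grp and get a NULL repno
SELECT phoneno,
       calltime,
       rep,
       CASE WHEN rep_grp IS NOT NULL THEN
                 row_number() OVER (PARTITION BY phoneno, rep_grp ORDER BY calltime)
       END repno
FROM   middle_info
ORDER BY phoneno, calltime;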

PHONENO  CALLTIME            REP      REPNO
-------- ------------------- --- ----------
01555444 10/03/2017 10:30:00 N   
01555444 15/03/2017 18:22:33 Y            1
01555444 19/03/2017 08:15:47 Y            2
01555444 30/03/2017 22:18:30 N   
01555444 30/04/2017 23:42:31 N   
01555444 05/05/2017 16:35:41 Y            1
01555444 20/05/2017 21:20:52 N   
02888999 12/03/2017 11:40:20 N   
02888999 12/03/2017 11:45:20 Y            1
03666777 18/03/2017 20:36:44 N   
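One way to sanity-check the query without an Oracle instance is to run the same logic in SQLite, which supports the LAG and ROW_NUMBER window functions from version 3.25 onward. The sketch below is a port, not the original query: Oracle's DATE subtraction is replaced by julianday() differences, and the times are stored as ISO-8601 text so that ORDER BY calltime is chronological.

```python
import sqlite3

# Port of the answer's query to SQLite (assumes SQLite >= 3.25 for window functions).
QUERY = """
WITH initial_info AS (
    SELECT phoneno, calltime,
           CASE WHEN julianday(calltime)
                     - julianday(LAG(calltime) OVER (PARTITION BY phoneno ORDER BY calltime)) <= 10
                THEN 'Y' ELSE 'N' END AS rep_row
    FROM cust_calls),
middle_info AS (
    SELECT phoneno, calltime, rep_row AS rep,
           CASE WHEN rep_row = 'Y' THEN
                     row_number() OVER (PARTITION BY phoneno ORDER BY calltime)
                   - row_number() OVER (PARTITION BY phoneno, rep_row ORDER BY calltime)
           END AS rep_grp
    FROM initial_info)
SELECT phoneno, calltime, rep,
       CASE WHEN rep_grp IS NOT NULL THEN
                 row_number() OVER (PARTITION BY phoneno, rep_grp ORDER BY calltime)
       END AS repno
FROM middle_info
ORDER BY phoneno, calltime
"""

# Same sample rows as the answer's cust_calls subquery, as ISO-8601 text.
calls = [
    ('01555444', '2017-03-10 10:30:00'), ('02888999', '2017-03-12 11:40:20'),
    ('01555444', '2017-03-15 18:22:33'), ('03666777', '2017-03-18 20:36:44'),
    ('01555444', '2017-03-19 08:15:47'), ('01555444', '2017-03-30 22:18:30'),
    ('01555444', '2017-04-30 23:42:31'), ('01555444', '2017-05-05 16:35:41'),
    ('01555444', '2017-05-20 21:20:52'), ('02888999', '2017-03-12 11:45:20'),
]

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE cust_calls (phoneno TEXT, calltime TEXT)')
con.executemany('INSERT INTO cust_calls VALUES (?, ?)', calls)
for row in con.execute(QUERY):
    print(row)
```

With these rows it prints the same REP and REPNO values as the listing above, with None (NULL) as REPNO for calls that are not repeats.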

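First, we identify the repeated rows by comparing each row's calltime with the calltime of the previous row for the same phoneno and checking whether the gap is 10 days or less. If you already have this information (your REP column), you can skip this step and go straight to the next one.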

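Next, we use the tabibitosan method: for each row we compare its row number among all rows for the same phoneno with its row number among the rows for that phoneno that have the same rep_row value. For 'Y' rows, the difference between the two stays constant along an unbroken run of repeats and changes whenever an 'N' row interrupts the run, so it identifies each chain (rep_grp).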
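In plain Python, the rep_grp calculation for a single phone number looks roughly like this (`rep_groups` is a hypothetical name, and the rows are assumed to be in calltime order with rep_row already computed). Each 'Y' row gets its position among all of that number's calls minus its position among that number's 'Y' rows:

```python
# Sketch of the rep_grp step for one phone number.
def rep_groups(flags):
    seen = {'Y': 0, 'N': 0}  # running row_number() per rep_row value
    groups = []
    for rn_all, flag in enumerate(flags, start=1):  # row_number() over the phone's rows
        seen[flag] += 1                             # row_number() over (phone, rep_row)
        groups.append(rn_all - seen[flag] if flag == 'Y' else None)
    return groups


# rep_row values of 01555444 in calltime order, from the answer's sample data.
print(rep_groups(['N', 'Y', 'Y', 'N', 'N', 'Y', 'N']))  # [None, 1, 1, None, None, 3, None]
```

The two chains of repeats get the keys 1 and 3; the actual key values do not matter, only that they differ between chains.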

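Then we can use the numeric output of the previous step (rep_grp) to further partition each phoneno's rows and apply the row_number() analytic function within each partition. Rows that are not repeats have a NULL rep_grp and therefore get a NULL REPNO; adding ELSE 0 to the final CASE expression gives 0 instead, as in the desired output.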
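The final step amounts to counting rows per (phoneno, rep_grp) pair while skipping rows without a group. A minimal Python sketch under the same assumptions as before (calltime order, hypothetical name `repeat_numbers`, None standing in for NULL):

```python
from collections import defaultdict


# Sketch of row_number() OVER (PARTITION BY phoneno, rep_grp ORDER BY calltime),
# kept only where rep_grp is not NULL (None here).
def repeat_numbers(rows):
    seen = defaultdict(int)  # (phoneno, rep_grp) -> rows numbered so far
    result = []
    for phoneno, rep_grp in rows:  # assumed to be in calltime order
        if rep_grp is None:
            result.append(None)
        else:
            seen[(phoneno, rep_grp)] += 1
            result.append(seen[(phoneno, rep_grp)])
    return result


rows = [('01555444', None), ('01555444', 1), ('01555444', 1), ('01555444', None),
        ('01555444', None), ('01555444', 3), ('01555444', None)]
print(repeat_numbers(rows))  # [None, 1, 2, None, None, 1, None]
```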