Many-to-many join (same IDs with different dates)

Time: 2015-11-06 20:18:53

Tags: sql r many-to-many match seq

I am doing an analysis with SQL and R, and I want to join two tables like these:

Table 1:

ID  date
a11 20150302
a11 20150302
a22 20150303
a22 20150304
a33 20150306
a44 20150306
a55 20150307
a66 20150308
a66 20150309
a66 20150310

Table 2:

ID  date
a11 20150303
a22 20150304
a22 20150305
a44 20150306
a66 20150308
a66 20150310

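The scenario: customers are called (Table 1), and some of them call back for more information (Table 2).

So what I want to do in the analysis is:

  1. Show only the IDs that appear in both tables.
  2. Match each Table 2 date to a Table 1 date:
    • match the closest date
    • the Table 2 date must be >= the Table 1 date (as in the result below, where for "a66" 20150310 is assigned to the Table 1 date 20150310 and 20150308 to 20150308, not to 20150309)
  3. Result: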

    ID  table1 date table2 date
    a11 20150302    
    a11 20150302    20150303
    a22 20150303    20150304
    a22 20150304    20150305
    a44 20150306    20150306
    a66 20150308    20150308
    a66 20150309    
    a66 20150310    20150310
    

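Is there any solution for this many-to-many match/join? I do not want n * m rows in the result; I want a 1-to-1 match. A solution in either R or SQL would work.

Thanks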
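
One way to pin down the 1-to-1 requirement above is an order-preserving matching per ID: pair as many calls with callbacks as possible, never pair a callback with a later call, and among equally large pairings prefer the smallest total gap in days. The sketch below is only one possible reading of the question, not a method taken from any of the answers. The names `match_one_id` and `match_calls` are made up for illustration, and the sample data is copied from the two tables in the question.

```python
from collections import defaultdict
from datetime import date
from functools import lru_cache

TABLE1 = [("a11", 20150302), ("a11", 20150302), ("a22", 20150303),
          ("a22", 20150304), ("a33", 20150306), ("a44", 20150306),
          ("a55", 20150307), ("a66", 20150308), ("a66", 20150309),
          ("a66", 20150310)]
TABLE2 = [("a11", 20150303), ("a22", 20150304), ("a22", 20150305),
          ("a44", 20150306), ("a66", 20150308), ("a66", 20150310)]


def to_date(yyyymmdd):
    """Turn an integer such as 20150302 into a datetime.date."""
    text = str(yyyymmdd)
    return date(int(text[:4]), int(text[4:6]), int(text[6:]))


def match_one_id(calls, callbacks):
    """Pair the calls and callbacks of one ID, one-to-one and in date order."""
    calls, callbacks = sorted(calls), sorted(callbacks)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best plan for calls[i:] and callbacks[j:], expressed as
        # (number of pairs, minus total gap in days, tuple of index pairs).
        if i == len(calls) or j == len(callbacks):
            return (0, 0, ())
        options = []
        gap = (to_date(callbacks[j]) - to_date(calls[i])).days
        if gap >= 0:  # a callback must not precede its call
            n, g, pairs = best(i + 1, j + 1)
            options.append((n + 1, g - gap, ((i, j),) + pairs))
        options.append(best(i + 1, j))  # leave this call unmatched
        options.append(best(i, j + 1))  # leave this callback unused
        # More pairs wins; ties go to the smaller total gap.
        return max(options, key=lambda plan: plan[:2])

    matched = dict(best(0, 0)[2])
    return [(call, callbacks[matched[i]] if i in matched else None)
            for i, call in enumerate(calls)]


def match_calls(table1, table2):
    """Return (ID, call date, callback date or None) for IDs in both tables."""
    calls, callbacks = defaultdict(list), defaultdict(list)
    for key, day in table1:
        calls[key].append(day)
    for key, day in table2:
        callbacks[key].append(day)
    rows = []
    for key in sorted(calls.keys() & callbacks.keys()):
        for call, callback in match_one_id(calls[key], callbacks[key]):
            rows.append((key, call, callback))
    return rows


if __name__ == "__main__":
    for row in match_calls(TABLE1, TABLE2):
        print(row)
```

On the sample data this yields the same pairs as the result table in the question, with `None` standing in for the blank cells. The two a11 rows are identical, so which of them receives the callback is arbitrary.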

3 answers:

Answer 0 (score: 1)

Join the two tables on ID, which drops the Table 1 rows whose ID is not in Table 2. Then use ROW_NUMBER() OVER (PARTITION BY ID, Date1 ORDER BY Date2 ASC) together with a WHERE RowNumber = 1 clause to find the closest matching date. (The output below also implies that the join keeps only pairs with Date2 >= Date1.)

This produces output consistent with the conditions you listed:

+-----+----------+----------+
| ID  | Date1    | Date2    |
+-----+----------+----------+
| a11 | 20150302 | 20150303 |
| a22 | 20150303 | 20150304 |
| a22 | 20150304 | 20150304 |
| a44 | 20150306 | 20150306 |
| a66 | 20150308 | 20150308 |
| a66 | 20150309 | 20150310 |
| a66 | 20150310 | 20150310 |
+-----+----------+----------+
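
The query text for this answer did not survive in the page, so here is a minimal sketch of what the ROW_NUMBER approach could look like. It is a reconstruction, not the answerer's original code. It runs the SQL through Python's standard sqlite3 module (window functions need SQLite 3.25 or newer); the table names `table1` and `table2`, the column name `date`, and the helper `closest_callbacks` are assumptions made for illustration.

```python
import sqlite3

TABLE1 = [("a11", 20150302), ("a11", 20150302), ("a22", 20150303),
          ("a22", 20150304), ("a33", 20150306), ("a44", 20150306),
          ("a55", 20150307), ("a66", 20150308), ("a66", 20150309),
          ("a66", 20150310)]
TABLE2 = [("a11", 20150303), ("a22", 20150304), ("a22", 20150305),
          ("a44", 20150306), ("a66", 20150308), ("a66", 20150310)]

QUERY = """
SELECT ID, Date1, Date2
FROM (
    SELECT t1.ID   AS ID,
           t1.date AS Date1,
           t2.date AS Date2,
           -- rank the candidate callbacks of each call, earliest first
           ROW_NUMBER() OVER (
               PARTITION BY t1.ID, t1.date
               ORDER BY t2.date ASC
           ) AS RowNumber
    FROM table1 AS t1
    JOIN table2 AS t2
      ON t2.ID = t1.ID          -- inner join: IDs present in both tables
     AND t2.date >= t1.date     -- a callback cannot precede the call
) AS ranked
WHERE RowNumber = 1             -- keep the closest callback per call
ORDER BY ID, Date1
"""


def closest_callbacks():
    """Load the sample data into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (ID TEXT, date INTEGER)")
    conn.execute("CREATE TABLE table2 (ID TEXT, date INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", TABLE1)
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", TABLE2)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in closest_callbacks():
        print(row)
```

Because the partition is (ID, Date1), the duplicated a11 call collapses into a single row, and one callback can serve several calls (20150310 is used for both a66 20150309 and a66 20150310). That is why this gives seven rows rather than the strict 1-to-1 result in the question.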


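Answer 1 (score: 1)

Using dplyr in R, I get the same result as markmanguy. For a22, the closest callback to the initial call on 20150304 is 20150304, not 20150305. You would need a time component to tell these apart.

library(dplyr)

inner_join(table1, table2, by = "ID") %>%  # keep only IDs present in both tables
  filter(date1 <= date2) %>%               # a callback cannot precede the call
  arrange(ID, date1, date2) %>%            # earliest callback first for each call
  group_by(ID, date1) %>%
  filter(row_number() == 1)                # keep the closest callback per call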

>
Source: local data frame [7 x 3]
Groups: ID, date1 [7]

     ID    date1    date2
  (chr)    (int)    (int)
1   a11 20150302 20150303
2   a22 20150303 20150304
3   a22 20150304 20150304
4   a44 20150306 20150306
5   a66 20150308 20150308
6   a66 20150309 20150310
7   a66 20150310 20150310

Data

table1 <- read.table(text = "ID  date1
a11 20150302
a11 20150302
a22 20150303
a22 20150304
a33 20150306
a44 20150306
a55 20150307
a66 20150308
a66 20150309
a66 20150310", header = TRUE, stringsAsFactors = FALSE)
table2 <- read.table(text = "ID  date2
a11 20150303
a22 20150304
a22 20150305
a44 20150306
a66 20150308
a66 20150310", header = TRUE, stringsAsFactors = FALSE)

Answer 2 (score: 1)

This does not solve it, but it is close and may give you an idea.

SqlFiddleDemo

WITH t_left AS (
    -- Table1 rows whose ID also appears in Table2, ranked newest first per ID
    SELECT *, ROW_NUMBER() OVER (PARTITION BY "ID" ORDER BY "date" DESC) AS rn
    FROM Table1 T
    WHERE EXISTS (SELECT 1 FROM Table2 P WHERE T."ID" = P."ID")
),
t_right AS (
    -- Table2 rows ranked the same way
    SELECT *, ROW_NUMBER() OVER (PARTITION BY "ID" ORDER BY "date" DESC) AS rn
    FROM Table2
)
-- pair rows of the same ID by rank; calls without a partner get NULL
SELECT t_left."ID", t_left."date", t_right."date"
FROM t_left
LEFT JOIN t_right
       ON t_left.rn = t_right.rn
      AND t_left."ID" = t_right."ID"
ORDER BY t_left."ID", t_left."date"

Output

|  ID |     date |     date |
|-----|----------|----------|
| a11 | 20150302 | 20150303 |
| a11 | 20150302 |   (null) |
| a22 | 20150303 | 20150304 |
| a22 | 20150304 | 20150305 |
| a44 | 20150306 | 20150306 |
| a66 | 20150308 |   (null) |
| a66 | 20150309 | 20150308 |
| a66 | 20150310 | 20150310 |