Merging data frames of different lengths

Time: 2018-01-03 10:38:23

Tags: r dataframe replace merge matching

I have two datasets, one per population: sellers and buyers. Both are structured the same way.

FOR BUYERS (TYPE 2)
period subject genderb gp matchgp treatment type p1 p2 suminte partner
1      1         0      2    48     404      2   7  8    NA     4
1      3         1      2    48     404      2   7  8    NA     4
...

FOR SELLERS (TYPE 1)
period subject genders gp matchgp treatment type p1 p2 suminte partner
1       4        1      2    48     404       1   7  8    2     NA
...

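However, the sellers data has fewer observations, because one seller can be matched with several buyers within a period (here, the seller interacts with 2 buyers). In the buyers data, partner holds the subject id of the seller (the subject column of the sellers data), while in the sellers data suminte gives the number of buyers the seller interacted with.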
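As a quick sanity check on this relationship, one could count the buyers that point at each seller and compare the count with suminte. This is only a minimal sketch in base R on a cut-down copy of the rows shown above; the names counts, check, mismatch and n_buyers are made up for illustration.

```r
# Cut-down copies of the example rows (only the columns needed here)
buyers  <- data.frame(period = c(1, 1), subject = c(1, 3), partner = c(4, 4))
sellers <- data.frame(period = 1, subject = 4, suminte = 2)

# Count buyer rows per (period, partner); partner is the seller's subject id
counts <- aggregate(subject ~ period + partner, data = buyers, FUN = length)
names(counts) <- c("period", "subject", "n_buyers")  # rename so it lines up with sellers

# Attach the counts to the sellers and keep rows where they disagree with suminte
check <- merge(sellers, counts, by = c("period", "subject"), all.x = TRUE)
mismatch <- check[is.na(check$n_buyers) | check$n_buyers != check$suminte, ]

print(mismatch)  # sellers with no buyer rows at all would also be listed here
```

An empty mismatch table suggests that partner and suminte are consistent, so joining on partner should attach each buyer to exactly one seller.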

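What I want to do: in the buyers dataset, add a genders column (the seller's gender) to every row, matched to the right buyer, in the right period, in the right group and matching group, and with the right prices...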

The result I am hoping for looks like this:

period

Let me know if I am not clear enough...

1 Answer:

Answer 0: (score: 0)

# example data: df1 holds the buyers, df2 holds the sellers
df1 = read.table(text = "
                 period subject genderb gp matchgp treatment type p1 p2 suminte partner
                 1      1         0      2    48     404      2   7  8    NA     4
                 1      3         1      2    48     404      2   7  8    NA     4
                 ", header = TRUE, stringsAsFactors = FALSE)

df2 = read.table(text = "
                 period subject genders gp matchgp treatment type p1 p2 suminte partner
                 1       4        1      2    48     404       1   7  8    2     NA
                 ", header = TRUE, stringsAsFactors = FALSE)

library(dplyr)

# drop the df2 columns that also exist in df1 but are not join keys,
# so they do not come back duplicated with .x / .y suffixes
df2 = df2 %>% select(-treatment, -type, -suminte, -partner)

# join on the shared keys; the buyer's partner is matched to the seller's subject
left_join(df1, df2, by=c("period","gp","matchgp","p1","p2", "partner"="subject"))

#   period subject genderb gp matchgp treatment type p1 p2 suminte partner genders
# 1      1       1       0  2      48       404    2  7  8      NA       4       1
# 2      1       3       1  2      48       404    2  7  8      NA       4       1
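
If dplyr is not available, roughly the same join can be written with base R's merge(). The sketch below is an alternative to the left_join() call above, not a replacement for it; the names buyers, sellers, key_cols, sellers_small and result are chosen here for illustration, and it assumes each seller appears at most once per key combination.

```r
# Same example rows as above, read with base R only
buyers <- read.table(text = "
period subject genderb gp matchgp treatment type p1 p2 suminte partner
1 1 0 2 48 404 2 7 8 NA 4
1 3 1 2 48 404 2 7 8 NA 4
", header = TRUE, stringsAsFactors = FALSE)

sellers <- read.table(text = "
period subject genders gp matchgp treatment type p1 p2 suminte partner
1 4 1 2 48 404 1 7 8 2 NA
", header = TRUE, stringsAsFactors = FALSE)

# Keep only the join keys plus the column to bring over
key_cols <- c("period", "gp", "matchgp", "p1", "p2")
sellers_small <- sellers[, c(key_cols, "subject", "genders")]

# Assumption: one seller row per key combination; duplicates would
# multiply buyer rows in the join, so stop early if any are found
stopifnot(!anyDuplicated(sellers_small[, c(key_cols, "subject")]))

# all.x = TRUE keeps every buyer row, like left_join()
result <- merge(
  buyers, sellers_small,
  by.x = c(key_cols, "partner"),
  by.y = c(key_cols, "subject"),
  all.x = TRUE
)

print(result)
```

Unlike left_join(), merge() moves the key columns to the front and sorts the rows by key, so the column order differs from the output shown above even though the matched values are the same.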