I have two datasets, one per population: sellers and buyers. Both are built the same way.
FOR BUYERS (TYPE 2)
period subject genderb gp matchgp treatment type p1 p2 suminte partner
1 1 0 2 48 404 2 7 8 NA 4
1 3 1 2 48 404 2 7 8 NA 4
...
FOR SELLERS (TYPE 1)
period subject genders gp matchgp treatment type p1 p2 suminte partner
1 4 1 2 48 404 1 7 8 2 NA
...
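However, the sellers data has fewer observations, because one seller can be matched with several buyers within a period (here, the seller interacts with 2 buyers). In the buyers data, partner holds the subject id of the seller (the subject column of the sellers data), while in the sellers data suminte is the number of buyers the seller interacted with.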
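What I want to do: in the buyers dataset, add a genders column (the seller's gender) to every row, matched to the correct buyer: in the right period, in the right group, in the right matching group, with the right prices...
The result I am hoping for is the buyers dataset with the additional genders column.
Let me know if I am not clear enough...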
Answer 0 (score: 0)
# example data
df1 = read.table(text = "
period subject genderb gp matchgp treatment type p1 p2 suminte partner
1 1 0 2 48 404 2 7 8 NA 4
1 3 1 2 48 404 2 7 8 NA 4
", header=TRUE, stringsAsFactors=FALSE)
df2 = read.table(text = "
period subject genders gp matchgp treatment type p1 p2 suminte partner
1 4 1 2 48 404 1 7 8 2 NA
", header=TRUE, stringsAsFactors=FALSE)
library(dplyr)
# drop the df2 columns that also exist in df1 but are not used as join keys
# (otherwise they would come back duplicated as treatment.x / treatment.y, etc.)
df2 = df2 %>% select(-treatment, -type, -suminte, -partner)
# join datasets using appropriate columns
left_join(df1, df2, by=c("period","gp","matchgp","p1","p2", "partner"="subject"))
# period subject genderb gp matchgp treatment type p1 p2 suminte partner genders
# 1 1 1 0 2 48 404 2 7 8 NA 4 1
# 2 1 3 1 2 48 404 2 7 8 NA 4 1
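
Once the join works on the toy data, it may be worth checking that it behaves the same way on the full datasets. The following is only a sketch under a few assumptions: the object names (buyers, sellers, seller_keys, joined, unmatched, buyer_counts, check) are illustrative and not part of the question, the join keys are the ones used above, and a reasonably recent dplyr is installed (for the name argument of count()). It checks three things: the join does not duplicate buyer rows, every buyer finds a seller, and suminte agrees with the number of buyers actually matched to each seller.

```r
library(dplyr)

# Toy data copied from the example above (hypothetical object names)
buyers <- read.table(text = "
period subject genderb gp matchgp treatment type p1 p2 suminte partner
1 1 0 2 48 404 2 7 8 NA 4
1 3 1 2 48 404 2 7 8 NA 4
", header = TRUE, stringsAsFactors = FALSE)

sellers <- read.table(text = "
period subject genders gp matchgp treatment type p1 p2 suminte partner
1 4 1 2 48 404 1 7 8 2 NA
", header = TRUE, stringsAsFactors = FALSE)

# Join keys: buyers$partner is matched against sellers$subject
keys <- c("period", "gp", "matchgp", "p1", "p2", "partner" = "subject")

# Keep only the join keys and the column to bring over
seller_keys <- sellers %>%
  select(period, gp, matchgp, p1, p2, subject, genders)

joined <- left_join(buyers, seller_keys, by = keys)

# Check 1: a left join should not add rows; if it does, some seller
# appears more than once for the same combination of keys
stopifnot(nrow(joined) == nrow(buyers))

# Check 2: buyers for which no seller was found (ideally zero rows)
unmatched <- anti_join(buyers, seller_keys, by = keys)

# Check 3: compare suminte with the number of buyers matched to each seller
buyer_counts <- joined %>%
  count(period, partner, name = "n_buyers")

check <- sellers %>%
  left_join(buyer_counts, by = c("period", "subject" = "partner")) %>%
  select(period, subject, suminte, n_buyers)

check
```

If unmatched has rows, or suminte and n_buyers disagree in check, one of the join keys probably differs between the two datasets for those observations, and those rows are the place to start looking.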