Question

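I have to analyze data from an economic experiment. My dataset consists of 14,976 observations and 212 variables. It contains information such as profit, total profit, treatment, and other variables. As you can see, I have two types: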

Type 1 is for sellers
Type 2 is for buyers

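For some variables, the results are stored on the buyer (Type 2) rows rather than the seller rows (this was a completely arbitrary choice). However, I want to analyze, for example, the gender of the sellers who overcharged. So I need to manipulate my dataset, and I don't know how to do that.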

Here is part of the dataset:

ID       Gender   Period   Matching group   Group    Type  Overcharging ...
654        1           1            73         1        1      NA
654        1           2            73         1        1      NA
654        1           3            73         1        1      NA
654        1           4            73         1        1      NA 
435        1           1            73         2        1      NA
435        1           2            73         2        1      NA
435        1           3            73         2        1      NA
435        1           4            73         2        1      NA 
708        0           1            73         1        2       1
708        0           2            73         1        2       0
708        0           3            73         1        2       0
708        0           4            73         1        2       1   
546        1           1            73         2        2       0
546        1           2            73         2        2       0
546        1           3            73         2        2       1
546        1           4            73         2        2       0

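To do what I want, I have a lot of information (in period x, group x, matching group x, treatment x, ... exactly one seller is matched with one buyer). For example, in matching group 73, we know that in period 1, subject 708 (a member of group 1) was overcharged. Since I know this subject belongs to group 1 and matching group 73, I can identify the seller who overcharged him in period 1: subject 654, with Gender = 1.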

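So I want to put the buyers' Overcharging values (and some others) on the seller rows (Type == 1) to analyze the sellers' behavior, but for the right period, the right group, and the right matching group.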

Answer 1

Good morning :)

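I do a lot of work with data.frames. If you expect to use R over the long term, I suggest looking at (i) the dplyr package, part of the tidyverse suite, or (ii) the data.table package. The first has the most popular syntax and combines well with a bunch of useful packages. The second is harder to learn but faster. For data of your size, the speed difference is negligible.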

Using base data frames, I hope this matches your request. Let me know if I have misunderstood anything or if it is unclear.

# sellers data eg
dt1 <- data.frame(Period = 1:4, MatchGroup = 73, Group = 1, Type = 1, 
                 Overcharging = NA)
# buyers data eg
dt2 <- data.frame(Period = 1:4, MatchGroup = 73, Group = 1, Type = 2, 
                 Overcharging = c(1,0,0,1))
# make my current data view
dt <- rbind(dt1, dt2)
dt[]

# split in to two data frames, on the Type column:
dt_split <- split(dt, dt$Type)
dt_split

# move out of list: split() names each element after its Type value
dt1 <- dt_split[["1"]]  # sellers (Type == 1)
dt2 <- dt_split[["2"]]  # buyers (Type == 2)
dt1[]
dt2[]

# define the columns in which to match up the buyer to seller
merge_cols <- c("Period", "MatchGroup", "Group")
# define the columns you want to merge, that you know are NA
na_cols <- c("Overcharging")
# now use merge operation, and filter dt2, to pull in only columns you want
# I suggest dropping the na_cols first in dt1, as otherwise it will create two 
# columns post-merge: Overcharging.x, Overcharging.y
dt1 <- dt1[,setdiff(names(dt1), na_cols)]
dt1_new <- merge(dt1, 
                 dt2[, c(merge_cols, na_cols)], # filter dt2 
                 by = merge_cols, # columns to match on
                 all.x = TRUE) # dt1 is x, dt2 is y. Want to keep all of dt1

# if you want to bind them back together, ensure the column order matches, and
# bind e.g.
dt1_new <- dt1_new[, names(dt2)]
dt_final <- rbind(dt1_new, dt2)
dt_final[]

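My idea is to split the data frame into two separate frames, one for buyers and one for sellers. Then determine how they join, and move the data you need from the buyers to the sellers. Finally, bind them back together if needed.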
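
As a rough sketch of the same idea applied to the sample rows from the question, the following keeps each seller's own ID and Gender and pulls in the Overcharging value from the matched buyer. The names `dat`, `key_cols`, `sellers`, `buyers` and `sellers_filled` are made up for illustration, and "Matching group" is shortened to `MatchGroup`. It assumes that Period, MatchGroup and Group together identify exactly one buyer; in the full data set further key columns (for example the treatment) may be needed.

```r
# Sample rows from the question (column "Matching group" renamed MatchGroup)
dat <- data.frame(
  ID           = rep(c(654, 435, 708, 546), each = 4),
  Gender       = rep(c(1, 1, 0, 1), each = 4),
  Period       = rep(1:4, times = 4),
  MatchGroup   = 73,
  Group        = rep(c(1, 2, 1, 2), each = 4),
  Type         = rep(c(1, 1, 2, 2), each = 4),
  Overcharging = c(rep(NA, 8), 1, 0, 0, 1, 0, 0, 1, 0)
)

# Columns assumed to pair one seller with one buyer
key_cols <- c("Period", "MatchGroup", "Group")

# Sellers: drop the all-NA column; buyers: keep only keys and the value to move
sellers <- dat[dat$Type == 1, setdiff(names(dat), "Overcharging")]
buyers  <- dat[dat$Type == 2, c(key_cols, "Overcharging")]

# Guard: a duplicated key among buyers would duplicate seller rows in the merge
stopifnot(!anyDuplicated(buyers[key_cols]))

# Left join: every seller row gains the Overcharging value of its matched buyer
sellers_filled <- merge(sellers, buyers, by = key_cols, all.x = TRUE)

# Cross-tabulate overcharging against the seller's gender
overcharge_by_gender <- table(Gender       = sellers_filled$Gender,
                              Overcharging = sellers_filled$Overcharging)
overcharge_by_gender
```

Because the Gender column in `sellers_filled` comes from the seller rows, the table counts overcharging by the seller's gender rather than the buyer's.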
