Fill missing values by looking up a match in another column

Posted: 2018-01-15 15:56:58

Tags: r


# generate data
df <- data.frame(
  QuestionId = c(rep(NA, 16)),
  AltQuestionId = c(1, 2, 4, 5, 6, NA, 8, 10, NA, NA, 14, NA, 16, NA, 18, 20),
  AltTakerId = c(7, 13, 10, 15, 17, NA, 8, 11, NA, NA, 25, NA, 29, NA, 35, 29)
)
df$QuestionId[c(6, 9, 10, 12, 14)] <- c(1, 6, 2, 6, 4)
df$TakerId <- NA # a column of NAs

I can't figure out how to fill in the TakerId column so that it looks like the desired output below.

The variables QuestionId and AltQuestionId refer to the same thing, and so do TakerId and AltTakerId.

The goal is to associate each QuestionId with its TakerId.

Desired output:

> df
   QuestionId AltQuestionId AltTakerId TakerId
1          NA             1          7      NA
2          NA             2         13      NA
3          NA             4         10      NA
4          NA             5         15      NA
5          NA             6         17      NA
6           1            NA         NA       7
7          NA             8          8      NA
8          NA            10         11      NA
9           6            NA         NA      17
10          2            NA         NA      13
11         NA            14         25      NA
12          6            NA         NA      17
13         NA            16         29      NA
14          4            NA         NA      10
15         NA            18         35      NA
16         NA            20         29      NA
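The relationship described above means the non-NA rows of AltQuestionId and AltTakerId form a lookup table, and TakerId is just that table read at each QuestionId. A minimal sketch of that idea, assuming a named vector as the lookup structure (this is not the technique used in either answer below); the names `keep` and `lookup` are illustrative:

```r
# Sample data from the question
df <- data.frame(
  QuestionId = rep(NA, 16),
  AltQuestionId = c(1, 2, 4, 5, 6, NA, 8, 10, NA, NA, 14, NA, 16, NA, 18, 20),
  AltTakerId = c(7, 13, 10, 15, 17, NA, 8, 11, NA, NA, 25, NA, 29, NA, 35, 29)
)
df$QuestionId[c(6, 9, 10, 12, 14)] <- c(1, 6, 2, 6, 4)

# Keep only the rows that actually define an AltQuestionId -> AltTakerId pair
keep <- !is.na(df$AltQuestionId)

# Named vector as a lookup table: names are question IDs, values are taker IDs
lookup <- setNames(df$AltTakerId[keep], df$AltQuestionId[keep])

# Look up by name; NA or unknown QuestionIds come back as NA
df$TakerId <- unname(lookup[as.character(df$QuestionId)])
df
```

Duplicated QuestionIds (the two 6s) simply read the same entry twice, so no special handling is needed for them.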

2 answers:

Answer 0 (score: 1)

First, kill whoever set up the data like this!

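Now, I don't know how you want to handle the fact that QuestionId contains duplicate values (two 6s), but if either

  • a) AltQuestionId has as many entries for each unique QuestionId value as there are instances of that value in QuestionId, or
  • b) each unique QuestionId value appears only once in AltQuestionId, and you want TakerId to repeat the same AltTakerId for every duplicated QuestionId, then

this should work: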

# generate data
df <- data.frame(
  QuestionId = c(rep(NA, 16)),
  AltQuestionId = c(1, 2, 4, 5, 6, NA, 8, 10, NA, NA, 14, NA, 16, NA, 18, 20),
  AltTakerId = c(7, 13, 10, 15, 17, NA, 8, 11, NA, NA, 25, NA, 29, NA, 35, 29)
)
df$QuestionId[c(6, 9, 10, 12, 14)] <- c(1, 6, 2, 6, 4)
df$TakerId <- NA # a column of NAs

# for each unique value of QuestionId (i) put the value of AltTakerId
# that corresponds to the row where AltQuestionId equals i into the row
# of TakerId for which QuestionId equals i
for (i in na.omit(unique(df$QuestionId))) {
  df$TakerId[which(df$QuestionId == i)] <- df$AltTakerId[which(df$AltQuestionId == i)]
}

This gives:

> df
   QuestionId AltQuestionId AltTakerId TakerId
1          NA             1          7      NA
2          NA             2         13      NA
3          NA             4         10      NA
4          NA             5         15      NA
5          NA             6         17      NA
6           1            NA         NA       7
7          NA             8          8      NA
8          NA            10         11      NA
9           6            NA         NA      17
10          2            NA         NA      13
11         NA            14         25      NA
12          6            NA         NA      17
13         NA            16         29      NA
14          4            NA         NA      10
15         NA            18         35      NA
16         NA            20         29      NA
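The loop above relies on condition a) or b) holding for every QuestionId value. If some QuestionId has no counterpart in AltQuestionId at all, `df$AltTakerId[which(df$AltQuestionId == i)]` is empty and the assignment stops with a "replacement has length zero" error. A minimal sketch of a guarded version, assuming you would rather leave TakerId as NA (with a warning) when neither condition holds; the extra row with QuestionId 99 is hypothetical and only there to trigger that case:

```r
# Sample data from the question
df <- data.frame(
  QuestionId = rep(NA, 16),
  AltQuestionId = c(1, 2, 4, 5, 6, NA, 8, 10, NA, NA, 14, NA, 16, NA, 18, 20),
  AltTakerId = c(7, 13, 10, 15, 17, NA, 8, 11, NA, NA, 25, NA, 29, NA, 35, 29)
)
df$QuestionId[c(6, 9, 10, 12, 14)] <- c(1, 6, 2, 6, 4)
df$TakerId <- NA

# Hypothetical extra row: QuestionId 99 has no counterpart in AltQuestionId
df <- rbind(df, data.frame(QuestionId = 99, AltQuestionId = NA,
                           AltTakerId = NA, TakerId = NA))

for (i in na.omit(unique(df$QuestionId))) {
  dst <- which(df$QuestionId == i)     # rows to fill
  src <- which(df$AltQuestionId == i)  # rows holding the matching AltTakerId
  # Condition a): same number of rows on both sides; condition b): one source row
  if (length(src) == length(dst) || length(src) == 1) {
    df$TakerId[dst] <- df$AltTakerId[src]
  } else {
    warning("QuestionId ", i, ": ", length(src), " matching AltQuestionId rows for ",
            length(dst), " rows; TakerId left as NA")
  }
}
df
```

Running this emits one warning for QuestionId 99 and fills the other rows exactly as in the output above.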

Answer 1 (score: 0)

Here's a one-liner using match:

df$TakerId = df$AltTakerId[match(df$QuestionId, df$AltQuestionId)]
df
#    QuestionId AltQuestionId AltTakerId TakerId
# 1          NA             1          7      NA
# 2          NA             2         13      NA
# 3          NA             4         10      NA
# 4          NA             5         15      NA
# 5          NA             6         17      NA
# 6           1            NA         NA       7
# 7          NA             8          8      NA
# 8          NA            10         11      NA
# 9           6            NA         NA      17
# 10          2            NA         NA      13
# 11         NA            14         25      NA
# 12          6            NA         NA      17
# 13         NA            16         29      NA
# 14          4            NA         NA      10
# 15         NA            18         35      NA
# 16         NA            20         29      NA
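One subtlety with this one-liner: by default `match()` treats NA as an ordinary value, so every row whose QuestionId is NA matches the first NA in AltQuestionId (row 6). That only produces NA here because AltTakerId happens to be NA in row 6 as well. A sketch of a more defensive variant, assuming the standard `incomparables` argument of `match()`, which makes NA unmatchable; `idx_default` and `idx_safe` are illustrative names:

```r
# Sample data from the question
df <- data.frame(
  QuestionId = rep(NA, 16),
  AltQuestionId = c(1, 2, 4, 5, 6, NA, 8, 10, NA, NA, 14, NA, 16, NA, 18, 20),
  AltTakerId = c(7, 13, 10, 15, 17, NA, 8, 11, NA, NA, 25, NA, 29, NA, 35, 29)
)
df$QuestionId[c(6, 9, 10, 12, 14)] <- c(1, 6, 2, 6, 4)

# By default NA in x matches NA in the table
match(c(NA, 1), c(1, NA))                      # 2 1
# With incomparables = NA, an NA in x gets nomatch (NA) instead
match(c(NA, 1), c(1, NA), incomparables = NA)  # NA 1

# Default: every NA QuestionId points at row 6, the first NA in AltQuestionId
idx_default <- match(df$QuestionId, df$AltQuestionId)
# Defensive: NA QuestionIds get no match at all
idx_safe <- match(df$QuestionId, df$AltQuestionId, incomparables = NA)

df$TakerId <- df$AltTakerId[idx_safe]
df
```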

Using the exact data Milan provided:

df <- data.frame(
  QuestionId = c(rep(NA, 16)),
  AltQuestionId = c(1, 2, 4, 5, 6, NA, 8, 10, NA, NA, 14, NA, 16, NA, 18, 20),
  AltTakerId = c(7, 13, 10, 15, 17, NA, 8, 11, NA, NA, 25, NA, 29, NA, 35, 29)
)
df$QuestionId[c(6, 9, 10, 12, 14)] <- c(1, 6, 2, 6, 4)