Question

I inherited a dataset that is coded in an unusual way, and I'd like to learn a less verbose way to reshape it. The data frame looks like this:

# Input.
participant  = c(rep("John",6), rep("Mary",6))
day          = c(rep(1,3), rep(2,3), rep(1,3), rep(2,3))
likes        = c("apples", "apples", "18", "apples", "apples", "7", "bananas", "bananas", "24", "bananas", "bananas", "3")
question     = rep(c(1,1,0),4)
number       = c(rep(18,3), rep(7,3), rep(24,3), rep(3,3))  # the desired result; not part of df
df           = data.frame(participant, day, question, likes)

   participant day question   likes
1         John   1        1  apples
2         John   1        1  apples
3         John   1        0      18
4         John   2        1  apples
5         John   2        1  apples
6         John   2        0       7
7         Mary   1        1 bananas
8         Mary   1        1 bananas
9         Mary   1        0      24
10        Mary   2        1 bananas
11        Mary   2        1 bananas
12        Mary   2        0       3

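As you can see, the likes column is heterogeneous. When question equals 0, likes holds a number the participant chose instead of the fruit they like. So I'd like to recode that number into a new column, like this: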

   participant day question   likes number
1         John   1        1  apples     18
2         John   1        1  apples     18
3         John   1        0      18     18
4         John   2        1  apples      7
5         John   2        1  apples      7
6         John   2        0       7      7
7         Mary   1        1 bananas     24
8         Mary   1        1 bananas     24
9         Mary   1        0      24     24
10        Mary   2        1 bananas      3
11        Mary   2        1 bananas      3
12        Mary   2        0       3      3
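All of the approaches below rely on question == 0 marking exactly one numeric likes value per participant and day. A quick base R check of that assumption might look like the following sketch (the regex `^[0-9]+$` assumes the chosen numbers are non-negative integers):

```r
participant <- c(rep("John", 6), rep("Mary", 6))
day         <- c(rep(1, 3), rep(2, 3), rep(1, 3), rep(2, 3))
question    <- rep(c(1, 1, 0), 4)
likes       <- c("apples", "apples", "18", "apples", "apples", "7",
                 "bananas", "bananas", "24", "bananas", "bananas", "3")
df <- data.frame(participant, day, question, likes, stringsAsFactors = FALSE)

# Rows whose likes value consists only of digits
is_num <- grepl("^[0-9]+$", df$likes)
# Does question == 0 mark exactly those rows? (TRUE for the example data)
all(is_num == (df$question == 0))
# Number of question == 0 rows per participant/day (should be 1 everywhere)
counts <- table(df$participant[df$question == 0], df$day[df$question == 0])
counts
```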

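My current base R solution involves subsetting the initial data frame, creating a lookup table, renaming columns, and then merging the lookup table back into the original data frame. But that takes several steps, and I suspect there is a simpler solution. I think tidyr might be the answer, but I don't know how to use it to spread the values of one column (likes) into a new column, conditional on other columns (day and question).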
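The asker's actual code isn't shown, so the following is only a minimal sketch of what the subset / lookup / rename / merge workflow described above might look like (the names lookup and out are illustrative). Note that merge() sorts by the key columns, so the row order of the result may differ from df:

```r
participant <- c(rep("John", 6), rep("Mary", 6))
day         <- c(rep(1, 3), rep(2, 3), rep(1, 3), rep(2, 3))
question    <- rep(c(1, 1, 0), 4)
likes       <- c("apples", "apples", "18", "apples", "apples", "7",
                 "bananas", "bananas", "24", "bananas", "bananas", "3")
df <- data.frame(participant, day, question, likes, stringsAsFactors = FALSE)

# 1. Subset: keep only the rows that carry the chosen number
lookup <- df[df$question == 0, c("participant", "day", "likes")]
# 2. Rename the value column so it does not clash with likes, and make it numeric
names(lookup)[names(lookup) == "likes"] <- "number"
lookup$number <- as.numeric(lookup$number)
# 3. Merge the lookup table back onto every row of the original data
out <- merge(df, lookup, by = c("participant", "day"))
out
```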

Any suggestions? Many thanks!

Answer 1

Using the dataset above, you could try the following. Group the data by participant and day, and in each group look up the row where question == 0.

library(dplyr)
# Within each participant/day group, copy likes from the question == 0 row to every row
group_by(df, participant, day) %>%
  mutate(age = as.numeric(as.character(likes[which(question == 0)])))

Alternatively, as alistaire suggested, you can use grep():

# grep(..., value = TRUE) returns the element of likes that contains digits
group_by(df, participant, day) %>%
  mutate(age = as.numeric(grep('\\d+', likes, value = TRUE)))


#   participant   day question   likes   age
#        (fctr) (dbl)    (dbl)  (fctr) (dbl)
#1         John     1        1  apples    18
#2         John     1        1  apples    18
#3         John     1        0      18    18
#4         John     2        1  apples     7
#5         John     2        1  apples     7
#6         John     2        0       7     7
#7         Mary     1        1 bananas    24
#8         Mary     1        1 bananas    24
#9         Mary     1        0      24    24
#10        Mary     2        1 bananas     3
#11        Mary     2        1 bananas     3
#12        Mary     2        0       3     3

If you prefer data.table, you can do this:

library(data.table)
setDT(df)[, age := as.numeric(as.character(likes[which(question == 0)])),
            by = list(participant, day)]
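For comparison, the same per-group lookup (take likes from the question == 0 row of each participant/day group) can be written without extra packages using base R's ave(). This is a sketch rather than part of the original answer, and it assumes every group has exactly one question == 0 row (a group with none would make the assignment fail):

```r
participant <- c(rep("John", 6), rep("Mary", 6))
day         <- c(rep(1, 3), rep(2, 3), rep(1, 3), rep(2, 3))
question    <- rep(c(1, 1, 0), 4)
likes       <- c("apples", "apples", "18", "apples", "apples", "7",
                 "bananas", "bananas", "24", "bananas", "bananas", "3")
df <- data.frame(participant, day, question, likes, stringsAsFactors = FALSE)

# For every row, ave() returns the row index of the question == 0 row
# in the same participant/day group (the single index is recycled over the group)
idx <- ave(seq_len(nrow(df)), df$participant, df$day,
           FUN = function(i) i[df$question[i] == 0])
# Pull likes from that row and convert it to a number
df$age <- as.numeric(as.character(df$likes[idx]))
df
```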

Note

The dataset shown above is the updated one. Jota's answer addresses an earlier dataset that has since been removed.

Answer 2

Addressing the new example data:

# Create a key column (participant + day); it is overwritten below
df$number <- paste0(df$participant, df$day)
# Lookup table: the rows whose likes value converts to a number
lookup <- df[!is.na(as.numeric(as.character(df$likes))), c("number", "likes")]
# Use the lookup to overwrite df$number with the matching number, as numeric
df$number <- as.numeric(as.character(lookup$likes[match(df$number, lookup$number)]))
#   participant day question   likes number
#1         John   1        1  apples     18
#2         John   1        1  apples     18
#3         John   1        0      18     18
#4         John   2        1  apples      7
#5         John   2        1  apples      7
#6         John   2        0       7      7
#7         Mary   1        1 bananas     24
#8         Mary   1        1 bananas     24
#9         Mary   1        0      24     24
#10        Mary   2        1 bananas      3
#11        Mary   2        1 bananas      3
#12        Mary   2        0       3      3

Converting characters to numbers (as.numeric(as.character(df$likes))) triggers an "NAs introduced by coercion" warning.
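The warning is harmless here because the NAs are exactly the fruit rows that the lookup is meant to drop. If you want to make that intent explicit, one option (an illustration, not part of the original answer) is to wrap the conversion in base R's suppressWarnings() and inspect which values survive:

```r
likes <- c("apples", "apples", "18", "apples", "apples", "7",
           "bananas", "bananas", "24", "bananas", "bananas", "3")

# Fruit names become NA; the warning is silenced on purpose
num <- suppressWarnings(as.numeric(as.character(likes)))
num
# Only the question == 0 entries remain as numbers
likes[!is.na(num)]
```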

If your data are ordered as in the example, you can use na.locf from the zoo package:

library(zoo)
df$age <- na.locf(as.numeric(as.character(df$likes)), fromLast = TRUE)
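With fromLast = TRUE, na.locf carries each non-NA value backward over the NAs that precede it, which is why this only works when the numeric row is the last row of every participant/day block. The idea can be sketched in base R with a small hypothetical helper, fill_up (it is not a zoo function); it simply leaves any trailing NAs as NA:

```r
likes <- c("apples", "apples", "18", "apples", "apples", "7",
           "bananas", "bananas", "24", "bananas", "bananas", "3")

# Hypothetical helper: replace each NA with the next value after it,
# walking from the end of the vector towards the start
fill_up <- function(x) {
  n <- length(x)
  if (n < 2) return(x)
  for (k in (n - 1):1) {
    if (is.na(x[k])) x[k] <- x[k + 1]
  }
  x
}

num <- suppressWarnings(as.numeric(as.character(likes)))
age <- fill_up(num)
age
```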

How to fill in a row's values conditional on another row in R?