Replace values in a data.frame column based on values in a different column

Time: 2015-04-22 19:24:33

Tags: r dataframe

I have this data.frame:

df <- data.frame(id = rep(c("one", "two", "three"), each = 10), week.born = NA)
df$week.born[c(5,15,28)] <- c(23,19,24)

df 

      id week.born
1    one        NA
2    one        NA
3    one        NA
4    one        NA
5    one        23
6    one        NA
7    one        NA
8    one        NA
9    one        NA
10   one        NA
11   two        NA
12   two        NA
13   two        NA
14   two        NA
15   two        19
16   two        NA
17   two        NA
18   two        NA
19   two        NA
20   two        NA
21 three        NA
22 three        NA
23 three        NA
24 three        NA
25 three        NA
26 three        NA
27 three        NA
28 three        24
29 three        NA
30 three        NA

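For one, all week.born values should be 23. For two, all week.born values should be 19. For three, all week.born values should be 24.

What is the best way to do this?

5 answers:

Answer 0: (score: 5)

I would create another data.frame containing the mapping, then do a simple join: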

require(dplyr)
map <- data.frame(id=c("one","two","three"), new.week.born=c(23,19,24))
left_join(df, map, by="id")

#       id week.born new.week.born
# 1    one        NA            23
# 2    one        NA            23
# ...
# 16   two        NA            19
# 17   two        NA            19
# 18   two        NA            19
# 19   two        NA            19
# 20   two        NA            19
# 21 three        NA            24
# 22 three        NA            24
# 23 three        NA            24
# ...
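The join above adds a new.week.born column next to the original one rather than overwriting week.born. As a minimal sketch (not part of the original answer), the same lookup can be written in base R with match(), assigning the result straight into week.born. The helper name idx is only illustrative.

```r
# Rebuild the question's data and the mapping table from the answer.
df <- data.frame(id = rep(c("one", "two", "three"), each = 10), week.born = NA)
map <- data.frame(id = c("one", "two", "three"), new.week.born = c(23, 19, 24))

# match() returns, for every row of df, the row of map holding the same id.
# as.character() guards against id being stored as a factor (the default
# for data.frame() before R 4.0).
idx <- match(as.character(df$id), as.character(map$id))

# Overwrite week.born in place with the looked-up values.
df$week.born <- map$new.week.born[idx]
```

Any id missing from map would get NA here, which is the same behaviour as a left join.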

See the benchmark below.

library(microbenchmark)
library(dplyr) # v 0.4.1
library(data.table) # v 1.9.5

df <- data.frame(id = rep(c("one", "two", "three"), each = 1e6))
df2 <- copy(df)
map <- data.frame(id=c("one","two","three"), new.week.born=c(23,19,24))

dplyr_join <- function() { 
  left_join(df, map, by="id")
}

r_merge <- function() {
  merge(df, map, by="id")
}

data.table_join <- function() {
  setkey(setDT(df2))[map]
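}

# Run each approach 10 times (expressions and neval as shown in the output below)
microbenchmark(dplyr_join(), r_merge(), data.table_join(), times = 10)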

Unit: milliseconds
              expr         min         lq       mean     median         uq       max neval
      dplyr_join()   409.10635   476.6690   910.6446   489.4573   705.4021  2866.151    10
         r_merge() 41589.32357 47376.0741 55719.1752 50133.0918 54636.3356 83562.931    10
 data.table_join()    94.14621   132.3788   483.4220   225.3309  1051.7916  1416.946    10

Answer 1: (score: 2)

One solution is:

df$week.born[df$id == "one"] <- 23
df$week.born[df$id == "two"] <- 19
df$week.born[df$id == "three"] <- 24

Regards
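With more than a handful of groups, one assignment per id gets repetitive. A possible compact variant, sketched here as an assumption rather than as part of the answer above, keeps the id-to-week pairs in a named vector (called lookup here, a made-up name) and indexes it by id:

```r
df <- data.frame(id = rep(c("one", "two", "three"), each = 10), week.born = NA)

# Named vector: names are the ids, values are the weeks from the question.
lookup <- c(one = 23, two = 19, three = 24)

# Index by name; as.character() matters if id is a factor, because a factor
# would otherwise index by its integer codes. unname() drops the id labels.
df$week.born <- unname(lookup[as.character(df$id)])
```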

Answer 2: (score: 2)

You can do:

library(data.table)
setDT(df)[,week.born:=week.born[!is.na(week.born)][1], by=id]

Or using base R ave:

df$week.born = with(df, ave(week.born, id, FUN=function(u) u[!is.na(u)][1]))
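Both snippets take the first non-missing value in each id group, so they quietly assume every group has exactly one known week. A small sanity check before filling could look like the sketch below. The name n_known is only illustrative, and the check is an addition, not something the answer itself performs.

```r
df <- data.frame(id = rep(c("one", "two", "three"), each = 10), week.born = NA)
df$week.born[c(5, 15, 28)] <- c(23, 19, 24)

# Count the distinct non-NA values per id; 0 means nothing to fill from,
# more than 1 means the group has conflicting values.
n_known <- tapply(df$week.born, df$id, function(u) length(unique(u[!is.na(u)])))
stopifnot(all(n_known == 1))

# Safe to fill: spread the single known value across each group.
df$week.born <- ave(df$week.born, df$id, FUN = function(u) u[!is.na(u)][1])
```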

Answer 3: (score: 2)

If you only have a few groups, @cho7tom's approach is fine; otherwise you may prefer to keep a lookup table and join to it, looking up the week.born value based on id.

Base R:

df <- data.frame(id = rep(c("one", "two", "three"), each = 10))
lkp <- data.frame(id=c("one","two","three"), week.born=c(23,19,24))
merge(df, lkp, by="id")

Or using a binary join in data.table.
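A keyed data.table join along these lines might look like the following. This is a sketch under assumptions, not the answer's own code: the object names dt, lkp_dt and res are made up for illustration, and the data.table package must be installed.

```r
library(data.table)

df <- data.frame(id = rep(c("one", "two", "three"), each = 10))
lkp <- data.frame(id = c("one", "two", "three"), week.born = c(23, 19, 24))

# Work on data.table copies so the original data.frames stay untouched.
dt <- as.data.table(df)
lkp_dt <- as.data.table(lkp)

# Setting a key sorts dt by id and enables the binary-search join.
setkey(dt, id)

# For each row of lkp_dt, fetch the matching rows of dt and attach week.born.
res <- dt[lkp_dt]
```

The result follows the row order of lkp_dt, so the original row order of df is not preserved.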

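Answer 4: (score: 0)

When mapping a few combinations like this, the mapvalues function from the plyr package is simple:

library(plyr)
# mapvalues() returns a factor or character vector here, so convert back to numeric
df$week.born <- as.numeric(as.character(mapvalues(df$id, c("one", "two", "three"), c(23, 19, 24))))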