Replace values in a data frame based on a partial match

Time: 2018-04-10 03:12:22

Tags: r grep dplyr substitution startswith

Here is my data:

> df1
        col1      col2
1  0/0:6:6,0 0/0:6:6,0
2  0/0:6:6,0 0/1:6:6,0
...
6  1/1:6:6,0 0/0:6:6,0
7  0/0:8:8,0 0/0:8:8,0

What I want is to replace long entries such as "0/0:6:6,0" with 0 if they start with "0/0", with 0.5 if they start with "0/1", and so on.

So far, I have tried the following:

1) replace with starts_with

df %>% mutate(col1 = replace(col1, starts_with("0/0"), 0)) %>% head()
    Error in mutate_impl(.data, dots) : 
      Evaluation error: Variable context not set.
    In addition: Warning message:
    In `[<-.factor`(`*tmp*`, list, value = 0) :
      invalid factor level, NA generated

2) grep (suggested as a solution here)

df[,1][grep("0/1",df[,1])]<-0.5
Warning message:
In `[<-.factor`(`*tmp*`, grep("0/1", df[, 1]), value = c(NA, 2L,  :
  invalid factor level, NA generated

Feeling lost... it's been a long day.

1 Answer:

Answer 0 (score: 2)

We can use grepl:

df1 %>%
   mutate(col1 = replace(col1, grepl("^0/0", col1), 0))
#       col1      col2
#1         0 0/0:6:6,0
#2         0 0/1:6:6,0
#3 1/1:6:6,0 0/0:6:6,0
#4         0 0/0:8:8,0
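The output above assumes col1 holds character strings. The "invalid factor level, NA generated" warnings in the question indicate that col1 is a factor there (data.frame() defaulted to stringsAsFactors = TRUE before R 4.0.0), and assigning a value that is not an existing level yields NA. A minimal base R sketch, using a small stand-in for df1 built from the rows shown in the output above, converts the column to character before replacing:

```r
# Stand-in for df1 built from the rows in the output above;
# stringsAsFactors = TRUE mimics the pre-R 4.0.0 default that
# produced the factor warnings in the question
df1 <- data.frame(
  col1 = c("0/0:6:6,0", "0/0:6:6,0", "1/1:6:6,0", "0/0:8:8,0"),
  col2 = c("0/0:6:6,0", "0/1:6:6,0", "0/0:6:6,0", "0/0:8:8,0"),
  stringsAsFactors = TRUE
)

# Assigning 0 into the factor directly would give NA plus the
# "invalid factor level" warning, so convert to character first
df1$col1 <- as.character(df1$col1)

# "^" anchors the pattern at the start of each string
df1$col1[grepl("^0/0", df1$col1)] <- 0

print(df1)
```

Note that col1 is still a character column afterwards: the replaced cells hold the string "0" next to the untouched entries.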

Or use startsWith from base R:

df1 %>%
    mutate(col1 = replace(col1, startsWith(col1, "0/0"), 0))
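For a fixed prefix, startsWith and an anchored grepl give the same logical vector, but startsWith compares the prefix literally (no regular expression), so characters such as "." or "|" need no escaping. It also accepts only character input, which matters if col1 is a factor as in the question. A short base R sketch using the col1 values shown above:

```r
x <- c("0/0:6:6,0", "0/0:6:6,0", "1/1:6:6,0", "0/0:8:8,0")

# Literal prefix test, no regex involved
a <- startsWith(x, "0/0")
# Regex with a start-of-string anchor; same result for this prefix
b <- grepl("^0/0", x)
print(identical(a, b))

# startsWith() rejects non-character input such as a factor, so the
# error is caught here and its message returned instead
f <- factor(x)
res <- tryCatch(startsWith(f, "0/0"), error = function(e) conditionMessage(e))
print(res)

# Converting the factor to character first makes it work
print(startsWith(as.character(f), "0/0"))
```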

The problem with dplyr::starts_with is that it is a select helper that matches variables by their names:

df1 %>%
    select(starts_with('col1'))
#       col1
#1 0/0:6:6,0
#2 0/0:6:6,0
#6 1/1:6:6,0
#7 0/0:8:8,0

It matches column names, not the values stored in the columns, whereas startsWith returns a logical vector, just as grepl does:

startsWith(df1$col1, "0/0")
#[1]  TRUE  TRUE FALSE  TRUE
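To cover all prefixes at once, rather than one replace() call per prefix, a different technique is a named lookup vector indexed by the part of each entry before the first ":". This is a sketch under assumptions: the stand-in data repeats the rows shown above, the mapping "1/1" -> 1 is only a guess at what "and so on" covers, and prefix_values and prefix_to_value are hypothetical names:

```r
# Stand-in for df1 built from the rows shown above
df1 <- data.frame(
  col1 = c("0/0:6:6,0", "0/0:6:6,0", "1/1:6:6,0", "0/0:8:8,0"),
  col2 = c("0/0:6:6,0", "0/1:6:6,0", "0/0:6:6,0", "0/0:8:8,0"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: "0/0" -> 0 and "0/1" -> 0.5 come from the
# question; "1/1" -> 1 is an assumed extra entry
prefix_values <- c("0/0" = 0, "0/1" = 0.5, "1/1" = 1)

# Hypothetical helper: drop everything from the first ":" onwards and
# look the remaining prefix up; unknown prefixes become NA
prefix_to_value <- function(x) {
  prefix <- sub(":.*$", "", as.character(x))
  unname(prefix_values[prefix])
}

# df1[] keeps the data frame structure while replacing every column
df1[] <- lapply(df1, prefix_to_value)
print(df1)
```

Because every cell goes through the table, both columns come out numeric, and any prefix missing from prefix_values shows up as NA instead of silently staying as text.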