Question

Given a data frame like this:

data.frame(id = c(1,2,3,4), text = c("text, another, end","not, keep","not, to keep, this","finally, chance, to, check"))

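How can I find the last comma in each row of the text column and remove everything before it, along with the comma itself?

Example of the expected output: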

data.frame(id = c(1,2,3,4), text = c("end","keep","this","check"))

Answer 1

With sub, we can remove everything up to and including the last comma, along with any whitespace that follows it.

df$text <- sub("^.*,\\s*", "", df$text)
df

id  text
1  1   end
2  2  keep
3  3  this
4  4 check
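
As a rough breakdown of why this pattern works, the sketch below applies it to a few extra strings. The vector `x` is hypothetical test input, not part of the original question; it is only meant to show what happens with no comma, with extra spaces, and with no space after the comma.

```r
# Hypothetical test strings, not taken from the question
x <- c("text, another, end", "no comma here", "trailing,   spaces", "a,b")

# ^     anchor at the start of the string
# .*    greedy: consumes as much as possible, so the comma matched next is the last one
# ,     the last comma
# \\s*  zero or more whitespace characters after that comma
result <- sub("^.*,\\s*", "", x)
print(result)
# [1] "end"           "no comma here" "spaces"        "b"
```

A string without any comma does not match the pattern, so sub returns it unchanged.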

Data:

df <- data.frame(id = c(1,2,3,4),
                 text = c("text, another, end","not, keep",
                          "not, to keep, this","finally, chance, to, check"))

Answer 2

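Regular expressions are greedy by default: .* consumes as much as it can, so the comma it stops at is already the last one. You don't need to search for the last comma explicitly.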

sub('.*, ', '', df$text)
#[1] "end"   "keep"  "this"  "check"
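
To illustrate the greedy behaviour, here is a small sketch contrasting the greedy `.*` with the lazy `.*?`. The variable names `s`, `greedy`, `lazy`, and `no_space` are made up for this example, and `perl = TRUE` is an assumption chosen so that the lazy quantifier is handled by the PCRE engine.

```r
s <- "finally, chance, to, check"

# Greedy: .* runs to the LAST ", " in the string
greedy <- sub(".*, ", "", s, perl = TRUE)

# Lazy: .*? stops at the FIRST ", " instead
lazy <- sub(".*?, ", "", s, perl = TRUE)

# This pattern needs a literal space after the comma,
# so a string like "a,b" does not match and is left unchanged
no_space <- sub(".*, ", "", "a,b")

print(greedy)    # "check"
print(lazy)      # "chance, to, check"
print(no_space)  # "a,b"
```

If the data may contain commas that are not followed by a space, a pattern with `\\s*` after the comma is the more forgiving choice.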

Answer 3

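Base R, more verbose and less efficient, without regular expressions:

# Split on the literal ", " (fixed = TRUE disables regex), keep the last piece; assign to df$text so id is preserved
df$text <- sapply(strsplit(as.character(df$text), ", ", fixed = TRUE), function(x) x[length(x)])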
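
As a quick sanity check, the sketch below runs the regex approach and a split-based approach on the question's data and compares the results. The names `by_sub` and `by_split` are hypothetical, and `vapply` is swapped in for `sapply` here only because it guarantees a character vector as the result.

```r
df <- data.frame(id = c(1, 2, 3, 4),
                 text = c("text, another, end", "not, keep",
                          "not, to keep, this", "finally, chance, to, check"))

# Regex approach: strip everything through the last comma plus trailing whitespace
by_sub <- sub("^.*,\\s*", "", as.character(df$text))

# Split approach: split on the literal ", " and take the last piece of each row
pieces <- strsplit(as.character(df$text), ", ", fixed = TRUE)
by_split <- vapply(pieces, function(x) x[length(x)], character(1))

print(by_split)
# [1] "end"   "keep"  "this"  "check"
print(identical(by_sub, by_split))
# [1] TRUE
```

The two approaches agree on this data, but they can differ when a comma is not followed by exactly one space, since the split version looks only for the literal ", ".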
