RegEx for replacing part of a string in R

Time: 2019-05-26 22:12:51

Tags: r regex gsub regex-lookarounds regex-group

I am trying to do an exact pattern match using gsub/sub and replace, but I am not getting the desired result. I want to remove the .x and .y suffixes from the names without affecting the other names.

name = c("company", "deriv.x", "isConfirmed.y")
new.name = gsub(".x$|.y$", "", name)
new.name
[1] "compa"       "deriv"       "isConfirmed"

company has become compa.

I have also tried

remove = c(".x", ".y")
replace(name, name %in% remove, "")
[1] "company"    "deriv.x"    "isConfirmed.y"

I would like the outcome to be "company", "deriv", "isConfirmed".

How do I solve this problem?

3 Answers:

Answer 0 (score: 1)

Here, we can use a simple expression to remove the unwanted . and everything after it:

(.+?)(?:\..+)?

Or, for an exact match:

(.+?)(?:\.x|\.y)?

R test

Your code might look like:

gsub("(.+?)(?:\\..+)?", "\\1", "deriv.x")

gsub("(.+?)(?:\\.x|\\.y)?", "\\1", "deriv.x")

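Explanation

Here we have a capturing group (.+?) that holds the output we want, and a non-capturing group (?:\..+)? that swallows everything after the unwanted . character.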
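
To make the capturing-group idea concrete on the whole vector from the question, here is a minimal sketch. It is a variant rather than the exact patterns above: the ^ and $ anchors, the [xy] class, and perl = TRUE are assumptions added here so that the optional suffix group is only tried at the end of each name, and the extra name "new.name" is a hypothetical test value.

```r
name <- c("company", "deriv.x", "isConfirmed.y", "new.name")

# ^(.+?)        capture as little as possible from the start of the string
# (?:\\.[xy])?  optionally consume a literal ".x" or ".y" (not captured)
# $             force the match to reach the end of the string
stripped <- sub("^(.+?)(?:\\.[xy])?$", "\\1", name, perl = TRUE)
stripped
# [1] "company"     "deriv"       "isConfirmed" "new.name"
```

Because the whole string is matched and replaced by the first group, a name such as "new.name" that contains a dot but no .x or .y suffix is left untouched.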

Answer 1 (score: 1)

In a regular expression, . means "any character". To match a literal . character, you need to escape it, as follows:

name <- c("company", "deriv.x", "isConfirmed.y")
new.name <- gsub("\\.x$|\\.y$", "", name)
new.name

[1] "company"     "deriv"       "isConfirmed"

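This explains why, in your original example, "company" was turned into "compa": the pattern .y$ removed "any character" (here the "n") followed by a "y" at the end of the string.

Onyambu's comment will also work, because inside the [ ] part of a regular expression, . is interpreted literally.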

gsub("[.](x|y)$", "", name)
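
One way to see the difference is to look at what each pattern actually matches instead of what it removes. The following is a small sketch using base R's regexpr and regmatches on the vector from the question; the variable names are only illustrative.

```r
name <- c("company", "deriv.x", "isConfirmed.y")

# Unescaped dot: "." matches any character, so "ny" in "company" qualifies
matched_unescaped <- regmatches(name, regexpr(".x$|.y$", name))
matched_unescaped
# [1] "ny" ".x" ".y"

# Escaped dot: only a literal "." followed by x or y at the end matches
matched_escaped <- regmatches(name, regexpr("\\.x$|\\.y$", name))
matched_escaped
# [1] ".x" ".y"

# Dot inside a bracket expression is also taken literally
matched_class <- regmatches(name, regexpr("[.](x|y)$", name))
matched_class
# [1] ".x" ".y"
```

regmatches drops the elements with no match, which is why the escaped versions return only two values.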

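Answer 2 (score: 1)

The dot matches any character except a newline, so .x$|.y$ will also match the ny in company.

No grouping construct is needed to match a dot followed by x or y. You can match a dot and use a character class to match x or y: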

\\.[xy]

And replace with an empty string:

name = c("company", "deriv.x", "isConfirmed.y")
new.name = gsub("\\.[xy]", "", name)
new.name

Result

[1] "company"     "deriv"       "isConfirmed"
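
Note that the pattern above has no $ anchor, so it removes .x or .y anywhere in a name, not only at the end. The sketch below compares it with an anchored variant; the name "plot.xlim" is a hypothetical value added only to show the difference, and whether the anchored form is preferable depends on the names in your data.

```r
name <- c("company", "deriv.x", "isConfirmed.y", "plot.xlim")

# Unanchored: every ".x" or ".y" is removed, even in the middle of a name
unanchored <- gsub("\\.[xy]", "", name)
unanchored
# [1] "company"     "deriv"       "isConfirmed" "plotlim"

# Anchored with $: only a trailing ".x" or ".y" is removed
anchored <- sub("\\.[xy]$", "", name)
anchored
# [1] "company"     "deriv"       "isConfirmed" "plot.xlim"
```

Since at most one suffix can sit at the end of a string, sub is sufficient for the anchored form.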