Regex to exclude words in a particular order

Time: 2017-02-08 23:42:28

Tags: r regex regex-negation

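I want to filter out certain fields if they do not match a criterion. The problem is the order of the words. I tried the following patterns: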

(EXCLUDING)(?!\(MONDAY)(.*MONDAY).*

(EXCLUDING)(?!\()(.*MONDAY).*

What I want is a filter that catches EXCLUDING * MONDAY, but not if there is an opening parenthesis between those two words. That is, I want to catch:

EXCLUDING MONDAY
EXCLUDING WEDNESDAY AND MONDAY
EXCLUDING MONDAY AND WEDNESDAY
EXCLUDING MONDAY (WEDNESDAY IS OK)

but not

EXCLUDING WEDNESDAY (MONDAY IS OK)

The expressions above, of course, catch all of them. This will run in R.
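A minimal reproduction of that behaviour, assuming the patterns are passed to `grepl` with `perl = TRUE` (lookaheads need the PCRE engine in R). The variable names `strs`, `first` and `second` are only for illustration:

```r
# The five example strings from the question
strs <- c("EXCLUDING MONDAY",
          "EXCLUDING WEDNESDAY AND MONDAY",
          "EXCLUDING MONDAY AND WEDNESDAY",
          "EXCLUDING MONDAY (WEDNESDAY IS OK)",
          "EXCLUDING WEDNESDAY (MONDAY IS OK)")

# Backslashes are doubled because these are R string literals
first  <- grepl("(EXCLUDING)(?!\\(MONDAY)(.*MONDAY).*", strs, perl = TRUE)
second <- grepl("(EXCLUDING)(?!\\()(.*MONDAY).*", strs, perl = TRUE)

print(first)   # every element is TRUE, including the unwanted last string
print(second)  # same result
```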

2 answers:

Answer 0: (score: 1)

How about this?

mystrings <- c("EXCLUDING MONDAY",
"EXCLUDING WEDNESDAY AND MONDAY",
"EXCLUDING MONDAY AND WEDNESDAY",
"EXCLUDING MONDAY (WEDNESDAY IS OK)",
"EXCLUDING WEDNESDAY (MONDAY IS OK)")

grepl("EXCLUDING[^\\(]+MONDAY", mystrings)

# [1]  TRUE  TRUE  TRUE  TRUE FALSE
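The negated character class means that no opening parenthesis may appear anywhere between EXCLUDING and the MONDAY that is matched. A rough sketch of what that implies, using a few made-up strings (the `extra` vector is hypothetical and not part of the question):

```r
# Hypothetical inputs, not taken from the question
extra <- c("EXCLUDING WEDNESDAY (AND MONDAY)",  # "(" sits between the two words
           "EXCLUDING MONDAY (MONDAY IS OK)",   # a MONDAY outside the parentheses comes first
           "INCLUDING MONDAY")                  # EXCLUDING is missing

res_class <- grepl("EXCLUDING[^\\(]+MONDAY", extra)
print(res_class)
# [1] FALSE  TRUE FALSE
```

Note that the first string is rejected even though MONDAY does not directly follow the parenthesis, which may or may not be what is wanted.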

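Answer 1: (score: 0)

If you only want to reject a MONDAY that is immediately preceded by (, you can use a negative lookbehind assertion. Your regex uses a negative lookahead, which is why it does not work as intended for (MONDAY: the lookahead is tested at the position right after EXCLUDING, where the next character is a space, so it never fails.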

strs <- c("EXCLUDING MONDAY",
          "EXCLUDING WEDNESDAY AND MONDAY",
          "EXCLUDING MONDAY AND WEDNESDAY",
          "EXCLUDING MONDAY (WEDNESDAY IS OK)",
          "EXCLUDING WEDNESDAY (MONDAY IS OK)")

grepl("EXCLUDING.*(?<!\\()MONDAY", strs, perl=TRUE)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE
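The lookbehind only inspects the single character directly before MONDAY, so it is not equivalent to a negated character class such as `[^\\(]+`, which forbids a parenthesis anywhere between the two words. A small comparison on made-up strings (the `edge` vector is hypothetical, chosen only to show where the two patterns disagree):

```r
# Hypothetical inputs where the two approaches give different answers
edge <- c("EXCLUDING WEDNESDAY (AND MONDAY)",
          "EXCLUDING WEDNESDAY (MONDAY) AND MONDAY")

# Negative lookbehind: rejects only a MONDAY directly preceded by "("
res_lookbehind <- grepl("EXCLUDING.*(?<!\\()MONDAY", edge, perl = TRUE)

# Negated character class: rejects any "(" between EXCLUDING and MONDAY
res_class <- grepl("EXCLUDING[^\\(]+MONDAY", edge)

print(res_lookbehind)
# [1] TRUE TRUE
print(res_class)
# [1] FALSE FALSE
```

Which of the two is preferable depends on whether a parenthesis earlier in the string should disqualify a later MONDAY.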