Question

I have a data frame containing a large number of strings made up of 0, 1 and N. Here are some examples:

a = "10000000011111111"
b = "11111111111111111"
c = "11111110000000NNN"
d = "00000000000000000"
e = "00000001111111111"
f = "11111000000000000"

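I am looking for a way to identify the strings that contain both '0' and '1' with no 'N' present. My end goal is to replace those entries with "REC" in my original data frame, similar to what was done in this question.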

The result for the data above would be:

a = "REC"
b = "11111111111111111"
c = "11111110000000NNN"
d = "00000000000000000"
e = "REC"
f = "REC"

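The main strategy I have used so far (guided by the answers to the previous question) relies on gsub, but I cannot come up with a regular expression that gives the output I want. I have tried too many variations to list here, but my most recent function is this: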

markREC <- function(X) {
 gsub(X, pattern = "^(0)+.*(1)+$", 
      replacement = "REC?")}

This function would be run over the data frame with lapply.

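Another strategy I tried relied on strsplit, but I could not get that to work either. I can post examples if people want to see them. I imagine this is simple for the regex experts out there, but after several hours of trying I would really appreciate some help!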

Answer 1

Well, I'm not sure what you are trying to achieve with your regular expression.

^(0)+.*(1)+$

effectively means:

Start of string, match at least one 0, followed by anything, followed by at least one 1 and the end of the string. So this: 032985472395871 matches :)
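A quick check in R illustrates the point. This is only a sketch that reuses the pattern from the question; the second and third test strings are taken from or modelled on the question's data.

```r
# The question's pattern only requires a leading 0 and a trailing 1;
# the ".*" in the middle accepts any characters at all.
pattern <- "^(0)+.*(1)+$"

grepl(pattern, "032985472395871")    # TRUE: digits other than 0/1 slip through
grepl(pattern, "0000NNN1111")        # TRUE: so does N
grepl(pattern, "11111000000000000")  # FALSE: starts with 1, so string f is missed
```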

^(?=.*0)(?=.*1)[01]+$ matches only if the entire string consists of 0s and 1s and contains at least one 0 and at least one 1.

// ^(?=.*0)(?=.*1)[01]+$
// 
// Assert position at the beginning of the string «^»
// Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=.*0)»
//    Match any single character that is not a line break character «.*»
//       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    Match the character “0” literally «0»
// Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=.*1)»
//    Match any single character that is not a line break character «.*»
//       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    Match the character “1” literally «1»
// Match a single character present in the list “01” «[01]+»
//    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
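To use this pattern from R, note that lookaheads are not supported by R's default regex engine, so perl = TRUE has to be passed. The following is a minimal sketch, not the answerer's own code: it rewrites the question's markREC so that whole strings are replaced by index rather than through gsub, and the vector x simply holds the six example strings.

```r
# Example strings from the question, stored in a named character vector.
x <- c(a = "10000000011111111", b = "11111111111111111",
       c = "11111110000000NNN", d = "00000000000000000",
       e = "00000001111111111", f = "11111000000000000")

# Rewritten helper (an illustration, not code from the thread).
markREC <- function(x) {
  x <- as.character(x)  # also handles factor columns; drops names
  # Lookaheads are a PCRE feature, so perl = TRUE is required here.
  mixed <- grepl("^(?=.*0)(?=.*1)[01]+$", x, perl = TRUE)
  x[mixed] <- "REC"     # replace only the strings that matched
  x
}

result <- markREC(x)
result
```

On a data frame (say a hypothetical df whose columns all hold such strings), something like df[] <- lapply(df, markREC) would apply the helper column by column while keeping the data frame structure.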

Answer 2

To match strings that contain both 0s and 1s (as opposed to strings that contain only 0s or only 1s), you can do the following:

grepl("^((0)+(1)+(0|1)+)|((1)+(0)+(0|1)+)$", <string>)

For some of your examples:

> grepl("^((0)+(1)+(0|1)+)|((1)+(0)+(0|1)+)$", a)
[1] TRUE

> grepl("^((0)+(1)+(0|1)+)|((1)+(0)+(0|1)+)$", b)
[1] FALSE

> grepl("^((0)+(1)+(0|1)+)|((1)+(0)+(0|1)+)$", c)
[1] FALSE

Now plug it into gsub:

> gsub(a, pattern="^((0)+(1)+(0|1)+)|((1)+(0)+(0|1)+)$", replacement="REC")
[1] "REC"

> gsub(b, pattern="^((0)+(1)+(0|1)+)|((1)+(0)+(0|1)+)$", replacement="REC")
[1] "11111111111111111"

> gsub(c, pattern="^((0)+(1)+(0|1)+)|((1)+(0)+(0|1)+)$", replacement="REC")
[1] "11111110000000NNN"

> gsub(d, pattern="^((0)+(1)+(0|1)+)|((1)+(0)+(0|1)+)$", replacement="REC")
[1] "00000000000000000"

> gsub(e, pattern="^((0)+(1)+(0|1)+)|((1)+(0)+(0|1)+)$", replacement="REC")
[1] "REC"

> gsub(f, pattern="^((0)+(1)+(0|1)+)|((1)+(0)+(0|1)+)$", replacement="REC")
[1] "REC"
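The outputs above are correct for the six example strings, but the pattern appears to be looser than intended on other inputs: the "^" applies only to the first alternative and the "$" only to the second, and each alternative needs at least three characters. The sketch below shows two such inputs (both made up for illustration) and one possible stricter pattern, which is a suggestion rather than part of the original answer.

```r
pattern <- "^((0)+(1)+(0|1)+)|((1)+(0)+(0|1)+)$"

# "^" binds only to the first alternative and "$" only to the second,
# so a valid-looking prefix is enough for a match.
grepl(pattern, "0011NNN")  # TRUE, although the string contains N

# Each alternative also needs at least three characters.
grepl(pattern, "01")       # FALSE, although the string mixes 0 and 1

# A possible alternative without lookaheads: any string of 0s and 1s
# that contains both digits must have "01" or "10" somewhere in it.
strict <- "^[01]*(01|10)[01]*$"
checks <- grepl(strict, c("0011NNN", "01", "10000000011111111", "00000000000000000"))
checks  # FALSE TRUE TRUE FALSE
```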

Answer 3

The correct regular expression is:

"[^N]*"

I believe. This will match a string of any length unless it contains an N.
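One caveat worth checking before relying on this: without anchors, "[^N]*" can match an empty stretch of any string, so it does not actually reject strings that contain N. The sketch below compares it with an anchored variant (the anchors are an addition, not part of the original answer), using strings b and c from the question.

```r
x <- c(b = "11111111111111111", c = "11111110000000NNN")

# Unanchored, "[^N]*" can match the empty string, so every input matches.
unanchored <- grepl("[^N]*", x)    # TRUE TRUE

# Anchoring both ends forces the whole string to be free of N.
anchored <- grepl("^[^N]*$", x)    # TRUE FALSE
```

Even the anchored form accepts b, which consists only of 1s, so on its own it does not separate mixed strings from uniform ones.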

Answer 4

Try this

^([01]*)[^01]+([01]*)$

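It matches the start of the string, followed by zero or more 0/1 characters, then at least one character that is not 0 or 1, followed by zero or more 0/1 characters (followed by the end of the string).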
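As described, this pattern picks out strings that contain a run of characters other than 0 and 1, which is the opposite of the strings the question wants to replace. A short check against three of the question's examples, offered only as an illustration:

```r
pattern <- "^([01]*)[^01]+([01]*)$"

has_other <- c(
  c = grepl(pattern, "11111110000000NNN"),  # TRUE: the NNN run is the [^01]+ part
  a = grepl(pattern, "10000000011111111"),  # FALSE: nothing but 0s and 1s
  b = grepl(pattern, "11111111111111111")   # FALSE: nothing but 1s
)
has_other
```

Negating the result would select strings made purely of 0s and 1s, but a and b would still be treated alike, so a further test for the presence of both digits would be needed.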

Pattern matching using regular expressions in strings
