Question

I currently have a string in R that looks like this:

df <- c ("BMMBMMBMMMMMBMMBM")

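I need to determine how many times MM occurs in this string, counting each run of consecutive M's once (4 in this example).

I have been using str_count(df, "MM"), but that counts every non-overlapping pair of adjacent M's in the string (it returns 5).

Any help would be great...

Thanks!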

Answer 1

One possible approach is:

stringr::str_count(df, "MM+")
#output
[1] 4

Here + means "one or more", so the pattern matches an M followed by one or more further M's.

In base R:

lengths(gregexpr("MM+", df))

gregexpr returns a list with one element per element of df, and lengths returns the length of each list element.
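To make the list structure concrete, here is a small sketch in base R. The vector name strings and its second element are made up for illustration and are not part of the original answer.

```r
# Two input strings, so gregexpr() returns a list of length 2.
strings <- c("BMMBMMBMMMMMBMMBM", "MMBMMM")
matches <- gregexpr("MM+", strings)

length(matches)                      # 2, one list element per input string
as.integer(matches[[1]])             # 2 5 8 14, start position of each run
attr(matches[[1]], "match.length")   # 2 2 5 2, length of each run
lengths(matches)                     # 4 2, number of runs per string
```

Each list element holds the start positions of the matches in the corresponding string, which is why its length equals the number of runs.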

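Edit: as per the comment by @docendo discimus, the second option is a bit dangerous, because it returns 1 when the pattern is not found:

lengths(gregexpr("xyz+", df))
#output
[1] 1

This happens because gregexpr returns -1 for a string with no match, and that single value still has length 1. A safer option has to handle the no-match case explicitly.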
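One way to guard against the no-match case, sketched here as an assumption rather than as the answerer's original code, is to extract the matches with regmatches before counting them. The helper name count_runs is hypothetical.

```r
# Hypothetical helper, not from the original answer.
count_runs <- function(s, pattern = "MM+") {
  # regmatches() yields character(0) for a string with no match,
  # so lengths() reports 0 instead of 1.
  lengths(regmatches(s, gregexpr(pattern, s)))
}

df <- "BMMBMMBMMMMMBMMBM"
count_runs(df)           # 4
count_runs(df, "xyz+")   # 0
count_runs("BMBM")       # 0
```

Because regmatches drops the -1 placeholder, the count is zero whenever nothing matches.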

Answer 2

Here is a base R approach that does not use regular expressions:

x <- "BMMBMMBMMMMMBMMBM"
with(rle(unlist(strsplit(x, ""))), sum(values == "M" & lengths >= 2))
# [1] 4
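To see why this works, it can help to look at the intermediate run-length encoding. The following is only an unpacked sketch of the same one-liner; the variable names runs and m_runs are illustrative.

```r
x <- "BMMBMMBMMMMMBMMBM"

# Split into single characters, then collapse into runs of equal letters.
runs <- rle(unlist(strsplit(x, "")))

runs$values    # "B" "M" "B" "M" "B" "M" "B" "M" "B" "M"
runs$lengths   # 1 2 1 2 1 5 1 2 1 1

# Count the runs of M that are at least two characters long.
m_runs <- sum(runs$values == "M" & runs$lengths >= 2)
m_runs         # 4
```

The trailing single M forms a run of length 1, so it is excluded by the lengths >= 2 condition.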

Answer 3

A base R solution:

s <- "BMMBMMBMMMMMBMMBM"
lengths(gregexpr("MM+", s))
## [1] 4

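Note that the input named df in the question is a character string, not a data frame, and that c("X") is the same as "X", so the c and the parentheses are not needed.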

Answer 4

Try the following pattern:

str_count(df,"(M)\\1+")

This counts each run of two or more M's as one occurrence. Or:

str_count(df,"M{2,}")
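As a quick check that the patterns suggested in this thread agree, the sketch below counts matches for each of them in base R. It uses gregexpr with perl = TRUE instead of str_count, which is a choice made here only so the example runs without stringr.

```r
df <- "BMMBMMBMMMMMBMMBM"
patterns <- c("MM+", "(M)\\1+", "M{2,}")

# For each pattern, extract the matches and count them.
counts <- sapply(patterns, function(p) {
  lengths(regmatches(df, gregexpr(p, df, perl = TRUE)))
})

unname(counts)   # 4 4 4
```

All three patterns describe the same thing, a run of at least two M's, so they give the same count.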

R: Count consecutive letters in a string