Question

I'm trying to create a pattern that matches all similar words/phrases in a string.

For example, I need to match: "this", "this is", "this is it", "that", "that was", "that was not".

It only matches the first occurrence of "this", but it should match all occurrences.

I even tried anchors and word boundaries, but nothing seems to work.

Here is what I tried (simplified):

$content = "this is it! that was not!";

preg_match_all('/(this|this is|this is it|that|that was|that was not)/i', $content, $results);

It should output:

this
this is
this is it
that
that was
that was not

Answer 1

How about:

$content = "this is it";
preg_match_all('/(?=(this))(?=(this is))(?=(this is it))/i', $content, $results);
print_r($results);

Edit, based on the comments:

$content = "this is it! that was not!";
preg_match_all('/(?=(this))(?=(this is))(?=(this is it))|(?=(that))(?=(that was))(?=(that was not))/i', $content, $results);
print_r($results);

Output:

Array
(
    [0] => Array
        (
            [0] => 
            [1] => 
        )

    [1] => Array
        (
            [0] => this
            [1] => 
        )

    [2] => Array
        (
            [0] => this is
            [1] => 
        )

    [3] => Array
        (
            [0] => this is it
            [1] => 
        )

    [4] => Array
        (
            [0] => 
            [1] => that
        )

    [5] => Array
        (
            [0] => 
            [1] => that was
        )

    [6] => Array
        (
            [0] => 
            [1] => that was not
        )

)

More generally:

$content = "this is it! that was not!";
preg_match_all('/\b(?=(\w+))(?=(\w+ \w+))(?=(\w+ \w+ \w+))\b/i', $content, $results);
print_r($results);

Output:

Array
(
    [0] => Array
        (
            [0] => 
            [1] => 
        )

    [1] => Array
        (
            [0] => this
            [1] => that
        )

    [2] => Array
        (
            [0] => this is
            [1] => that was
        )

    [3] => Array
        (
            [0] => this is it
            [1] => that was not
        )

)
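If a single flat list of phrases is wanted, as in the question's expected output, the lookahead captures above can be flattened. The following is only a sketch under that assumption: the `$phrases` variable and the use of `PREG_SET_ORDER` are illustrative choices, not part of the original answer.

```php
<?php
// Sketch: collect the lookahead captures into one flat list of phrases.
$content = "this is it! that was not!";
$pattern = '/\b(?=(\w+))(?=(\w+ \w+))(?=(\w+ \w+ \w+))\b/i';

// PREG_SET_ORDER groups the captures per match position instead of per group.
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$phrases = [];
foreach ($matches as $match) {
    // Index 0 is the empty overall match; the phrases sit in groups 1 to 3.
    foreach (array_slice($match, 1) as $phrase) {
        $phrases[] = $phrase;
    }
}

print_r($phrases);
```

For the sample string this yields this, this is, this is it, that, that was, that was not, in that order. Note that the pattern only matches at positions followed by at least three space-separated words.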

Answer 2

The problem is that the shortest string option comes first in the OR group:

/(this|this is|this is it)/i

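PHP checks the alternatives in (this|this is|this is it) against the test string from left to right. As soon as one of them matches, it leaves the group.

Putting the longest string first works, because PHP then tries it before the shorter ones. Note that this returns only the longest phrase at each position, not every shorter prefix as well: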

/(this is it|this is|this)/i
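The effect of the ordering can be checked with a small comparison. This is a minimal sketch; the variable names `$shortFirst` and `$longFirst` are made up for illustration.

```php
<?php
$content = "this is it! that was not!";

// Shortest alternative first: "this" matches, so the longer options are never tried.
preg_match_all('/(this|this is|this is it)/i', $content, $shortFirst);

// Longest alternative first: the engine tries "this is it" before the shorter ones.
preg_match_all('/(this is it|this is|this)/i', $content, $longFirst);

print_r($shortFirst[1]); // contains only "this"
print_r($longFirst[1]);  // contains only "this is it"
```

Either way there is one match per starting position, so reordering alone does not produce "this", "this is" and "this is it" together.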

Answer 3

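Given that you are only capturing the exact terms you search for, it may be simpler to use a foreach loop together with substr_count to see how many times each string occurs.

For example: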

$haystack = "this is it! that was not! this is not a test!";
$needles = array(
    "this",
    "this is",
    "this is it",
    "that",
    "that was",
    "that was not");

foreach ($needles as $needle) {
    // substr_count is case sensitive, so make subject and search lowercase
    $hits = substr_count(strtolower($haystack), strtolower($needle));

    echo "Search '$needle' occurs $hits time(s)" . PHP_EOL;
}

The above will output:

Search 'this' occurs 2 time(s)
Search 'this is' occurs 2 time(s)
Search 'this is it' occurs 1 time(s)
Search 'that' occurs 1 time(s)
Search 'that was' occurs 1 time(s)
Search 'that was not' occurs 1 time(s)

If substr_count doesn't give you the flexibility you need, you can always replace it with preg_match_all, using each individual $needle value as the search term.
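One way the preg_match_all variant might look is sketched below. The whole-word `\b` boundaries and the `$counts` array are assumptions added for illustration; the answer itself does not specify them.

```php
<?php
$haystack = "this is it! that was not! this is not a test!";
$needles = ["this", "this is", "this is it", "that", "that was", "that was not"];

$counts = [];
foreach ($needles as $needle) {
    // preg_quote escapes any regex metacharacters in the needle;
    // \b restricts hits to whole words and the i flag ignores case.
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/i';

    // preg_match_all returns the number of full pattern matches.
    $counts[$needle] = preg_match_all($pattern, $haystack);
}

print_r($counts);
```

For this haystack the counts agree with the substr_count version. They would differ for text where a needle appears inside a longer word, since substr_count counts plain substrings.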

Answer 4

You can also use the following regex.

/(this(?:\sis(?:\sit)?)?)/i
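A short usage sketch for the pattern above follows. The sample string is an assumption chosen to show both a full and a partial phrase.

```php
<?php
// Sample input (hypothetical) containing "this is it" and a shorter "this is".
$content = "this is it! that was not! this is fine";

// The nested optional groups extend "this" with " is" and then " it" when present.
preg_match_all('/(this(?:\sis(?:\sit)?)?)/i', $content, $results);

print_r($results[1]); // "this is it", then "this is"
```

The optional groups are greedy, so each match is the longest phrase available at that position. Like reordering the alternation, this does not list the shorter prefixes separately.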

Question title: Need to match all similar words/phrases using preg_match_all