Need to match all similar words/phrases with preg_match_all

Time: 2014-11-14 13:19:19

Tags: php regex web preg-match preg-match-all

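I'm trying to create a pattern that matches all similar words/phrases in a string.

For example, I need to match: "this", "this is", "this is it", "that", "that was", "that was not".

It only matches the first one, "this", but it should match all of them.

I have even tried anchors and word boundaries, but nothing seems to work.

What I tried (simplified):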

$content = "this is it! that was not!";

preg_match_all('/(this|this is|this is it|that|that was|that was not)/i', $content, $results);

It should output:

  • this is
  • this is it
  • that was
  • that was not
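For comparison, here is a minimal sketch of what the pattern above actually returns for the same subject. Because the alternatives are tried in order and matches cannot overlap, only one phrase per starting position comes back, and it is the first alternative that fits:

```php
<?php
// Same subject and pattern as in the question
$content = "this is it! that was not!";
preg_match_all('/(this|this is|this is it|that|that was|that was not)/i', $content, $results);

// At position 0 "this" is the first alternative that matches, so "this is" and
// "this is it" are never tried there; scanning then resumes after the consumed text.
print_r($results[0]); // holds "this" and "that"
```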

4 answers:

Answer 0 (score: 1)

How about:

$content = "this is it";
preg_match_all('/(?=(this))(?=(this is))(?=(this is it))/i', $content, $results);
print_r($results);

Edited based on the comments:

$content = "this is it! that was not!";
preg_match_all('/(?=(this))(?=(this is))(?=(this is it))|(?=(that))(?=(that was))(?=(that was not))/i', $content, $results);
print_r($results);

Output:

Array
(
    [0] => Array
        (
            [0] => 
            [1] => 
        )

    [1] => Array
        (
            [0] => this
            [1] => 
        )

    [2] => Array
        (
            [0] => this is
            [1] => 
        )

    [3] => Array
        (
            [0] => this is it
            [1] => 
        )

    [4] => Array
        (
            [0] => 
            [1] => that
        )

    [5] => Array
        (
            [0] => 
            [1] => that was
        )

    [6] => Array
        (
            [0] => 
            [1] => that was not
        )

)
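Since every overall match is zero-width, the phrases only live in the capture groups, and each group column is padded with empty strings where its branch did not take part in the match. One possible way (a sketch, not part of the original answer, assuming PHP 5.6 or later for the `...` argument unpacking) to collapse the result into the flat list the question asks for is to merge groups 1 to 6 and drop the empty placeholders:

```php
<?php
$content = "this is it! that was not!";
preg_match_all('/(?=(this))(?=(this is))(?=(this is it))|(?=(that))(?=(that was))(?=(that was not))/i', $content, $results);

// Skip $results[0] (the empty overall matches) and concatenate the group columns in order
$merged = array_merge(...array_slice($results, 1));

// Remove the "" placeholders left by the branch that did not match, then reindex
$phrases = array_values(array_filter($merged, function ($s) {
    return $s !== '';
}));

print_r($phrases); // this, this is, this is it, that, that was, that was not
```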

More generally:

$content = "this is it! that was not!";
preg_match_all('/\b(?=(\w+))(?=(\w+ \w+))(?=(\w+ \w+ \w+))\b/i', $content, $results);
print_r($results);

Output:

Array
(
    [0] => Array
        (
            [0] => 
            [1] => 
        )

    [1] => Array
        (
            [0] => this
            [1] => that
        )

    [2] => Array
        (
            [0] => this is
            [1] => that was
        )

    [3] => Array
        (
            [0] => this is it
            [1] => that was not
        )

)
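With the default PREG_PATTERN_ORDER, the phrases that start at the same word end up in different columns. Passing the PREG_SET_ORDER flag instead returns one array per match, which keeps them together. Note that all three lookaheads must succeed, so this pattern only reports word starts followed by at least three words (which is why "is" and "was" produce no match). A sketch, keyed by the one-word phrase at each start (the `$byStart` name is only for illustration):

```php
<?php
$content = "this is it! that was not!";
preg_match_all('/\b(?=(\w+))(?=(\w+ \w+))(?=(\w+ \w+ \w+))\b/i', $content, $sets, PREG_SET_ORDER);

$byStart = [];
foreach ($sets as $set) {
    // $set[0] is the empty overall match; groups 1..3 hold the 1-, 2- and 3-word phrases
    $byStart[$set[1]] = array_slice($set, 1);
}

print_r($byStart);
// "this" => [this, this is, this is it], "that" => [that, that was, that was not]
```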

Answer 1 (score: 1)

The problem is that the shortest option comes first in the alternation group:

/(this|this is|this is it)/i

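PHP tries the alternatives of (this|this is|this is it) from left to right. As soon as one of them matches in the test string, it leaves the group, so "this is" and "this is it" are never tried at that position.

This works because PHP will now try the longest string first: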

/(this is it|this is|this)/i
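Reordering changes which phrase is reported at each position, but the engine still returns only one non-overlapping match per position, so the shorter prefixes are not reported separately. A sketch applying the same longest-first ordering to the question's subject (the `that` alternatives are added here for illustration; the answer's pattern only covers `this`):

```php
<?php
$content = "this is it! that was not!";

// Longest alternative first within each family of phrases
preg_match_all('/(this is it|this is|this|that was not|that was|that)/i', $content, $results);

print_r($results[0]); // "this is it" and "that was not"
```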


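Answer 2 (score: 1)

Given that you are only capturing the very words you are searching for, you may be better off using a foreach loop with substr_count to see how many times each string occurs.

For example: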

$haystack = "this is it! that was not! this is not a test!";
$needles = array(
    "this",
    "this is",
    "this is it",
    "that",
    "that was",
    "that was not");

foreach ($needles as $needle) {
    // substr_count is case sensitive, so make subject and search lowercase
    $hits = substr_count(strtolower($haystack), strtolower($needle));

    echo "Search '$needle' occurs $hits time(s)" . PHP_EOL;
}

The above will output:

Search 'this' occurs 2 time(s)
Search 'this is' occurs 2 time(s)
Search 'this is it' occurs 1 time(s)
Search 'that' occurs 1 time(s)
Search 'that was' occurs 1 time(s)
Search 'that was not' occurs 1 time(s)

If substr_count doesn't give you the flexibility you need, you can always replace it with preg_match_all and use each individual $needle value as the search pattern.
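A minimal sketch of that replacement, reusing the haystack and needles from the example above. The `\b` word boundaries are an addition: unlike substr_count, they stop `this` from also counting inside a longer word such as `thistle`. preg_quote() escapes any regex metacharacters in the needles, and calling preg_match_all() without a $matches argument (allowed since PHP 5.4) simply returns the number of matches:

```php
<?php
$haystack = "this is it! that was not! this is not a test!";
$needles = ["this", "this is", "this is it", "that", "that was", "that was not"];

$counts = [];
foreach ($needles as $needle) {
    // Escape the needle, require whole-word boundaries, and match case-insensitively
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/i';

    // preg_match_all() returns the number of full-pattern matches
    $counts[$needle] = preg_match_all($pattern, $haystack);

    echo "Search '$needle' occurs {$counts[$needle]} time(s)" . PHP_EOL;
}
```

For this haystack the counts are the same as with substr_count; they only differ when a needle also appears inside a longer word.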

Answer 3 (score: 0)

You can also use the following regular expression:

/(this(?:\sis(?:\sit)?)?)/i
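The nested optional groups are greedy, so at each position this pattern captures the longest available phrase: "this is it" if possible, otherwise "this is", otherwise "this". A sketch with a parallel branch for the `that` phrases added for illustration (it is not part of the original answer), run against the haystack from Answer 2:

```php
<?php
$content = "this is it! that was not! this is not a test!";

// Answer 3's pattern plus an analogous "that" branch; \s matches the whitespace between words
$pattern = '/(this(?:\sis(?:\sit)?)?|that(?:\swas(?:\snot)?)?)/i';
preg_match_all($pattern, $content, $results);

print_r($results[1]); // "this is it", "that was not", "this is"
```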