Question

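I know there have been plenty of questions about this already, but I have tried a number of them and still can't get the result I need.

I need a regex that extracts the YouTube URL from a string containing an iframe.

Sample text: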

<p>

</p><p>Garbage text</p><p><iframe width="560" height="315" src="//www.youtube.com/embed/PZlJFGgFTfA" frameborder="0" allowfullscreen=""></iframe></p>

This is the regex I came up with:

(\bhttps?:)?\/\/[^,\s()<>]+(?:\([\w\d]+\)|(?:[^,[:punct:]\s]|\/))

Regex101 test

I use it in a function, and it returns an empty array. Does anyone know what is wrong with my function?

function extractEmbedYT($str) {
    preg_match('/(\bhttps?:)?\/\/[^,\s()<>]+(?:\([\w\d]+\)|(?:[^,[:punct:]\s]|\/))/', $str, $matches, PREG_OFFSET_CAPTURE, 0);
    return $matches;
}

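Edit 1: Changed the capturing group in my regex so that it no longer captures the last character.

Edit 2: Added some PHP code for context, because the pattern works in Regex101 but not in my script.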

Answer 1

You need to turn the capturing group into a non-capturing group:

/(\bhttps?:)?\/\/[^,\s()<>]+(?:\(\w+\)|(?:[^,[:punct:]\s]|\/))/s
                                       ^^^
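
To make the effect of that change visible, here is a minimal sketch that runs both variants against the iframe from the question. The variable names (`$html`, `$capturing`, `$nonCapturing`) are illustrative, and the capturing variant is an assumed reconstruction of the earlier pattern, not necessarily the asker's exact original.

```php
<?php
// Sample markup taken from the question.
$html = '<p><iframe width="560" height="315" src="//www.youtube.com/embed/PZlJFGgFTfA" frameborder="0" allowfullscreen=""></iframe></p>';

// Assumed earlier variant: the last alternative is a capturing group.
$capturing    = '/(\bhttps?:)?\/\/[^,\s()<>]+(?:\(\w+\)|([^,[:punct:]\s]|\/))/';
// Variant from this answer: the same alternative as a non-capturing group.
$nonCapturing = '/(\bhttps?:)?\/\/[^,\s()<>]+(?:\(\w+\)|(?:[^,[:punct:]\s]|\/))/';

preg_match($capturing, $html, $a);
preg_match($nonCapturing, $html, $b);

// Both full matches (index 0) are the same URL. Only $a has an index 2,
// and it holds just the final character of the URL.
var_dump($a[0] === $b[0], isset($a[2]), isset($b[2]));
```

In both cases the full match sits at index 0; the non-capturing form simply keeps the stray one-character group out of `$matches`.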

Also, in your code you need to pass $string to the function, not $str:

function stripEmptyTags ($result)
{
    $regexps = array (
        '~<(\w+)\b[^\>]*>([\s]|&nbsp;)*</\\1>~',
        '~<\w+\s*/>~',
    );

    do
    {
        $string = $result;
        $result = preg_replace ($regexps, '', $string);
    }
    while ($result != $string);

    return $result;
}


function extractEmbedYT($str) {
    // Find all URLS in $str

    preg_match_all('/(\bhttps?:)?\/\/[^,\s()<>]+(?:\(\w+\)|(?:[^,[:punct:]\s]|\/))/s', $str, $matches);

    // Remove all iframes from $str
    $str = preg_replace('/<iframe.*?<\/iframe>/i','', $str);


    $str = stripEmptyTags($str);
    return [$str, $matches[0]];
}

$string = '<p>

</p><p>UDA Stagiaire</p><p><iframe width="560" height="315" src="//www.youtube.com/embed/PZlJFGgFTfA" frameborder="0" allowfullscreen=""></iframe></p>';

$results = extractEmbedYT($string);

print_r($results);

See the online PHP demo.
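
The pattern above matches any URL-like string, not only YouTube addresses. If only embed URLs inside an iframe's `src` attribute are wanted, a narrower pattern is one possible alternative. The sketch below is not part of the original answer: the helper name `extractYouTubeEmbedUrls` is hypothetical, and it assumes the host is `youtube.com` (optionally with `www.`) and that video IDs consist of letters, digits, underscores, and hyphens.

```php
<?php
// Hypothetical helper (not from the original answer): collect only YouTube
// embed URLs that appear in the src attribute of an <iframe> tag.
function extractYouTubeEmbedUrls(string $html): array
{
    // Group 1 captures the URL itself, without the surrounding quotes.
    $pattern = '~<iframe\b[^>]*\bsrc=["\']((?:https?:)?//(?:www\.)?youtube\.com/embed/[\w-]+)~i';
    preg_match_all($pattern, $html, $matches);

    return $matches[1];
}

$sample = '<p>Garbage text</p><p><iframe width="560" height="315" src="//www.youtube.com/embed/PZlJFGgFTfA" frameborder="0" allowfullscreen=""></iframe></p>';

print_r(extractYouTubeEmbedUrls($sample));
```

Other embed hosts or unquoted attribute values would not match under these assumptions; for arbitrary HTML, a DOM parser is generally more robust than a regex.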
