Regex for replacing text with links does not work as expected

Time: 2013-07-22 09:54:22

Tags: php regex

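I am trying to replace "strings" with <a>"strings"</a>, using one array of keywords and another array of URLs. At first it worked like a charm: it replaced the first occurrence that was not between tags (> <). But once I started filling the arrays with more keywords and URLs, I ran into a problem. If it replaces "Fantasy games" -> <a href="ASDFASDF">Fantasy games</a> and then has to replace "Fantasy", it skips the >string< check in the regex and replaces the keyword anyway, which breaks the HTML and causes a parse error.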

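So I assume I am doing something wrong, or missing a parameter or something similar. If the content already contains >string<, it is not replaced. With preg_replace and arrays, however, by the time the next element of the array is processed, the keyword no longer seems to be detected as >string<.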

Here is the code:

// DB content
// $Keywords=array("Fantasy games", "Fantasy");
// $URL=array("http://www.whatever.com", "http://www.whatever2.com");

$i=0;
// Insert the links and returns the processed content.
foreach ($SQLResult as $row){
    $Keywords[$i]="/[^>](".$row->Keyword.")[^<]/i";
    $URLS[$i]=' <a href="'.$row->URL.'">$1</a> ';
    $i++;
}   
$Content=preg_replace($Keywords, $URLS, $Content, 1);
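
To illustrate one way nested links can arise with this pattern, here is a minimal reproduction. The sample text, the keywords and the example URLs are made up for illustration; only the shape of the pattern and the replacement comes from the code above. The `[^>]` and `[^<]` guards inspect only the single character on each side of the keyword, so a shorter keyword sitting in the middle of an already linked phrase still matches.

```php
<?php
// Hypothetical sample data, not taken from the original post.
$content = 'Play Fantasy role games today.';

$patterns = array(
    '/[^>](Fantasy role games)[^<]/i',
    '/[^>](role)[^<]/i',
);
$replacements = array(
    ' <a href="http://a.example">$1</a> ',
    ' <a href="http://b.example">$1</a> ',
);

// With arrays, preg_replace() applies each pattern in turn
// to the output produced by the previous pattern.
$result = preg_replace($patterns, $replacements, $content, 1);

echo $result, "\n";
// Play <a href="http://a.example">Fantasy <a href="http://b.example">role</a> games</a> today.
```

The second pattern matches " role " inside the anchor text created by the first one, because neither neighbouring character is `>` or `<`. Note also that the guards consume the neighbouring characters, which the replacement then swaps for spaces.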

1 answer:

Answer 0 (score: 0)

I started from the code in this question, as @Jens pointed out: https://stackoverflow.com/posts/4209925/edit

<?php

$dom = new DOMDocument();
// loadXml needs properly formatted documents, so it's better to use loadHtml, but it needs a hack to properly handle UTF-8 encoding
$dom->loadHtml(mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8"));

$xpath = new DOMXPath($dom);

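// Build one case-insensitive pattern from all keywords, longest first, so that
// "Fantasy games" is tried before "Fantasy". Assumes $SQLResult has at least one row.
$links = array();
foreach ($SQLResult as $row) {
    $links[mb_strtolower($row->Keyword, 'UTF-8')] = $row->URL;
}
uksort($links, function ($a, $b) { return strlen($b) - strlen($a); });
$pattern = '/(' . implode('|', array_map(function ($k) { return preg_quote($k, '/'); }, array_keys($links))) . ')/iu';

foreach($xpath->query('//text()[not(ancestor::a)]') as $node)
{
    // Odd indexes of $parts hold the matched keywords, even indexes the text in between.
    $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
    if (count($parts) < 2) {
        continue; // no keyword in this text node
    }
    // Build real DOM nodes instead of an XML string, so special characters stay escaped.
    $newNode = $dom->createDocumentFragment();
    foreach ($parts as $j => $part) {
        if ($j % 2 === 1) {
            $a = $dom->createElement('a');
            $a->setAttribute('href', $links[mb_strtolower($part, 'UTF-8')]);
            $a->appendChild($dom->createTextNode($part));
            $newNode->appendChild($a);
        } elseif ($part !== '') {
            $newNode->appendChild($dom->createTextNode($part));
        }
    }
    // Replace the text node once, after all keywords have been handled.
    $node->parentNode->replaceChild($newNode, $node);
}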

// get only the body tag with its contents, then trim the body tag itself to get only the original content
echo mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, "UTF-8");
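
To try the DOM-based idea without a database, it can be wrapped in a small function. This is only a sketch under stated assumptions: `linkKeywords()`, the `wrap` container id, the keyword => URL array and the sample HTML are all hypothetical, and the input is assumed to be plain ASCII (UTF-8 content would still need the encoding hack shown above). It matches all keywords in a single pass, longest first, and only touches text nodes that are not inside an existing `<a>`.

```php
<?php
// Hypothetical helper: $links maps keyword => URL.
function linkKeywords($html, array $links)
{
    if (!$links) {
        return $html;
    }
    // Lowercase the keys for lookup and sort them longest first,
    // so "Fantasy games" is tried before "Fantasy".
    $map = array();
    foreach ($links as $keyword => $url) {
        $map[strtolower($keyword)] = $url;
    }
    uksort($map, function ($a, $b) { return strlen($b) - strlen($a); });
    $pattern = '/(' . implode('|', array_map(function ($k) {
        return preg_quote($k, '/');
    }, array_keys($map))) . ')/i';

    $dom = new DOMDocument();
    // Wrap the input in a known container so the result can be read back.
    $dom->loadHTML('<div id="wrap">' . $html . '</div>');
    $xpath = new DOMXPath($dom);

    foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
        // Odd indexes hold matched keywords, even indexes the text in between.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue;
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $dom->createElement('a');
                $a->setAttribute('href', $map[strtolower($part)]);
                $a->appendChild($dom->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper element.
    $out = '';
    foreach ($xpath->query('//div[@id="wrap"]')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Made-up sample input with one pre-existing link.
$out = linkKeywords(
    '<p>Fantasy games and Fantasy books. <a href="http://x.example">Fantasy art</a></p>',
    array(
        'Fantasy games' => 'http://www.whatever.com',
        'Fantasy'       => 'http://www.whatever2.com',
    )
);
echo $out, "\n";
```

With this input, "Fantasy games" and the standalone "Fantasy" should each be wrapped in their own link, while the text of the existing anchor is left alone. Unlike the original `preg_replace(..., 1)` call, this sketch links every occurrence rather than only the first.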