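Ignoring img tags in preg_replace

Time: 2015-04-11 16:10:15

Tags: php regex html-parsing preg-replace

I want to replace a word in an HTML string, but skip the replacement when the word is part of an attribute of an 'img' element.

Example: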

$word = 'google';
$html = 'I like google and here is its logo <img src="images/google.png" alt="Image of google logo" />';

$replacement = '<a href="http://google.com">Google</a>';
$result =  preg_replace('/\s'.($word).'/u', $replacement, $html);

preg_replace also replaces the word "google" inside the 'src' and 'alt' attributes; I want it to replace only the occurrences outside the 'img' element.
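
For reference, a minimal reproduction using only the variables from the question. With the leading \s in the pattern as posted, the occurrence in 'src' (preceded by "/") happens not to match, but the one in 'alt' does, and the whitespace in front of each match is consumed along with the word. The $count variable is added here purely for illustration.

```php
<?php
// Runs the question's pattern unchanged to show which occurrences it hits.
$word = 'google';
$html = 'I like google and here is its logo <img src="images/google.png" alt="Image of google logo" />';
$replacement = '<a href="http://google.com">Google</a>';

// The fifth argument receives the number of replacements performed.
$result = preg_replace('/\s' . $word . '/u', $replacement, $html, -1, $count);

echo $count, PHP_EOL;
echo $result, PHP_EOL;
```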

2 answers:

Answer 0: (score: 4)

You can use a discard pattern. For example, a regex like this:

<.*?google.*?\/>(*SKIP)(*FAIL)|google

Working demo

The idea behind this pattern is to discard the word google when it appears inside < and >, but keep the rest:

<.*?google.*?\/>(*SKIP)(*FAIL)  --> This part skips the matches where google is within <...>
|google                         --> but keeps the other occurrences of google
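
Plugged into the question's code, this could look like the sketch below. It assumes the tag is self-closing ("/>") and sits on one line, since "." does not match newlines by default; the preg_quote call is an addition so that $word is treated literally.

```php
<?php
$word = 'google';
$html = 'I like google and here is its logo <img src="images/google.png" alt="Image of google logo" />';
$replacement = '<a href="http://google.com">Google</a>';

// Escape the word so any regex metacharacters in it are matched literally.
$w = preg_quote($word, '/');

// First alternative: a "<.../>" span containing the word is matched, then
// (*SKIP)(*FAIL) rejects it and moves the search past that span.
// Second alternative: any remaining occurrence of the word is kept as a match.
$pattern = '/<.*?' . $w . '.*?\/>(*SKIP)(*FAIL)|' . $w . '/';

$result = preg_replace($pattern, $replacement, $html);
echo $result, PHP_EOL;
```

Only the "google" in the running text is replaced; the 'src' and 'alt' values are left as they were.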

You can add as many "discard" patterns as you like, for example:

discard patt1(*SKIP)(*FAIL)|discard patt(*SKIP)(*FAIL)|...(*SKIP)(*FAIL)|keep this
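
As a sketch of chaining several discard patterns, the example below skips both img tags and whole a elements before matching the word. The two discard patterns and the sample string are illustrative assumptions, not part of the original answer.

```php
<?php
// Hypothetical input: the word appears inside a link, inside an img tag, and in plain text.
$html = 'Visit <a href="http://google.com">google</a> or see '
      . '<img src="google.png" alt="google" /> for google';

// discard 1: any <img ...> tag
// discard 2: any <a ...>...</a> element, including its text
// keep:      the bare word everywhere else
$pattern = '~<img\b[^>]*>(*SKIP)(*FAIL)'
         . '|<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)'
         . '|google~s';

$result = preg_replace($pattern, 'GOOGLE', $html);
echo $result, PHP_EOL;
```

Here only the final, plain-text "google" is replaced.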

Answer 1: (score: 0)

Use a positive lookahead (?=.*?<.*?/>)

$html = 'I like google and here is its logo <img src="images/google.png" alt="Image of google logo" />';

$result = preg_replace('%(?=.*?<.*?/>)google%im', 'ANOTHER WORD', $html);

DEMO

Explanation

(?=.*?<.*?/>)google
-------------------

Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=.*?<.*?/>)»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match the character “<” literally «<»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match the character string “/>” literally «/>»
Match the character string “google” literally (case insensitive) «google»

ANOTHER WORD

Insert the character string “ANOTHER WORD” literally «ANOTHER WORD»
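
A runnable sketch of the lookahead approach follows. Note that the lookahead only checks that some "<.../>" follows later on the same line, so it depends on where the word sits relative to the tag; the second input is a hypothetical string added to show that a plain-text occurrence after the last tag is left alone.

```php
<?php
$pattern = '%(?=.*?<.*?/>)google%im';

// Input from the answer: only the "google" before the tag is followed by "<.../>".
$html = 'I like google and here is its logo <img src="images/google.png" alt="Image of google logo" />';
$result = preg_replace($pattern, 'ANOTHER WORD', $html);
echo $result, PHP_EOL;

// Hypothetical input with text after the tag: nothing follows the trailing
// "google" that looks like "<.../>", so the lookahead fails and it is not replaced.
$trailing = '<img alt="google" /> and google';
$resultTrailing = preg_replace($pattern, 'ANOTHER WORD', $trailing);
echo $resultTrailing, PHP_EOL;
```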

More information on Regex Lookaround