Replace text while ignoring HTML tags

Time: 2012-02-22 22:54:28

Tags: php regex preg-replace

I have a simple text with HTML tags, for example:

Once <u>the</u> activity <a href="#">reaches</a> the resumed state, you can freely add and remove fragments to the activity. Thus, <i>only</i> while the activity is in the resumed state can the <b>lifecycle</b> of a <hr/> fragment change independently.

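I need to replace some parts of this text while ignoring its HTML tags. For example, the string Thus, <i>only</i> while should become Hello, <i>its only</i> while. Both the text to replace and the replacement string are dynamic. I need help with the preg_replace pattern: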

$text = '<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text';

$arrayKeys= array('Some html' => 'My html', 'and there' => 'is there', 'in this text' => 'in this code');

foreach ($arrayKeys as $key => $value)
    $text = preg_replace('...$key...', '...$value...', $text);

echo $text; // output should be: <b>My html</b> tags with <u>is</u> there are a lot of tags <i>in</i> this code

Please help me find a solution. Thanks.

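2 Answers:

Answer 0: (score: 1)

Basically, we use regex to build dynamic arrays of patterns and replacements from the plain-text keys. This code only matches what was originally asked for, but the way it is spelled out should show you how to adapt it. We capture an opening or closing tag, together with any surrounding whitespace, as a pass-through group and replace the text around it. It is set up for two-word and three-word combinations.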

<?php

    $text = '<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text';

    $arrayKeys= array(
    'Some html' => 'My html',
    'and there' => 'is there',
    'in this text' =>'in this code');


    function make_pattern($string){
        $patterns = array(
                      '!(\w+)!i',
                      '#^#',
                      '! !',
                      '#$#');
        $replacements = array(
                      "($1)",
                      '!',
                //This next line is where we capture the possible tag or
                //whitespace so we can ignore it and pass it through.
                      '(\s?<?/?[^>]*>?\s?)',
                      '!i');
        $new_string = preg_replace($patterns,$replacements,$string);
        return $new_string;
    }

    function make_replacement($replacement){
        $patterns = array(
                      '!^(\w+)(\s+)(\w+)(\s+)(\w+)$!',
                      '!^(\w+)(\s+)(\w+)$!');
        $replacements = array(
                       '$1\$2$3\$4$5',
                       '$1\$2$3');
        $new_replacement = preg_replace($patterns,$replacements,$replacement);
        return $new_replacement;
    }


    foreach ($arrayKeys as $key => $value){
        $new_Patterns[] = make_pattern($key);
        $new_Replacements[] = make_replacement($value);
    }

    //For debugging
    //print_r($new_Patterns);
    //print_r($new_Replacements);

    $new_text = preg_replace($new_Patterns,$new_Replacements,$text);

    echo $new_text."\n";
    echo $text;


?>

Output:

<b>My html</b> tags with <u>is</u> there are a lot of tags <i>in</i> this code
<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text
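
The helpers above handle keys of two or three words. One possible way to generalize the same idea to any number of words is sketched here. This is not the answerer's code: the function name replace_across_tags is made up for illustration, it assumes each key and its replacement contain the same number of whitespace-separated words, and it assumes no tag contains a > inside an attribute value. Each gap between words (whitespace and whole tags) is captured and put back unchanged by a callback.

```php
<?php
// Hypothetical helper, not part of the answer above.
function replace_across_tags(string $text, array $pairs): string
{
    // Gap between two words: any mix of whitespace and complete tags.
    $gap = '((?:\s|<[^>]*>)+)';

    foreach ($pairs as $key => $value) {
        $keyWords   = preg_split('/\s+/', trim($key));
        $valueWords = preg_split('/\s+/', trim($value));
        if (count($keyWords) !== count($valueWords)) {
            throw new InvalidArgumentException("Word counts differ for '$key'");
        }

        // Quote each word so characters such as "." or "?" match literally.
        $quoted = array_map(function ($word) {
            return preg_quote($word, '!');
        }, $keyWords);
        $pattern = '!' . implode($gap, $quoted) . '!i';

        // $m[1], $m[2], ... hold the captured gaps; they are reinserted as-is.
        $text = preg_replace_callback($pattern, function ($m) use ($valueWords) {
            $out = $valueWords[0];
            for ($i = 1; $i < count($valueWords); $i++) {
                $out .= $m[$i] . $valueWords[$i];
            }
            return $out;
        }, $text);
    }
    return $text;
}

$text = '<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text';
$pairs = array(
    'Some html'    => 'My html',
    'and there'    => 'is there',
    'in this text' => 'in this code');

echo replace_across_tags($text, $pairs), "\n";
// <b>My html</b> tags with <u>is</u> there are a lot of tags <i>in</i> this code
```

Using a callback instead of a replacement string means the replacement words never need escaping for $ or backslashes.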

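Answer 1: (score: 0)

Here we go. This code should work as long as you respect two constraints:

  • The pattern and the replacement must have the same number of words. (Logical, since you want to keep the positions.)
  • You must not split a word across tags. (<b>Hel</b>lo world will not work.)

If these are respected, this should work fine!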
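
The first constraint is easy to verify before any replacement runs. A minimal sketch, assuming words are separated by whitespace (same_word_count is a hypothetical helper, not part of the answer's code):

```php
<?php
// Hypothetical check for constraint 1: pattern and replacement must
// contain the same number of whitespace-separated words.
function same_word_count(string $pattern, string $replacement): bool
{
    $count = function (string $s): int {
        return count(preg_split('/\s+/', trim($s), -1, PREG_SPLIT_NO_EMPTY));
    };
    return $count($pattern) === $count($replacement);
}

var_dump(same_word_count('Hey you', 'Bye me'));         // bool(true)
var_dump(same_word_count('have to', 'really need to')); // bool(false)
```

The answer's code follows.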

<?php
    // Wraps every tag in the special sequence, then splits on that sequence.
    // '<b>Hey</b> you' becomes '~-=<b>~-=Hey~-=</b>~-= you', which gives us
    // array ("<b>", "Hey", "</b>", " you")
    function getTextArray ($text, $special) {
        $text = preg_replace ('#(<.*>)#isU', $special . '$1' . $special, $text); // Surround each tag with the separator so the split works.

        return preg_split ('#' . preg_quote ($special, '#') . '#', $text, -1, PREG_SPLIT_NO_EMPTY);
    }
    $text = "
    <html>
        <div>
            <p>
                <b>Hey</b> you ! No, you don't have <em>to</em> go!
            </p>
        </div>
    </html>";

    $replacement = array (
        "Hey you" => "Bye me",
        "have to" => "need to",
        "to go" => "to run");

    // This is a special sequence that must not appear anywhere in your text. It is used as a separator and disappears from the result.
    $special = '~-=';

    $text_array = getTextArray ($text, $special);

    // $restore is the array that will finally contain the result.
    // For now we only store the tags in it.
    // We'll store the text later.
    //
    // $clean_text is the text without the tags, but with the special sequence instead.
    $restore = array ();
    $clean_text = ''; // Initialise before appending to avoid an undefined-variable warning.
    for ($i = 0; $i < count ($text_array); $i++) {
        $str = $text_array[$i];

        if (preg_match('#<.+>#', $str)) {
            $restore[$i] = $str;
            $clean_text .= $special;
        }

        else {
            $clean_text .= $str;
        }
    }

    // Here comes the tricky part.
    // We want to keep the position of each part of the text so the tags
    // don't move afterwards.
    // So we make the regex look like ((?:~-=)*)Hey((?:~-=)*) ((?:~-=)*)you((?:~-=)*)
    // and the replacement look like ${1}Bye${2} ${3}me${4},
    // so that the separators stay in the right place. Separators are allowed on
    // both sides of each space because a tag can sit before or after it.
    $sep = '((?:' . preg_quote ($special, '#') . ')*)';
    foreach ($replacement as $regex => $newstr) {
        $regex_array = array_map (function ($word) { return preg_quote ($word, '#'); }, explode (' ', $regex));
        $regex = $sep . implode ($sep . ' ' . $sep, $regex_array) . $sep;

        $newstr_array = explode (' ', $newstr);
        $last = count ($regex_array) - 1;
        $newstr = '${1}';

        for ($i = 0; $i < $last; $i++) {
            $newstr .= $newstr_array[$i] . '${' . (2 * $i + 2) . '} ${' . (2 * $i + 3) . '}';
        }
        $newstr .= $newstr_array[$last] . '${' . (2 * $last + 2) . '}';

        $clean_text = preg_replace ('#' . $regex . '#i', $newstr, $clean_text);
    }

    // Here we re-split one last time.
    $clean_text_array = preg_split ('#' . $special . '#', $clean_text, -1, PREG_SPLIT_NO_EMPTY);

    // And we merge with $restore.
    for ($i = 0, $j = 0; $i < count ($text_array); $i++) {
        if (!isset($restore[$i])) {
            $restore[$i] = $clean_text_array[$j];
            $j++;
        }
    }

    // Now we reorder everything, and make it go back to a string.
    ksort ($restore);
    $result = implode ($restore);

    echo $result;
?>

Output: Bye me ! No, you don't need to run!

[Edit] Now supports custom patterns and avoids adding useless spaces.
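
As a side note, the sentinel sequence is only one way to separate tags from text. preg_split with the PREG_SPLIT_DELIM_CAPTURE flag can return the same pieces directly, without having to pick a sequence that never occurs in the input. A minimal sketch (split_tags_and_text is a hypothetical name, and the pattern assumes no tag contains a > inside an attribute value):

```php
<?php
// Hypothetical alternative to getTextArray: split on tags and keep the
// tags themselves in the result thanks to the capturing group.
function split_tags_and_text(string $html): array
{
    return preg_split(
        '#(<[^>]*>)#',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

print_r(split_tags_and_text('<b>Hey</b> you'));
// Pieces, in order: "<b>", "Hey", "</b>", " you"
```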