PHP - Insert a tag (<div>) between other tags (<p>) in a string

Date: 2015-04-08 13:54:55

Tags: php string dom tags

I have a string in PHP that I get from a request (in fact, it is the string produced by the CKEditor WYSIWYG text editor). I am trying to insert a tag (<div>) inside other tags (<p>), and to take its data attributes from the <div> that comes before it.

It is easiest to understand with this example:

$String =
<p> <div class="ST" data-start="1" data-end="5"> <span>Blabla1 </span><span>Blabla2</span> </div> </p>
<p> Blabla3 Blabla4 </p>
<p> <div class="ST" data-start="6" data-end="10"> <span>Blabla10 </span><span>Blabla20</span> </div> </p>

Here, the first and the last <p> are fine. What I want concerns the second <p>: I need to wrap "Blabla3 Blabla4" in a <div class="ST"> whose data-start and data-end attributes are copied from the previous <div> (here data-start = 1 and data-end = 5), so that I end up with this:

<p> <div class="ST" data-start="1" data-end="5"> <span>Blabla1 </span><span>Blabla2</span> </div> </p>
<p> <div class="ST" data-start="1" data-end="5"> Blabla3 Blabla4 </div> </p>
<p> <div class="ST" data-start="6" data-end="10"> <span>Blabla10 </span><span>Blabla20</span> </div> </p>

The string can also look like this (with a <p> that has no <div> at the very beginning). In that case, data-start and data-end should both be set to 0:

<p> Blabla3 Blabla4 </p>
<p> <div data-start="0" data-end="5"> <span>Blabla1 </span><span>Blabla2</span> </div> </p>
<p> <div data-start="6" data-end="10"> <span>Blabla10 </span><span>Blabla20</span> </div> </p>

Or like this (with two or more such <p> in a row). In that case, each of them gets data-start = 1 and data-end = 5, as above:

<p> <div data-start="1" data-end="5"> <span>Blabla1 </span><span>Blabla2</span> </div> </p>
<p> Blabla3 Blabla4 </p>
<p> Blabla5 Blabla6 </p>
<p> <div data-start="6" data-end="10"> <span>Blabla10 </span><span>Blabla20</span> </div> </p>

I don't know how to manipulate the string... maybe with a regular expression?

Thanks for your help!

EDIT 1

I tried:

$value =

string '<p><show class="st" data-time-end="1.25" data-time-moy="0.12125" data-time-start="0.28" id="1"><word class="word" data-time-end="1.25" data-time-start="0.28">TEST1&nbsp; </word><word class="word" data-time-end="1.25" data-time-start="1.25"> </word></show><show class="st" data-time-end="1.25" data-time-moy="0.13857142857143" data-time-start="0.28" id="11"><word class="word" data-time-end="1.25" data-time-start="0.28">TEST2. </word><word class="word" data-time-end="1.25" data-time-start="1.25"> </word></show><show class="st" data-time-end="1.25" data-time-moy="0.194" data-time-start="0.28" id="12"><word class="word" data-time-end="1.444" data-time-start="0.28">TEST3 </word></show></p>

    <p>TESTTTT</p>' (length=709)

My code (I'm using Symfony2 and a Transformer):

public function reverseTransform($value)
{
    $value_purified = strip_tags($value, '<p><show><strong><span><word><em><u>'); // Keep only the tags listed here

    // Create a DOM from $value
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = true;
    libxml_use_internal_errors(true); // buffer libxml errors: <show> and <word> are not HTML5 tags
    $dom->loadHTML($value_purified); // load the string into the DOM $dom
    libxml_use_internal_errors(false); // back to normal libxml error reporting

    var_dump($dom);

    $pTags = $dom->getElementsByTagName('p');
    var_dump($pTags);

    foreach ($pTags as $pTag) {
        var_dump($pTag);
        $valuePTagFull = $this->DOMinnerHTML($pTag);
        if (strpos($valuePTagFull, '<show') === false) {
            $valuePTagFull = "<show class='st'>" . $valuePTagFull . "</show>";
        }
        var_dump($valuePTagFull);
    }

    $value_purified = strip_tags($value, '<show><strong><span><word><em><u>'); // Keep the tags listed here (removes the <p> tags)
    var_dump($value_purified);
}

private function DOMinnerHTML(DOMNode $element)
{
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $element->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}

Here are my var_dumps:

1/ var_dump($dom);

object(DOMDocument)[1000]
  public 'doctype' => string '(object value omitted)' (length=22)
  public 'implementation' => string '(object value omitted)' (length=22)
  public 'documentElement' => string '(object value omitted)' (length=22)
  public 'actualEncoding' => null
  public 'encoding' => null
  public 'xmlEncoding' => null
  public 'standalone' => boolean true
  public 'xmlStandalone' => boolean true
  public 'version' => null
  public 'xmlVersion' => null
  public 'strictErrorChecking' => boolean true
  public 'documentURI' => null
  public 'config' => null
  public 'formatOutput' => boolean true
  public 'validateOnParse' => boolean false
  public 'resolveExternals' => boolean false
  public 'preserveWhiteSpace' => boolean false
  public 'recover' => boolean false
  public 'substituteEntities' => boolean false
  public 'nodeName' => string '#document' (length=9)
  public 'nodeValue' => null
  public 'nodeType' => int 13
  public 'parentNode' => null
  public 'childNodes' => string '(object value omitted)' (length=22)
  public 'firstChild' => string '(object value omitted)' (length=22)
  public 'lastChild' => string '(object value omitted)' (length=22)
  public 'previousSibling' => null
  public 'attributes' => null
  public 'ownerDocument' => null
  public 'namespaceURI' => null
  public 'prefix' => string '' (length=0)
  public 'localName' => null
  public 'baseURI' => null
  public 'textContent' => string 'TEST1 TEST2. TEST3 TESTTTT' (length=32)

2/ This one is fine: my string contains 2 <p> tags, and var_dump($pTags) reports a length of 2:

var_dump($pTags);
object(DOMNodeList)[1001]
  public 'length' => int 2

3/ Here we can see the 2 <p> tags with var_dump($pTag);

object(DOMElement)[1040]
  public 'tagName' => string 'p' (length=1)
  public 'schemaTypeInfo' => null
  public 'nodeName' => string 'p' (length=1)
  public 'nodeValue' => string 'TEST1 TEST2. TEST3 ' (length=21)
  public 'nodeType' => int 1
  public 'parentNode' => string '(object value omitted)' (length=22)
  public 'childNodes' => string '(object value omitted)' (length=22)
  public 'firstChild' => string '(object value omitted)' (length=22)
  public 'lastChild' => string '(object value omitted)' (length=22)
  public 'previousSibling' => null
  public 'nextSibling' => string '(object value omitted)' (length=22)
  public 'attributes' => string '(object value omitted)' (length=22)
  public 'ownerDocument' => string '(object value omitted)' (length=22)
  public 'namespaceURI' => null
  public 'prefix' => string '' (length=0)
  public 'localName' => string 'p' (length=1)
  public 'baseURI' => null
  public 'textContent' => string 'TEST1 TEST2. TEST3 ' (length=21)
object(DOMElement)[1062]
  public 'tagName' => string 'p' (length=1)
  public 'schemaTypeInfo' => null
  public 'nodeName' => string 'p' (length=1)
  public 'nodeValue' => string 'TESTTTT' (length=7)
  public 'nodeType' => int 1
  public 'parentNode' => string '(object value omitted)' (length=22)
  public 'childNodes' => string '(object value omitted)' (length=22)
  public 'firstChild' => string '(object value omitted)' (length=22)
  public 'lastChild' => string '(object value omitted)' (length=22)
  public 'previousSibling' => string '(object value omitted)' (length=22)
  public 'attributes' => string '(object value omitted)' (length=22)
  public 'ownerDocument' => string '(object value omitted)' (length=22)
  public 'namespaceURI' => null
  public 'prefix' => string '' (length=0)
  public 'localName' => string 'p' (length=1)
  public 'baseURI' => null
  public 'textContent' => string 'TESTTTT' (length=7)

4/ Here, if a <p> tag contains no <show> tag, I wrap its content in a <show> tag. This works for my second <p> tag, which initially has no <show> tag:

var_dump($valuePTagFull);
string '<show class='st'>TESTTTT</show>' (length=31)

5/ But this is where I have a problem. When I do var_dump($value_purified); at the end of the code, it gives me:

string '<show class="st" data-time-end="1.25" data-time-moy="0.12125" data-time-start="0.28" id="1"><word class="word" data-time-end="1.25" data-time-start="0.28">TEST1&nbsp; </word><word class="word" data-time-end="1.25" data-time-start="1.25"> </word></show><show class="st" data-time-end="1.25" data-time-moy="0.13857142857143" data-time-start="0.28" id="11"><word class="word" data-time-end="1.25" data-time-start="0.28">TEST2. </word><word class="word" data-time-end="1.25" data-time-start="1.25"> </word></show><show class="st" data-time-end="1.25" data-time-moy="0.194" data-time-start="0.28" id="12"><word class="word" data-time-end="1.444" data-time-start="0.28">TEST3 </word></show> TESTTTT' (length=695)

Why is the last word, 'TESTTTT', not between <show> tags here, when var_dump($valuePTagFull); shows it inside <show> tags...?
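A likely explanation for step 5: $valuePTagFull is an ordinary string assembled from saveHTML() output, so reassigning it changes neither $dom nor $value, and the final strip_tags() call runs on the untouched $value. The wrapping has to be done on the DOM nodes themselves, and the DOM serialized afterwards. The sketch that follows shows one way to do that, assuming the input is an HTML fragment like the $value in EDIT 1; wrapBareParagraphs() is a hypothetical helper name, and it does not copy any data-time-* attributes onto the new <show>.

```php
<?php
// Hypothetical helper: wrap the content of every <p> that has no <show> child
// in a new <show class="st"> element, changing the DOM itself, then return the
// serialized fragment. Assumes an HTML fragment like the $value in EDIT 1.
// Note: loadHTML() treats input as ISO-8859-1 unless told otherwise, so raw
// (non-entity) UTF-8 text may need extra encoding handling.
function wrapBareParagraphs(string $html): string
{
    $dom = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // <show>/<word> are not HTML tags
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($dom);
    // Collect the matching <p> elements first, then modify the tree.
    $bareParagraphs = iterator_to_array($xpath->query('//p[not(show)]'));

    foreach ($bareParagraphs as $p) {
        $show = $dom->createElement('show');
        $show->setAttribute('class', 'st');
        // appendChild() moves a node, so this loop empties $p into $show.
        while ($p->firstChild !== null) {
            $show->appendChild($p->firstChild);
        }
        $p->appendChild($show);
    }

    // loadHTML() wraps the fragment in <html><body>; serialize only body's children.
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

The <p> tags can then be removed from the modified markup instead of from the original string, e.g. strip_tags(wrapBareParagraphs($value), '<show><strong><span><word><em><u>').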

2 answers:

Answer 0 (score: 1)

Here is a solution that manipulates a DOMDocument to get the desired result. See the comments for details:

class foo
{
    public function reverseTransform($value)
    {
        $dom = new DOMDocument();
        $dom->preserveWhiteSpace = false;
        $dom->formatOutput = true;

        // Load contents wrapped in a temporary root node
        $dom->loadXML('<root>' . $value . '</root>');

        // Use an XPath query to get all P elements
        $xPath = new DOMXPath($dom);
        $pTags = $xPath->query('//p');

        // Loop through the P elements
        $dataStart = 0;
        $dataEnd   = 0;

        foreach ($pTags as $pTag) {
            // Get any DIV elements inside the P
            $divs = $xPath->query('./div', $pTag);

            if ($divs->length > 0) {
                // This P element already has a div. Grab the
                // data-start/end attributes for later
                $div = $divs->item(0);
                $dataStart = $div->getAttribute('data-start');
                $dataEnd   = $div->getAttribute('data-end');
            }
            else {
                // Create a new DIV element and set attributes
                $div = $dom->createElement('div');
                $div->setAttribute('class',      'ST');
                $div->setAttribute('data-start', $dataStart);
                $div->setAttribute('data-end',   $dataEnd);

                // Move all children of P into DIV
                $child = $pTag->firstChild;
                while ($child) {
                    $nextChild = $child->nextSibling;
                    $div->insertBefore($child);
                    $child = $nextChild;
                }

                // Move the DIV inside the P element
                $pTag->appendChild($div);
            }
        }
        // Get HTML, removing temporary root element
        $html = preg_replace(
            '#.*?<root>\s*(.*)\s*</root>#s', '\1',
            $dom->saveXML()
        );
        return $html;
    }
}

$string = <<<EOS
<p>
    Blabla1 Blabla2
</p>
<p>
    <div data-start="1" data-end="5">
        <span>Blabla3 </span><span>Blabla4</span>
    </div>
</p>
<p>
    Blabla5 Blabla6
</p>
<p>
    Blabla7 Blabla8
</p>
<p>
    <div data-start="6" data-end="10">
        <span>Blabla9 </span><span>Blabla10</span>
    </div>
</p>
<p>
    Blabla11 Blabla12
</p>
EOS;

echo (new foo)->reverseTransform($string), PHP_EOL;

Output (indented for clarity):

<p>
    <div class="ST" data-start="0" data-end="0">
        Blabla1 Blabla2
    </div>
</p>
<p>
    <div data-start="1" data-end="5">
        <span>Blabla3 </span>
        <span>Blabla4</span>
    </div>
</p>
<p>
    <div class="ST" data-start="1" data-end="5">
        Blabla5 Blabla6
    </div>
</p>
<p>
    <div class="ST" data-start="1" data-end="5">
        Blabla7 Blabla8
    </div>
</p>
<p>
    <div data-start="6" data-end="10">
        <span>Blabla9 </span>
        <span>Blabla10</span>
    </div>
</p>
<p>
    <div class="ST" data-start="6" data-end="10">
        Blabla11 Blabla12
    </div>
</p>
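Answer 0 parses the input with loadXML(), which only knows XML's five predefined entities (&amp;, &lt;, &gt;, &quot;, &apos;). The string in the question's EDIT 1 contains TEST1&nbsp;, and an undeclared entity like that makes the document ill-formed, so loadXML() fails on it (libxml reports "Entity 'nbsp' not defined"). A minimal workaround, assuming &nbsp; is the only HTML named entity in the content, is to replace it with the equivalent numeric reference &#160; before loading. loadFragmentAsXml() is a hypothetical helper; in answer 0 it could stand in for the $dom->loadXML('<root>' . $value . '</root>') line, after preserveWhiteSpace and formatOutput have been set.

```php
<?php
// Hypothetical helper for answer 0's approach: load an HTML fragment into an
// existing DOMDocument with loadXML(), wrapped in a temporary <root> element.
// Assumption: &nbsp; is the only HTML named entity that appears in the content.
function loadFragmentAsXml(DOMDocument $dom, string $fragment): bool
{
    // &#160; is the numeric character reference for the same no-break space,
    // and it is valid in any XML document (no DTD needed).
    $xml = '<root>' . str_replace('&nbsp;', '&#160;', $fragment) . '</root>';

    $prev = libxml_use_internal_errors(true); // collect parse errors quietly
    $loaded = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    return $loaded; // false if the fragment is still not well-formed XML
}
```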

Answer 1 (score: 0)

If it is valid HTML, you can use the loadHTML function and manipulate the string more quickly: http://php.net/manual/en/domdocument.loadhtml.php
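One caveat with loadHTML(): libxml's HTML parser applies HTML's implied-end-tag rules, and a <div> start tag closes an open <p>. Markup such as <p><div class="ST" ...>...</div></p> from the question therefore does not keep the <div> inside the <p>; loadXML() in answer 0 does not have this issue because XML has no implied end tags. The asker's own <show> and <word> tags are unknown to the parser and do stay inside the <p>, as the var_dump output in EDIT 1 shows. A quick way to check where the parser puts a <div> (parentOfFirstDiv() is a hypothetical helper):

```php
<?php
// Hypothetical helper: parse $html with loadHTML() and return the tag name of
// the first <div>'s parent, to see whether the <div> stayed inside its <p>.
function parentOfFirstDiv(string $html): string
{
    $dom = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // ignore warnings about the markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $div = $dom->getElementsByTagName('div')->item(0);
    return $div === null ? '' : $div->parentNode->nodeName;
}

// Expected: the parent is not "p", because the parser closed the <p> first.
echo parentOfFirstDiv('<p><div class="ST" data-start="1" data-end="5">Blabla1</div></p>'), PHP_EOL;
```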
