Symfony Crawler: accessing nested divs

Date: 2016-09-06 15:14:35

Tags: symfony phpunit domcrawler

I am desperately trying to access the content of a nested div:

<tr>
<th class="monthCellContent" style="vertical-align : top">
    <div class="monthEventWrapper">
        <div class="monthEvent">
            <a class="event"
                href="/event/1"
                title="test title updated - test place - 09:00-10:00">
                    09:00
                    <span class="showForMediumInline">
                        test title updated test place
                    </span>
            </a>
        </div>
    </div>
</th>
</tr>

I am trying to access "09:00" and "test title updated test place" inside the link.

Somehow I am stuck at

<div class="monthEventWrapper">

which I can access using
$items = $crawler->filter('div[class="monthEventWrapper"]');
print "\n found " . count($items) . " monthEventWrapper divs\n";

found 35 monthEventWrapper divs

But I cannot access

<div class="monthEvent">

$items = $crawler->filter('div[class="monthEvent"]');
print "\n found " . count($items) . " monthEvent divs\n";

found 0 monthEvent divs

I tried all variants of

$value = '';
foreach ($items as $item) {
    foreach ($item->childNodes as $child) {
        $value .= $item->ownerDocument->saveHTML($child);
    }
}

$crawler->filterXPath('//div[@class="monthEvent"]')

with no luck.

The HTML passes validation and there is no JS.

Thanks!

1 answer:

Answer 0: (score: 0)

Here is a workaround:

<?php

use Symfony\Component\DomCrawler\Crawler;
require_once(__DIR__ . '/../vendor/autoload.php');

$html = <<<'HTML'
<!DOCTYPE html>

<html>
    <body>
        <tr>
            <th class="monthCellContent" style="vertical-align : top">
                <div class="monthEventWrapper">
                    <div class="monthEvent">
                        <a class="event"
                            href="/event/1"
                            title="test title updated - test place - 09:00-10:00">
                                09:00
                                <span class="showForMediumInline">
                                    test title updated test place
                                </span>
                        </a>
                    </div>
                </div>
            </th>
        </tr>
    </body>
</html>

HTML;

$crawler = new Crawler($html);
$crawlerFiltered = $crawler->filter('div[class="monthEventWrapper"] a');

$results = [];
$childResults = [];
for ($i=0; $i<count($crawlerFiltered); $i++) {
    $results[] = removeLeadingAndTrailingWhiteCharsAndNewLine($crawlerFiltered->eq($i)->text());

    $children = $crawlerFiltered->eq($i)->children();
    if (count($children)) {
        for ($j=0; $j<count($children); $j++) {
            $childResults[] = removeLeadingAndTrailingWhiteCharsAndNewLine($children->eq($j)->text());
        }
    }
}

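// Strip the span text from the end of the link text, leaving only the time.
$results[0] = subtractSpan($results[0], $childResults[0]);

// Removes every line break that is followed by whitespace, together with that whitespace.
function removeLeadingAndTrailingWhiteCharsAndNewLine(string $text) : string
{
    $pattern = '/(?:\r\n[\s]+|\n[\s]+)/s';
    return preg_replace($pattern, '', $text);
}

// Cuts strlen($textToSubtract) bytes off the end of $text (assumes $text ends with it).
function subtractSpan(string $text, string $textToSubtract) : string
{
    $length = strlen($text) - strlen($textToSubtract);
    return substr($text, 0, $length);
}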

echo 'Parent Nodes:' . PHP_EOL;
var_export($results);
echo PHP_EOL;
echo 'Child Nodes:' . PHP_EOL;
var_export($childResults);

echo PHP_EOL;
echo 'Time: '; 
echo $results[0];

echo PHP_EOL;
echo 'Text: ';
echo $childResults[0];

It gives the following result:

Parent Nodes:
array (
  0 => '09:00',
)
Child Nodes:
array (
  0 => 'test title updated test place',
)
Time: 09:00
Text: test title updated test place

Note that I use a for loop together with ->eq(<node-number>), which gives a Crawler instance, rather than the DOMNode you get when using foreach.
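
As a side note to the point above, here is a minimal sketch of two other ways to end up with Crawler instances. It assumes symfony/dom-crawler and symfony/css-selector are installed via Composer and reuses the autoload path from the script above. The `<table>` wrapper around the `<tr>` is my addition so that the fragment is valid markup, and the variable names `$hrefs` and `$titles` are only illustrative.

```php
<?php

use Symfony\Component\DomCrawler\Crawler;
// Assumption: same Composer autoload location as in the script above.
require_once(__DIR__ . '/../vendor/autoload.php');

$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <table>
            <tr>
                <th class="monthCellContent">
                    <div class="monthEventWrapper">
                        <div class="monthEvent">
                            <a class="event" href="/event/1" title="test title updated - test place - 09:00-10:00">
                                09:00
                                <span class="showForMediumInline">
                                    test title updated test place
                                </span>
                            </a>
                        </div>
                    </div>
                </th>
            </tr>
        </table>
    </body>
</html>
HTML;

$crawler = new Crawler($html);
$links = $crawler->filter('div.monthEventWrapper a');

// Option 1: each() hands a Crawler (not a DOMNode) to the callback
// and collects the return values into an array.
$hrefs = $links->each(function (Crawler $link, $i) {
    return $link->attr('href');
});

// Option 2: foreach yields DOMNode objects; wrap each one in a new Crawler
// when Crawler methods such as filter() or text() are needed.
$titles = [];
foreach ($links as $domNode) {
    $link = new Crawler($domNode);
    $titles[] = trim($link->filter('span')->text());
}

var_export($hrefs);
echo PHP_EOL;
var_export($titles);
echo PHP_EOL;
```

Either option avoids the index bookkeeping of the for loop; which one reads better is a matter of taste.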

Note that the code assumes the wanted text part, 09:00, is at the beginning of the link text.
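
If that assumption is a concern, one possible alternative (a sketch, not part of the workaround above) is to tell the two parts apart by node type rather than by position: the time is a text node sitting directly inside the `<a>`, while the title lives in the `<span>` element. The example below uses only PHP's built-in DOM extension, so it does not need Composer. The `<table>` wrapper and the `$events` array are my own additions for illustration.

```php
<?php

// Fragment modeled on the question's markup, wrapped in a <table> so the <tr> is valid.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <table>
            <tr>
                <th class="monthCellContent">
                    <div class="monthEventWrapper">
                        <div class="monthEvent">
                            <a class="event" href="/event/1" title="test title updated - test place - 09:00-10:00">
                                09:00
                                <span class="showForMediumInline">
                                    test title updated test place
                                </span>
                            </a>
                        </div>
                    </div>
                </th>
            </tr>
        </table>
    </body>
</html>
HTML;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);
$events = [];

foreach ($xpath->query('//div[@class="monthEvent"]/a') as $link) {
    $time = '';
    $title = '';
    foreach ($link->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            // Text directly inside <a>, wherever it appears relative to the <span>.
            $time .= $child->nodeValue;
        } elseif ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'span') {
            // Text inside the <span>.
            $title .= $child->textContent;
        }
    }
    $events[] = ['time' => trim($time), 'title' => trim($title)];
}

var_export($events);
echo PHP_EOL;
```

With this approach the time would still be picked up if the page placed it after the `<span>` instead of before it.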