PHP Simple HTML DOM Parser: find text inside tags that have no class or id

Time: 2013-06-18 17:48:58

Tags: php html dom html-parsing

I have this page: http://www.statistics.com/index.php?page=glossary&term_id=703

Specifically, I am interested in this part:

<b>Additive Error:</b>
<p> Additive error is the error that is added to the true value and does not 
depend on the true value itself. In other words, the result of the measurement is 
considered as a sum of the true value and the additive error:   </p> 

I am trying to get the text between the <p></p> tags, using:

include('simple_html_dom.php');
$url = 'http://www.statistics.com/index.php?page=glossary&term_id=703';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
$html = new simple_html_dom();
$html->load($curl_scraped_page);

foreach ( $html->find('b') as $e ) {
    echo $e->innertext . '<br>';
}

It gives me:

Additive Error:
Browse Other Glossary Entries

I tried changing the foreach to: foreach ( $html->find('b p') as $e ) {

and then to: foreach ( $html->find('/b p') as $e ) {

but then it just gives me a blank page. What am I doing wrong? Thanks.

3 Answers:

Answer 0: (score: 1)

Why not use PHP's built-in DOM extension and XPath?

libxml_use_internal_errors(true);  // <- you might need this if the page has markup errors
$dom = new DomDocument();
$dom->loadHtml($curl_scraped_page);
$xpath = new DomXPath($dom);
print $xpath->evaluate('string(//p[preceding::b]/text())');
//                             ^
//  this will get you text content from <p> tags preceded by <b> tags

If more than one <p> is preceded by a <b> tag and you only want the first one, adjust the XPath query to:

string((//p[preceding::b]/text())[1])

To get them all as a DOMNodeList, omit the string() function: //p[preceding::b]/text(). You can then iterate over the list and access each node's textContent property.
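As a minimal sketch of that last step, the snippet below runs the query both with and without string() and collects each text node's textContent. The $sampleHtml markup is an assumption modelled on the fragment in the question, not the live page, so results on the real site may differ.

```php
<?php
// Hypothetical sample markup modelled on the fragment in the question;
// the real page is larger and may be structured differently.
$sampleHtml = '<html><body>'
    . '<b>Additive Error:</b>'
    . '<p>Additive error is the error that is added to the true value.</p>'
    . '<b>Browse Other Glossary Entries</b>'
    . '<p>Second paragraph.</p>'
    . '</body></html>';

libxml_use_internal_errors(true);   // collect libxml parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($sampleHtml);
$xpath = new DOMXPath($dom);

// Without string(), query() returns a DOMNodeList of every matching text node.
$texts = [];
foreach ($xpath->query('//p[preceding::b]/text()') as $node) {
    $texts[] = trim($node->textContent);
}

// With string(), evaluate() returns only the first match, as a plain string.
$first = $xpath->evaluate('string(//p[preceding::b]/text())');

print_r($texts);
echo $first, PHP_EOL;
```

Collecting the values into an array first keeps the extraction separate from the output, which makes it easier to filter or trim the text before printing it.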

Answer 1: (score: 0)

If you want everything inside the b or p tags, you can simply do foreach ($html->find('b,p') as $e) { ... }
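The difference between 'b p' and 'b,p' may be worth spelling out. In CSS-style selectors a space means "descendant of", so 'b p' looks for a <p> nested inside a <b>, which the fragment in the question does not contain, while a comma means "either one". The sketch below illustrates this with rough XPath equivalents in PHP's built-in DOM extension rather than with Simple HTML DOM itself; the sample markup is an assumption modelled on the question.

```php
<?php
// Hypothetical sample markup modelled on the fragment in the question.
$sampleHtml = '<html><body>'
    . '<b>Additive Error:</b>'
    . '<p>Additive error is added to the true value.</p>'
    . '</body></html>';

libxml_use_internal_errors(true);   // collect libxml parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($sampleHtml);
$xpath = new DOMXPath($dom);

// Rough equivalent of 'b p': a <p> nested anywhere inside a <b>.
// The <p> here is a sibling of the <b>, not a descendant, so nothing matches.
$nested = $xpath->query('//b//p');

// Rough equivalent of 'b,p': every <b> and every <p>.
$either = $xpath->query('//b | //p');

// The first <p> that follows each <b> as a sibling.
$afterB = $xpath->query('//b/following-sibling::p[1]');

echo 'nested matches: ', $nested->length, PHP_EOL;
echo 'either matches: ', $either->length, PHP_EOL;
foreach ($afterB as $p) {
    echo $p->textContent, PHP_EOL;
}
```

The following-sibling query is the closest match to what the question asks for, since it pairs each heading-like <b> with the paragraph that comes right after it.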

Answer 2: (score: 0)

Try this:

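<?php
// Collect libxml warnings from malformed markup instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTMLFile('http://www.statistics.com/index.php?page=glossary&term_id=703');
$xpath = new DOMXPath($dom);

// Take the first <p> inside the first <font> element, if there is one.
$mytext = '';
$p = $xpath->query('(//font)[1]//p')->item(0);
if ($p !== null) {
    $mytext = $p->nodeValue;
}

echo $mytext;
?>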