Question

I am trying to scrape all the hrefs whose id starts with 'system' from this web page: http://www.myfxbook.com/systems

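This is my code, which I can't seem to get working. I have been fiddling with it for a few hours now and have looked through countless answered questions here.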

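    include_once( 'simple_html_dom.php' );
    $url2process = 'http://www.myfxbook.com/systems';
    $html = file_get_html( $url2process );
    $cnt = 0;
    // Select every <a> whose id attribute starts with "system".
    $parent_mark = $html->find('a[id^=system]');

    $cntr = 0;

    foreach( $parent_mark as $element) {

        // Only process the first four matching links.
        if( $cntr > 3 ) continue;
        $cntr++;

        $single_html = file_get_html( $element->href );
        // (the rest of the loop body is not shown in the post)
    }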

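UPDATE1: OK, it is sort of working now, but it only seems to use the last href on the page that has the right id. I need to process all of the hrefs with this id. What am I missing here?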

Answer 1

You can achieve this with DOMDocument, like this.

    $html = file_get_contents('http://www.myfxbook.com/systems');

    $doc = new DOMDocument();
    // Suppress the warnings libxml raises on malformed real-world HTML.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $links = $doc->getElementsByTagName('a');
    $cntr = 0;
    foreach ($links as $link) {
        // Keep only anchors whose id starts with "system".
        if (preg_match('~^system~', $link->getAttribute('id'))) {
            // Stop after the first four matching links.
            if ($cntr > 3) {
                break;
            }
            $cntr++;
            $single_html = file_get_contents($link->getAttribute('href'));
            if (empty($single_html)) {
                echo 'EMPTY';
            }
        }
    }
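
Regarding UPDATE1: the full loop is not shown, so this is only a guess, but one possible cause of "only the last href is used" is that `$single_html` is overwritten on every iteration and only inspected after the loop ends. A minimal sketch of an alternative is to collect every matching href into an array first and process that array afterwards. The helper name `collectHrefsByIdPrefix`, its parameters, and the inline sample markup are hypothetical and made up for illustration; the sketch also assumes the prefix contains no double quotes, since it is pasted directly into the XPath expression.

```php
<?php
// Hypothetical helper (not from the original post): returns the href of every
// <a> element whose id starts with $prefix, up to $limit links.
function collectHrefsByIdPrefix(string $html, string $prefix, int $limit = PHP_INT_MAX): array
{
    $doc = new DOMDocument();
    // Silence libxml warnings about sloppy markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // starts-with() is a standard XPath 1.0 function; it plays the same role
    // as the regex ~^system~ in the loop above.
    $query = sprintf('//a[starts-with(@id, "%s")]', $prefix);

    $hrefs = [];
    foreach ($xpath->query($query) as $link) {
        if (count($hrefs) >= $limit) {
            break;
        }
        // Append instead of overwriting, so no match is lost.
        $hrefs[] = $link->getAttribute('href');
    }
    return $hrefs;
}

// Inline sample markup standing in for the real page (made up for illustration).
$sample = '<html><body>'
    . '<a id="system1" href="/members/a/1">A</a>'
    . '<a id="other" href="/about">About</a>'
    . '<a id="system2" href="/members/b/2">B</a>'
    . '<a id="system3" href="/members/c/3">C</a>'
    . '</body></html>';

$hrefs = collectHrefsByIdPrefix($sample, 'system', 2);
print_r($hrefs);
```

With the real page, you would pass the result of `file_get_contents('http://www.myfxbook.com/systems')` as the first argument and then loop over the returned array to fetch each link.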

PHP Simple Dom HTML - unable to parse a list of hrefs
