PHP Simple HTML DOM Parser cannot get paginated content

Date: 2014-03-26 18:01:52

Tags: php simple-html-dom

Hello, I am a beginner with simple_html_dom. I am trying to get the list of hrefs from the post list of this example website using the following code.

<?php
include('simple_html_dom.php');

$html = file_get_html('http://www.themelock.com/wordpress/elegantthemes/');

function getArticles($page) {

    global $articles;

    $html = new simple_html_dom();
    $html->load_file($page);

    $items = $html->find('h2[class=post-title]');  

    foreach($items as $post) {
        $articles[] = array($post->children(0)->href);
    }

    foreach($articles as $item) {
            echo "<div class='item'>";
            echo $item[0];
            echo "</div>";
        }
}

if($next = $html->find('div[class=navigation]', 0)->last_child() ) {
    $URL = $next->href;

    $html->clear();
    unset($html);

    getArticles($URL);
}

?>

As a result, I get:

http://www.themelock.com/wordpress/908-minimal-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/892-event-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/882-askit-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/853-lightbright-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/850-inreview-elegantthemes-review-wordpress-theme.html
http://www.themelock.com/wordpress/807-boutique-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/804-elist-elegantthemes-directory-wordpress-theme.html
http://www.themelock.com/wordpress/798-webly-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/795-elegantestate-real-estate-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/786-notebook-elegantthemes-wordpress-theme.html

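The code above only extracts the content of the next (second) page. I would like to know how to get the URLs from the first page, followed by those from the subsequent pages.

Does anyone know how to do this?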

1 Answer:

Answer 0: (score: 1)

Thanks for the support, everyone. I got it working with the following code:

<?php
include('simple_html_dom.php');

$url = "http://www.themelock.com/wordpress/yootheme-wordpress/";

// Start from the main page
$nextLink = $url;

// Loop over each next link as long as it exists
while ($nextLink) {
    echo "<hr>nextLink: $nextLink<br>";
    //Create a DOM object
    $html = new simple_html_dom();
    // Load HTML from a url
    $html->load_file($nextLink);

    $posts = $html->find('h2[class=post-title]');

    foreach($posts as $post) {
        // Get the link
        $link = $post->children(0)->href;
        echo $link.'<br>';
    }

    // Extract the next link; NULL ends the loop when there is no navigation block or no link
    $nav = $html->find('div[class=navigation]', 0);
    $nextLink = ( ($nav && ($temp = $nav->last_child()) && $temp->href) ? $temp->href : NULL );

    // Clear DOM object
    $html->clear();
    unset($html);
}

?>
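
The key difference from the code in the question is that the first page goes through the same loop as every later page, so its posts are extracted before the next link is followed. Below is a minimal sketch of that loop that can be run without network access. It is not the code above: it swaps simple_html_dom for PHP's built-in DOMDocument and DOMXPath, and the `$pages` array, the `collectLinks` function, the `$fetch` callback and the `$maxPages` limit are all hypothetical names introduced for illustration. It also assumes, as the answer does, that the last child of the navigation block is the next-page link whenever one exists.

```php
<?php
// Hypothetical in-memory "site" (URL => HTML) so the sketch runs offline.
// A real crawler would download each URL instead.
$pages = [
    'page1' => '<h2 class="post-title"><a href="/a.html">A</a></h2>'
             . '<div class="navigation"><span>1</span><a href="page2">Next</a></div>',
    'page2' => '<h2 class="post-title"><a href="/b.html">B</a></h2>'
             . '<div class="navigation"><a href="page1">Prev</a><span>2</span></div>',
];

// Collect post links from $startUrl and from every page reachable via the next link.
// $fetch returns the HTML for a URL, or null if the page is unavailable.
function collectLinks(string $startUrl, callable $fetch, int $maxPages = 50): array
{
    libxml_use_internal_errors(true); // keep parser warnings for imperfect markup quiet

    $links = [];
    $seen = [];
    $url = $startUrl;

    // Stop on a missing next link, a page already visited, or the page limit.
    while ($url !== null && !isset($seen[$url]) && count($seen) < $maxPages) {
        $seen[$url] = true;

        $source = $fetch($url);
        if ($source === null || $source === '') {
            break;
        }

        $doc = new DOMDocument();
        $doc->loadHTML($source);
        $xpath = new DOMXPath($doc);

        // The current page is processed first, so page one is never skipped.
        foreach ($xpath->query('//h2[@class="post-title"]/a[@href]') as $a) {
            $links[] = $a->getAttribute('href');
        }

        // Assumption: the last element in the navigation block is the next link.
        $last = $xpath->query('//div[@class="navigation"]/*[last()]')->item(0);
        $url = ($last !== null && $last->nodeName === 'a' && $last->hasAttribute('href'))
            ? $last->getAttribute('href')
            : null;
    }

    return $links;
}

$links = collectLinks('page1', function (string $url) use ($pages) {
    return $pages[$url] ?? null;
});

foreach ($links as $link) {
    echo $link, "\n";
}
```

With the two sample pages this prints `/a.html` followed by `/b.html`. The `$seen` set and `$maxPages` limit are there only to keep the loop from running forever if a site's navigation ever points back to an earlier page.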