Using PHP Simple HTML DOM Parser to get posts

Time: 2013-11-12 21:36:07

Tags: php simple-html-dom

I am trying to fetch some posts from my blog with the code below, but I am having some difficulty.

include('simple_html_dom.php');
getArticles('http://www.example.com');

function getArticles($page) {
    global $articles, $descriptions;

    $html = new simple_html_dom();
    $html->load_file($page);

    $items = $html->find('li');

    foreach($items as $post) {
        # remember comments count as nodes
        $articles[] = array($post->children(1)->innertext);
    }

    # free the parser's memory once the posts have been collected
    $html->clear();
}

My HTML structure is:

<li class="post-1840 post type-post status-publish format-standard hentry category-perierga">

<a href="http://www.example.com/?p=1840" title="my post title">
<img width="200" height="179" src="images/iko-200x179.jpg" class="attachment-archive wp-post-image" alt="iko-200x179.jpg" title=""></a>

<h2 class="leading"><a href="http://www.example.com/?p=1840">My Post Title!</a></h2>

<p class="widgetmeta sserif">
6 hours ago  | 
<a href="http://www.example.com/?cat=2" title="View all posts in Weird" rel="category">Weird</a> | 
<a href="http://www.example.com/?author=1" title="Posts by adminadmin" rel="author">admin</a> | 
<span>Comments Off</span>                      
</p>
<p class="teaser">my shord post description...</p>
<a class="mainbutton fr" href="http://www.example.com/?p=1840">Read More »</a>

</li>

I want to get the img, the h2 (i.e. the post title), and the p that contains the short description of my post.

Thanks again, everyone!

1 answer:

Answer 0 (score: 1)

If I understand correctly, getting all of this information is easy. Here is how:

// includes Simple HTML DOM Parser
include "simple_html_dom.php";

// The input
$text = '<li class="post-1840 post type-post status-publish format-standard hentry category-perierga">

<a href="http://www.example.com/?p=1840" title="my post title">
<img width="200" height="179" src="images/iko-200x179.jpg" class="attachment-archive wp-post-image" alt="iko-200x179.jpg" title=""></a>

<h2 class="leading"><a href="http://www.example.com/?p=1840">My Post Title!</a></h2>

<p class="widgetmeta sserif">
6 hours ago  | 
<a href="http://www.example.com/?cat=2" title="View all posts in Weird" rel="category">Weird</a> | 
<a href="http://www.example.com/?author=1" title="Posts by adminadmin" rel="author">admin</a> | 
<span>Comments Off</span>                      
</p>
<p class="teaser">my shord post description...</p>
<a class="mainbutton fr" href="http://www.example.com/?p=1840">Read More »</a>

</li>';

//Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($text);


// Find all li elements
$items = $html->find('li');  

// loop into each li element
foreach($items as $i => $post) {
    // get the img
    $img = $post->find('img', 0)->src;

    // get the post's url       
    $url = $post->find('a', 0)->href;

    // get the title
    $title = $post->find('a', 0)->title;

    // another way to get the title
    $title2 = $post->find('h2.leading', 0)->plaintext;

    // get the description
    $desc = $post->find('p.teaser', 0)->plaintext;

    // Print all
    echo "\n$i => $img | $url | $title | $title2 | $desc";
    echo "<hr/>";
}

// Clear dom object
$html->clear(); 
unset($html);

OUTPUT:
=======
0 => images/iko-200x179.jpg | http://www.example.com/?p=1840 | my post title | My Post Title! | my shord post description...
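
For comparison, here is a minimal sketch of the same extraction done with PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than Simple HTML DOM. This is a swapped-in technique, not what the code above uses. The helper name extractPosts() and the array keys are assumptions made up for illustration. It also guards against a missing img, h2 link, or teaser, which the chained find(...)->src style does not do, and it returns the values instead of echoing them.

```php
<?php
// Sketch only: extractPosts() is a hypothetical helper, not part of any library.
// It uses the DOM extension that ships with PHP instead of Simple HTML DOM.

function extractPosts(string $markup): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them;
    // real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    // Note: loadHTML() may misread non-ASCII text unless the markup
    // declares its encoding; that is not handled in this sketch.
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $posts = [];

    foreach ($xpath->query('//li') as $li) {
        // Each query is relative to the current <li>;
        // item(0) is null when nothing matched.
        $img    = $xpath->query('.//img', $li)->item(0);
        $link   = $xpath->query('.//h2/a', $li)->item(0);
        $teaser = $xpath->query(
            './/p[contains(concat(" ", normalize-space(@class), " "), " teaser ")]',
            $li
        )->item(0);

        $posts[] = [
            'img'   => $img ? $img->getAttribute('src') : null,
            'url'   => $link ? $link->getAttribute('href') : null,
            'title' => $link ? trim($link->textContent) : null,
            'desc'  => $teaser ? trim($teaser->textContent) : null,
        ];
    }

    return $posts;
}

// Usage with a shortened copy of the markup from the question.
$sample = '<li class="post-1840 post">'
        . '<a href="http://www.example.com/?p=1840" title="my post title">'
        . '<img src="images/iko-200x179.jpg"></a>'
        . '<h2 class="leading"><a href="http://www.example.com/?p=1840">My Post Title!</a></h2>'
        . '<p class="teaser">my shord post description...</p>'
        . '</li>';

foreach (extractPosts($sample) as $i => $post) {
    echo "$i => {$post['img']} | {$post['url']} | {$post['title']} | {$post['desc']}\n";
}
```

Returning an array keeps the parsing separate from the output, so the caller decides how to print or store each post, and a missing element shows up as null instead of a property read on a non-object.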
