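PHP Simple HTML DOM Parser - link element in RSS

Date: 2014-07-22 14:08:36

Tags: php xml simplexml

I have just started using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/) and I am having trouble parsing XML.

I can parse all the links in an HTML document without problems, but parsing the links in an RSS feed (XML format) does not work. For example, I want to parse all the links in http://www.bing.com/search?q=ipod&count=50&first=0&format=rss, so I use this code: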

$content = file_get_html('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');

// Initialize the result array so print_r() works even when no items are found
$parsed_results_array = array();

foreach($content->find('item') as $entry)
{
    $item['title']       = $entry->find('title', 0)->plaintext;
    $item['description'] = $entry->find('description', 0)->plaintext;
    $item['link']        = $entry->find('link', 0)->plaintext;
    $parsed_results_array[] = $item;
}

print_r($parsed_results_array);

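The script parses the title and description, but the link element comes back empty. Any ideas? My guess is that "link" is a reserved word or something similar, so how can I get the parser to work?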

3 answers:

Answer 0: (score: 3)

I suggest using the right tool for the job: SimpleXML. As a bonus, it is built in :)

$xml = simplexml_load_file('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');
$parsed_results_array = array();
foreach($xml as $entry) {
    foreach($entry->item as $item) {
        // $parsed_results_array[] = json_decode(json_encode($item), true);
        $items['title'] = (string) $item->title;
        $items['description'] = (string) $item->description;
        $items['link'] = (string) $item->link;
        $parsed_results_array[] = $items;
    }
}

echo '<pre>';
print_r($parsed_results_array);

This should produce something like:

Array
(
    [0] => Array
        (
            [title] => Apple - iPod
            [description] => Learn about iPod, Apple TV, and more. Download iTunes for free and purchase iTunes Gift Cards. Check out the most popular TV shows, movies, and music.
            [link] => http://www.apple.com/ipod/
        )

    [1] => Array
        (
            [title] => iPod - Wikipedia, the free encyclopedia
            [description] => The iPod is a line of portable media players designed and marketed by Apple Inc. The first line was released on October 23, 2001, about 8½ months after ...
            [link] => http://en.wikipedia.org/wiki/IPod
        )
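
As a complement to the answer above, here is a minimal offline sketch of the same SimpleXML approach. The embedded feed is a hypothetical sample (not Bing's real response), and `$rss` and `$results` are names chosen only for this illustration. It assumes the usual RSS 2.0 layout, where each `item` sits under `rss/channel`.

```php
<?php
// Hypothetical sample feed, embedded so the example runs without network access.
$rss = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Sample results</title>
    <item>
      <title>Apple - iPod</title>
      <link>http://www.apple.com/ipod/</link>
      <description>Learn about iPod.</description>
    </item>
    <item>
      <title>iPod - Wikipedia, the free encyclopedia</title>
      <link>http://en.wikipedia.org/wiki/IPod</link>
      <description>The iPod is a line of portable media players.</description>
    </item>
  </channel>
</rss>
XML;

// simplexml_load_string() returns false when the XML cannot be parsed.
$xml = simplexml_load_string($rss);
if ($xml === false) {
    throw new RuntimeException('Could not parse the feed');
}

$results = [];
foreach ($xml->channel->item as $item) {
    // Cast each SimpleXMLElement to string to get its text content.
    $results[] = [
        'title'       => (string) $item->title,
        'description' => (string) $item->description,
        'link'        => (string) $item->link,
    ];
}

print_r($results);
```

Because SimpleXML is an XML parser, it treats `link` like any other element, so its text is read the same way as `title` and `description`.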

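Answer 1: (score: 1)

If you are used to PHP Simple HTML DOM, you can keep using it! Juggling too many approaches causes confusion, and simplehtmldom is already easy to use and powerful.

Make sure you start like this: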

require_once('lib/simple_html_dom.php');

$content =  file_get_contents('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');
$xml = new simple_html_dom();
$xml->load($content);

Then you can query it the way you are used to!

Answer 2: (score: 0)

Edit the simple_html_dom class:

protected $self_closing_tags

Remove the key 'link'.

Before:

protected $self_closing_tags = array('img'=>1, 'br'=>1,'link'=>1, 'input'=>1, 'meta'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);

After:

protected $self_closing_tags = array('img'=>1, 'br'=>1, 'input'=>1, 'meta'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);
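
To illustrate why removing that key might help, here is a toy sketch. It is not simple_html_dom's actual code; `extractText()` is a hypothetical helper written only for this illustration. It assumes that a parser with a self-closing tag table closes such a tag as soon as it opens, so the text that follows never becomes that element's content.

```php
<?php
// Toy illustration only: NOT simple_html_dom's real implementation.
// Returns the text of each simple, non-nested element, treating any tag
// listed in $selfClosing as having no content (like HTML's <link>).
function extractText(string $markup, array $selfClosing): array
{
    $result = [];
    // Match <tag>text</tag> pairs; \1 refers back to the opening tag name.
    preg_match_all('~<(\w+)>([^<]*)</\1>~', $markup, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $tag  = $m[1];
        $text = $m[2];
        // A self-closing tag ends immediately, so its text is dropped here.
        $result[$tag] = isset($selfClosing[$tag]) ? '' : $text;
    }
    return $result;
}

$item = '<title>Apple - iPod</title><link>http://www.apple.com/ipod/</link>';

// With 'link' in the table, its text is lost; without it, the URL survives.
$withLink    = extractText($item, ['br' => 1, 'link' => 1]);
$withoutLink = extractText($item, ['br' => 1]);

var_dump($withLink['link']);
var_dump($withoutLink['link']);
```

In this toy model, the first call yields an empty string for `link` and the second yields the URL, which mirrors the symptom described in the question.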