Parsing HTML with PHP DOMDocument

Date: 2013-02-11 05:56:13

Tags: php domdocument

I'm trying to pull a specific section out of a remote HTML page. This is the markup:

<div id="content">
     <div class="main-wide">
        <ul id="nav-sub">
            <li id="sub-list"><a href="/events/" class="on">List View</a></li>
            <li id="sub-cal"><a href="/events/calendar/">Calendar View</a></li>
        </ul>
        <h2 id="ev-201302">February 2013 <a href="/events/calendar/02/2013" title="Events Calendar for February 2013" class="cal">(Calendar View)</a></h2>
        <ul class="lst lst-lg">
            <li>
                <h3><a href="http://site.com/link_1">link text one</a></h3>
                <ul class="meta">
                    <li>February 1st - February 2nd, 2013</li>
                </ul>
            </li>
            <li>
                <h3><a href="http://site.com/link_2">link text two</a></h3>
                <ul class="meta">
                    <li>February 1st - February 28th, 2013</li>
                </ul>
            </li>
            <li>
                <h3><a href="http://site.com/link_3">link text three</a></h3>
                <ul class="meta">
                    <li>February 1st - February 15th, 2013</li>
                </ul>
            </li>
        </ul>
     </div>
 </div>

I want to grab everything inside <ul class='lst lst-lg'> and turn it into something I can echo in whatever format I like, so that the output looks like this:

<tr>
    <td align='left'>February 1st - February 2nd, 2013</td>
    <td align='left'><a href='http://site.com/link_1'>Link text one</a></td>
</tr>
<tr>
    <td align='left'>February 1st - February 28th, 2013</td>
    <td align='left'><a href='http://site.com/link_2'>Link text two</a></td>
</tr>
<tr>
    <td align='left'>February 1st - February 15th, 2013</td>
    <td align='left'><a href='http://site.com/link_3'>Link text three</a></td>
</tr>

And so on. So far I have this:

function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);

    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$page = get_data('http://site.com/index.php');
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadHTML($page);
$uls = $doc->getElementsByTagName('ul');
$i = 0;
while ($table = $uls->item($i++)) {
    $class_node = $table->attributes->getNamedItem('class');
    $li_node = $table->nodeName;
    if ($class_node) {
        echo $table->nodeName . " - " . $table->nodeValue . "<br>";
    }
}

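So far I have only been trying to echo the values, so what's inside the while loop is mostly me experimenting and trying to learn. I can already get the information from the page; the problem at this point is picking out the right pieces and formatting them.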

Thanks a lot!
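One possible approach, sketched under assumptions rather than tested against the real site: query the list with DOMXPath instead of walking every <ul>, then build one table row per event. The function name events_to_rows is hypothetical (it is not in the code above), and the sketch assumes each event <li> holds an <h3><a> link and a <ul class="meta"> whose first <li> is the date, as in the markup shown in the question.

```php
<?php
// Hypothetical helper, not part of the original post: turns the event
// list inside <ul class="lst lst-lg"> into <tr> rows, one per event.
function events_to_rows(string $html): string
{
    if ($html === '') {
        return ''; // nothing to parse (for example, a failed download)
    }

    $doc = new DOMDocument();
    // Remote HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Select the <li> children (those holding an <h3>) of the <ul> whose
    // class attribute contains the token "lst-lg".
    $items = $xpath->query(
        '//ul[contains(concat(" ", normalize-space(@class), " "), " lst-lg ")]/li[h3]'
    );

    $rows = '';
    foreach ($items as $item) {
        // Both queries are relative to the current event <li>.
        $link = $xpath->query('h3/a', $item)->item(0);
        $date = $xpath->query('ul[@class="meta"]/li', $item)->item(0);
        if ($link === null || $date === null) {
            continue; // skip items that do not match the assumed layout
        }
        // Escape everything that is written back out as HTML.
        $href = htmlspecialchars($link->getAttribute('href'), ENT_QUOTES);
        $text = htmlspecialchars(trim($link->textContent), ENT_QUOTES);
        $when = htmlspecialchars(trim($date->textContent), ENT_QUOTES);

        $rows .= "<tr>\n"
            . "    <td align='left'>{$when}</td>\n"
            . "    <td align='left'><a href='{$href}'>{$text}</a></td>\n"
            . "</tr>\n";
    }

    return $rows;
}
```

With the get_data() function from the question, this could be used as echo events_to_rows($page); after checking that $page is a string (curl_exec returns false on failure). The link text is emitted exactly as it appears in the source, so capitalising it would need an extra step such as ucfirst().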

0 answers:

No answers