Simple HTML DOM parser: reading an HTML table

Date: 2014-02-18 22:21:23

Tags: php html parsing dom html-table

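I am trying to read specific values from this HTML table with the PHP Simple HTML DOM parser. I want my code to read only the "td width" cells and output just those items from the table, like this: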

“WAITLIST,91630,ACCY 2001,10,Intro Financial Accounting,3.00,Zou,Y,Duques 251,9:35 AM-10:50 AM,01/13/14 - 04/28/14”

Here is the HTML table:

<table width="100%"  border="0" cellspacing="1" cellpadding="0" bgcolor="#006699">
                                <tr align="center" class="tableRow1Font">
                                    <td width="7%">WAITLIST</td>
                                    <td width="5%">91630</td>
                                    <td width="11%">
                ACCY <A HREF="http://www.gwu.edu/~bulletin/ugrad/accy.html#2001" target="_blank">2001</A>
                                    </td>
                                    <td width="5%">10</td>
                                    <td width="16%">Intro Financial Accounting</td>
                                    <td width="6%">3.00</td>
                                    <td width="8%"> Zou, Y</td>
                                    <td width="8%"><A HREF="http://www.gwu.edu/~map/building.cfm?BLDG=DUQUES" target="_blank" >DUQUES</a> 251</td>
                                    <td width="13%">TR<br>09:35AM - 10:50AM</td>
                                    <td width="14%">
                                        01/13/14 - 04/28/14
                                    </td>
                                    <td width="7%">

                                    </td>
                                </tr>
                                                     </table>

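Here is my PHP code. It grabs the entire table, including some elements I do not want in my output, and it repeats the output multiple times: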

 // Retrieve the DOM from a given URL
$html = file_get_html('testdata.html');

foreach($html->find('table') as $e){
foreach($html->find('td') as $f){
    echo $f->innertext . '<br>';
    }
    }

How can I change the code so that it grabs and outputs only these elements: “WAITLIST,91630,ACCY 2001,10,Intro Financial Accounting,3.00,Zou,Y,Duques 251,9:35 AM-10:50 AM,01/13/14 - 04/28/14”

1 answer:

Answer 0: (score: 1)

// Retrieve the DOM from a given URL
$html = file_get_html('testdata.html');

foreach($html->find('table') as $e){
    foreach($e->find('td') as $f){
        echo strip_tags($f->innertext) . '<br>';
    }
}
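The loop above prints one cell per line. To get the single comma-separated line the question asks for, the cleaned cell texts could be collected and joined instead. What follows is a minimal sketch, not the answerer's code: clean_cell is a hypothetical helper, and the $cells array is typed in by hand from the question's table to stand in for the $f->innertext values, so the example runs without the Simple HTML DOM library. It also turns <br> into a space before calling strip_tags, because strip_tags alone would glue "TR" and "09:35AM" together.

```php
<?php
// Hypothetical helper (not part of any library): reduce one cell's
// inner HTML to plain, single-spaced text.
function clean_cell(string $innerHtml): string
{
    // Replace <br> with a space so adjacent words are not joined.
    $spaced = preg_replace('/<br\s*\/?>/i', ' ', $innerHtml);
    // Drop the remaining tags, e.g. the <A HREF=...> links.
    $text = strip_tags($spaced);
    // Collapse runs of whitespace (newlines, indentation) and trim.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Assumption: these strings mirror what $f->innertext would hold
// for each <td> of the table in the question.
$cells = [
    'WAITLIST',
    '91630',
    "\n    ACCY <A HREF=\"http://www.gwu.edu/~bulletin/ugrad/accy.html#2001\" target=\"_blank\">2001</A>\n",
    '10',
    'Intro Financial Accounting',
    '3.00',
    ' Zou, Y',
    '<A HREF="http://www.gwu.edu/~map/building.cfm?BLDG=DUQUES" target="_blank" >DUQUES</a> 251',
    'TR<br>09:35AM - 10:50AM',
    "\n    01/13/14 - 04/28/14\n",
    "\n\n", // the last cell in the table is empty
];

// Clean every cell, then drop the ones that ended up empty.
$values = array_filter(array_map('clean_cell', $cells), 'strlen');

// Join the remaining values into one line.
$line = implode(', ', $values);
echo $line, PHP_EOL;
```

The result keeps the day code TR and the upper-case DUQUES exactly as they appear in the markup. Dropping or re-casing them would need extra rules specific to this table.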

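You were very close. The inner loop has to call find('td') on $e, the current table, rather than on $html; otherwise every cell in the document is printed once for each table.

Forget about the tags. See whether strip_tags works for you.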

http://us3.php.net/strip_tags
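If adding the Simple HTML DOM library is not an option, the same idea can be sketched with PHP's built-in DOMDocument and DOMXPath classes. This swaps in a different parser from the one used in the answer. The $source string is a shortened copy of the question's table, included only so the example is self-contained, and the XPath expression //table//td[@width] is one possible way to select only the cells that carry a width attribute.

```php
<?php
// Sample markup: a shortened copy of the table from the question.
$source = <<<'HTML'
<table width="100%" border="0">
  <tr align="center" class="tableRow1Font">
    <td width="7%">WAITLIST</td>
    <td width="5%">91630</td>
    <td width="11%">
      ACCY <A HREF="http://www.gwu.edu/~bulletin/ugrad/accy.html#2001" target="_blank">2001</A>
    </td>
    <td width="8%"> Zou, Y</td>
    <td width="14%">
      01/13/14 - 04/28/14
    </td>
    <td width="7%">

    </td>
  </tr>
</table>
HTML;

// Collect parser warnings internally instead of printing them,
// since real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$values = [];

// Visit only <td> elements inside a table that have a width attribute.
foreach ($xpath->query('//table//td[@width]') as $td) {
    // textContent already excludes tags; normalise the whitespace.
    $text = trim(preg_replace('/\s+/', ' ', $td->textContent));
    if ($text !== '') {
        $values[] = $text; // skip cells that contain only whitespace
    }
}

$line = implode(', ', $values);
echo $line, PHP_EOL;
```

For a file on disk, loadHTMLFile('testdata.html') could take the place of loadHTML($source), assuming the file name from the question.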