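simple_html_dom: Why is $html (the string containing the table to parse) empty?

Asked: 2014-08-11 02:37:15

Tags: php html dom web-scraping simple-html-dom

I have a string that contains the HTML of an HTML table. I want to extract the data from the table as a multidimensional array, something like this: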

$Data = Array ( [0]=> Array([0]=>'Name', [1]=>'Age', [2]=>'CGPA'), 
                [1]=> Array([0]=>'Bob', [1]=>'24', [2]=>'3'), 
                [2]=> Array([0]=>'Alice', [1]=>'23', [2]=>'2'), 
                [3]=>Array([0]=>'Amy', [1]=>'22', [2]=>'4') )

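I have tried many approaches, but they keep giving me errors. I am now using "simple_html_dom", which seems easy to understand, so I intend to stick with it.

I am trying to use the code given in the accepted answer of this question, but it gives me Fatal error: Call to a member function find() on a non-object on line 34

I searched and found this solution, but when I add the check (commented out in the code below), I get Parse error: syntax error, unexpected ''$html is empty!'' (T_CONSTANT_ENCAPSED_STRING) on line 35. I don't know why $html is empty! Perhaps it is a string rather than the expected object? But what should I do about it?

Code: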

<?php

require('simple_html_dom.php');

$html = 'Edit question</a></div></div><div class="content"><div class="formulation"><h4 class="accesshide">Question text</h4><input type="hidden" name="q18:1_:sequencecheck" value="1" /><div class="qtext"><table style="width: 454px; height: 269px;" border="1"><caption> </caption>
<tbody>
<tr>
<td>Name</td>
<td>Age</td>
<td>CGPA</td>
</tr>
<tr>
<td>Alice</td>
<td>24</td>
<td>4</td>
</tr>
<tr>
<td>Bob</td>
<td>14</td>
<td>3</td>
</tr>
<tr>
<td>Amy</td>
<td>33</td>
<td>2</td>
</tr>
</tbody>
</table>
<p> </p>
<p>Blah BlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlah BlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlah?</p></div><div class="ablock"><div class="prompt">Select one:</div><div class="answer"><div class="r0"><input type="radio" name="q18:1_answer" value="0" id="q18:1_answer0" /><label for="q18:1_answer0">a. [1]ir[/1][2]34[/2]</label> </div>';

//if (!empty($html)) {
    // get the table. Maybe there's just one, in which case just 'table' will do
    $table = $html->find('table');
//} else {die '$html is empty!';}

// initialize empty array to store the data array from each row, that is the array containing the rows (that is entire <tr> tag).
$rowData = array();

// loop over rows
foreach($table->find('tr') as $row) {

    // initialize array to store the cell data from each row, that is the arrays containing data from <td> tags 
    $cellData = array();
    foreach($row->find('td.text') as $cell) {

        // push the cell's text to the array
        $cellData[] = $cell->innertext;
    }

    // push the row's data array to the 'big' array
    $rowData[] = $rowData;
}
print_r($rowData);

1 Answer:

Answer 0 (score: 2)

You can point it directly at the table rows. For example:

$html_string = 'Edit question</a></div></div><div class="content"><div class="formulation"><h4 class="accesshide">Question text</h4><input type="hidden" name="q18:1_:sequencecheck" value="1" /><div class="qtext"><table style="width: 454px; height: 269px;" border="1"><caption> </caption><tbody><tr><td>Name</td><td>Age</td><td>CGPA</td></tr><tr><td>Alice</td><td>24</td><td>4</td></tr><tr><td>Bob</td><td>14</td><td>3</td></tr><tr><td>Amy</td><td>33</td><td>2</td></tr></tbody></table><p> </p><p>Blah BlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlah BlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlah?</p></div><div class="ablock"><div class="prompt">Select one:</div><div class="answer"><div class="r0"><input type="radio" name="q18:1_answer" value="0" id="q18:1_answer0" /><label for="q18:1_answer0">a. [1]ir[/1][2]34[/2]</label> </div>';
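require('simple_html_dom.php'); // provides str_get_html(); same path as in the question
$html = str_get_html($html_string); // parse the string into a simple_html_dom object
$rowData = array();
foreach($html->find('table tr') as $row_key => $row) { // every <tr> inside a <table>
    foreach($row->children() as $td) { // every child element of the row (the <td> cells)
        $rowData[$row_key][] = $td->innertext; // append the cell's inner HTML to that row's array
    }
}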

echo '<pre>';
print_r($rowData);

The output should look like this:

Array
(
    [0] => Array
    (
        [0] => Name
        [1] => Age
        [2] => CGPA
    )

    [1] => Array
    (
        [0] => Alice
        [1] => 24
        [2] => 4
    )

    [2] => Array
    (
        [0] => Bob
        [1] => 14
        [2] => 3
    )

    [3] => Array
    (
        [0] => Amy
        [1] => 33
        [2] => 2
    )
)
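
For comparison, the same array can be built without a third-party library. The sketch below swaps in PHP's bundled DOM extension (DOMDocument and DOMXPath) in place of simple_html_dom, so it is not what this answer uses. The shortened $html_string is a stand-in for the markup in the question.

```php
<?php
// Stand-in for the question's markup, shortened to the table itself.
$html_string = '<div class="qtext"><table border="1"><tbody>'
    . '<tr><td>Name</td><td>Age</td><td>CGPA</td></tr>'
    . '<tr><td>Alice</td><td>24</td><td>4</td></tr>'
    . '<tr><td>Bob</td><td>14</td><td>3</td></tr>'
    . '<tr><td>Amy</td><td>33</td><td>2</td></tr>'
    . '</tbody></table></div>';

$doc = new DOMDocument();
// Real-world fragments often trigger parser warnings; collect them instead of printing.
libxml_use_internal_errors(true);
$doc->loadHTML($html_string);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rowData = array();
foreach ($xpath->query('//table//tr') as $row) {
    $cells = array();
    // './td' is evaluated relative to the current row.
    foreach ($xpath->query('./td', $row) as $cell) {
        $cells[] = trim($cell->textContent);
    }
    $rowData[] = $cells;
}
print_r($rowData);
```

Note that textContent returns plain text, whereas simple_html_dom's innertext returns the cell's inner HTML; for cells that hold only text, as here, the two give the same values.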

Explanation of your code:

$table = $html->find('table');

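You cannot call ->find() here because no SimpleHTMLDOM object has been initialized: $html is still a plain string. You first need to load it with str_get_html() or file_get_html().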
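
Applied to the code in the question, the fix might look like the sketch below. It assumes simple_html_dom.php sits next to the script, as in the question's require, and it uses a shortened, hypothetical table string in place of the full markup. Besides loading the string with str_get_html(), it adjusts three other spots that would otherwise cause trouble: find('table', 0) asks for a single element rather than an array, the selector is 'td' rather than 'td.text' (the cells carry no class="text"), and the row loop pushes $cellData rather than $rowData. The Parse error quoted in the question points at die being written without parentheses, so the check is written as die('...').

```php
<?php
require 'simple_html_dom.php'; // assumption: the library file is next to this script

// Hypothetical, shortened stand-in for the markup in the question.
$html_string = '<div class="qtext"><table border="1"><tbody>'
    . '<tr><td>Name</td><td>Age</td><td>CGPA</td></tr>'
    . '<tr><td>Alice</td><td>24</td><td>4</td></tr>'
    . '</tbody></table></div>';

// Parse the string; str_get_html() returns false for empty or oversized input.
$html = str_get_html($html_string);
if (!$html) {
    die('$html could not be parsed!'); // die takes its message in parentheses
}

// Index 0 requests the first match as a single element instead of an array.
$table = $html->find('table', 0);
if (!$table) {
    die('No <table> found!');
}

$rowData = array();
foreach ($table->find('tr') as $row) {
    $cellData = array();
    foreach ($row->find('td') as $cell) {
        $cellData[] = $cell->innertext;
    }
    $rowData[] = $cellData; // push this row's cells, not $rowData itself
}
print_r($rowData);
```

The check tests the return value of str_get_html() rather than using empty($html) on the raw string, because a non-empty string passes empty() yet still has no find() method.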