Simple HTML DOM parsing not working

Posted: 2016-03-11 23:28:56

Tags: php html dom simple-html-dom

I'm trying to extract the email address, name and phone number from my HTML table and use those details to send an automatic email reply.

For some reason, I get this fatal error: Call to undefined function file_get_html() in http://itecdigital.org.uk/2015/430926/BeautyFactoryBooking/admin.php on line 3

My HTML DOM parser code:

<?php

$html = file_get_html('http://itecdigital.org.uk/2015/430926/BeautyFactoryBooking/admin.php');

$dom = new DOMDocument();
$dom->loadHTML($html);

$elements = $dom->getElementsByTagName('tr');
//Loop through each row
foreach ($rows as $row) {
    //Loop through each child (cell) of the row
    foreach ($row->children() as $cell) {
        echo $cell->plaintext; // Display the contents of each cell - this is the value you want to extract
    }
}

?>

Can anyone spot the error?

The HTML code for my table is as follows:

<?php

        echo "<table style='border: solid 1px black;'>";
        echo "<tr><th>Id</th><th>First Name</th><th>Last Name</th><th>Email Address</th><th>Phone Num</th><th>Treatment</th><th>Date</th><th>Time</th><th>Message</th><th>Reply</th></tr>";

        class TableRows extends RecursiveIteratorIterator {
            function __construct($it) {
                parent::__construct($it, self::LEAVES_ONLY);
            }

            function current() {
                return "<td style='width:100px;border:1px solid black;'>" . parent::current(). "</td>";
            }

            function beginChildren() {
                echo "<tr>";
            }

            function endChildren() {
                echo "</tr>" . "\n";
            }
        }

        $servername = "#";
        $username = "#";
        $password = "#";
        $dbname = "#";

        try {
            $conn = new PDO("mysql: host=$servername; dbname=$dbname", $username, $password);
            $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
            $stmt = $conn->prepare("SELECT Booking_request_form.id_booking, Client_Information.first_name, Client_Information.last_name, Client_Information.email_address, Client_Information.phone_number, Booking_request_form.treatment, Booking_request_form.date, Booking_request_form.time, Booking_request_form.message FROM Booking_request_form INNER JOIN Client_Information WHERE Client_Information.id_client=Booking_request_form.client_fk"); 

            $stmt->execute();

            // set the resulting array to associative
            $result = $stmt->setFetchMode(PDO::FETCH_ASSOC);
            foreach(new TableRows(new RecursiveArrayIterator($stmt->fetchAll())) as $k=>$v) {
                echo $v;
            }
        }

        catch(PDOException $e) {
            echo "Error: " . $e->getMessage();
        }

        $conn = null;
        echo "</table>";

?> 

Is there a simple fix for this?

3 answers:

Answer 0 (score: 1)

Use the file_get_contents function instead of file_get_html. PHP has no built-in function named file_get_html.
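Note that file_get_contents() only returns the raw markup as a string (or false on failure); the parsing is still done by DOMDocument. A minimal sketch that checks for that failure before parsing. The helper name fetchHtml() is hypothetical, and a temporary local file stands in for the admin.php URL:

```php
<?php
// Hypothetical helper: read raw HTML from a file path or URL and fail loudly
// instead of passing false on to DOMDocument::loadHTML().
function fetchHtml(string $source): string
{
    $html = @file_get_contents($source); // @ hides the PHP warning; the result is checked instead
    if ($html === false) {
        throw new RuntimeException("Could not read HTML from $source");
    }
    return $html;
}

// Demo with a temporary local file instead of the real page.
$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($path, '<table><tr><td>Filip</td></tr></table>');

$dom = new DOMDocument();
$dom->loadHTML(fetchHtml($path));
echo $dom->getElementsByTagName('td')->item(0)->nodeValue, "\n"; // Filip

unlink($path); // clean up the temporary file
```

DOMDocument::loadHTMLFile() is an alternative that reads the file or URL itself, but going through file_get_contents() keeps the error check explicit.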

However, there are a few errors in the HTML:

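  1. The <div class="headertext"> tag is never closed. I suppose it should be closed right after <a href="log_out.php">Logout</a>;
  2. Special characters such as & should be encoded as entities (&amp;);
  3. This may be considered an error: PHP does not recognize the <header> tag and emits a warning. However, it still loads the HTML page successfully.
  4. Last but not least, there are many mistakes in the way you use DOMElement properties.

I've rewritten your code to show how it works: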

    <?php
    
    $html = file_get_contents('http://itecdigital.org.uk/2015/430926/BeautyFactoryBooking/admin.php');
    
    $dom = new DOMDocument();
    $result = $dom->loadHTML($html, LIBXML_NOERROR); // LIBXML_NOERROR suppresses parser error reports
    var_dump($result); // true if the HTML was loaded
    $elements = $dom->getElementsByTagName('tr');
    var_dump($elements); // DOMNodeList of all <tr> elements
    //Loop through each row
    foreach ($elements as $row) {
        //Loop through each child (cell) of the row
        foreach ($row->childNodes as $cell) {
            echo $cell->nodeValue; // Display the contents of each cell - this is the value you want to extract
        }
    }
    
    
    ?>
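Since the goal is to pull out the name, email address and phone number, it may help to key each row by its header text instead of echoing every node. A sketch, assuming the table layout shown in this answer (one header row of <th> cells followed by <td> rows). The helper name tableToRecords() is hypothetical, and the sample row is made up:

```php
<?php
// Hypothetical helper: convert the first <table> in $html into a list of
// associative arrays keyed by the text of its <th> header cells.
function tableToRecords(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);
    $xpath = new DOMXPath($dom);

    // Header texts, in column order.
    $headers = [];
    foreach ($xpath->query('(//table)[1]//tr[th]/th') as $th) {
        $headers[] = trim($th->textContent);
    }

    // Data rows: only <tr> elements that contain <td> cells.
    $records = [];
    foreach ($xpath->query('(//table)[1]//tr[td]') as $tr) {
        $cells = $xpath->query('./td', $tr);
        $record = [];
        for ($i = 0; $i < $cells->length; $i++) {
            if (isset($headers[$i])) { // ignore cells without a matching header
                $record[$headers[$i]] = trim($cells->item($i)->textContent);
            }
        }
        $records[] = $record;
    }
    return $records;
}

// Made-up sample row (not the real booking data).
$html = '<table>'
      . '<tr><th>First Name</th><th>Email Address</th><th>Phone Num</th></tr>'
      . '<tr><td>Filip</td><td>filip@example.com</td><td>0123456789</td></tr>'
      . '</table>';

foreach (tableToRecords($html) as $record) {
    echo $record['First Name'], ' <', $record['Email Address'], '> ', $record['Phone Num'], "\n";
}
```

With the real table, the "Reply" header has no matching <td>, so that key is simply absent from each record.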
    

    And the HTML should look like this:

    <!DOCTYPE html>
    <html>
       <head>
          <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
          <meta http-equiv="X-UA-Compatible" content="chrome=1,IE=edge" />
          <title>Beauty Factory Bookings</title>
          <link href='http://fonts.googleapis.com/css?family=Montserrat:400,700' rel='stylesheet' type='text/css'>
       </head>
       <body>
          <img action="login_success.php" src="http://i.imgur.com/wbhPNAs.png" style="width: 240px; height:35px;"> 
          <header>
             <div class="headertext"> <a href="booking.php">Book Appointment</a> <a href="about.php">About Us</a> <a href="contact.php">Contact Us</a> <a href="log_out.php">Logout</a></div>
          </header>
          <table style='border: solid 1px black;'>
             <tr>
                <th>Id</th>
                <th>First Name</th>
                <th>Last Name</th>
                <th>Email Address</th>
                <th>Phone Num</th>
                <th>Treatment</th>
                <th>Date</th>
                <th>Time</th>
                <th>Message</th>
                <th>Reply</th>
             </tr>
             <tr>
                <td style='width:100px;border:1px solid black;'>1</td>
                <td style='width:100px;border:1px solid black;'>Filip</td>
                <td style='width:100px;border:1px solid black;'>Grebowski</td>
                <td style='width:100px;border:1px solid black;'>grebowskifilip@gmail.com</td>
                <td style='width:100px;border:1px solid black;'>07449474894</td>
                <td style='width:100px;border:1px solid black;'>Waxing - Full Leg &amp; Bikini</td>
                <td style='width:100px;border:1px solid black;'>11/03/2016</td>
                <td style='width:100px;border:1px solid black;'>10:20</td>
                <td style='width:100px;border:1px solid black;'>Is this okay?</td>
             </tr>
             <tr>
                <td style='width:100px;border:1px solid black;'>2</td>
                <td style='width:100px;border:1px solid black;'>Filip</td>
                <td style='width:100px;border:1px solid black;'>Grebowski</td>
                <td style='width:100px;border:1px solid black;'>grebowskifilip@gmail.com</td>
                <td style='width:100px;border:1px solid black;'>07449474894</td>
                <td style='width:100px;border:1px solid black;'>Anti-Age Facial</td>
                <td style='width:100px;border:1px solid black;'>01/01/1970</td>
                <td style='width:100px;border:1px solid black;'>10:20</td>
                <td style='width:100px;border:1px solid black;'>Is this ok????</td>
             </tr>
          </table>
       </body>
       <style> table { margin-top: 60px; border-collapse: collapse; margin-left: auto; margin-right: auto; margin-bottom: 60px; } tr:nth-child(even) { background-color: #f2f2f2 } th, td { padding: 15px; } img { padding-top: 12px; padding-left: 12px; } .headertext { float: right; padding-top: 20px; padding-right: 3%; } body { background: url('#') no-repeat fixed center center; background-size: cover; font-family: 'Montserrat', sans-serif; margin: 0; padding: 0; } header { background: black; -ms-filter: "progid:DXImageTransform.Microsoft.Alpha(Opacity=50)"; filter: alpha(opacity=80); -moz-opacity: 0.8; -khtml-opacity: 0.8; opacity: 0.7; height: 60px; font-family: 'Montserrat', sans-serif; } a:link { font-size: 15px; margin-left: 75px; color: white; background-color: transparent; text-decoration: none; } a:visited { font-size: 15px; margin-left: 75px; color: white; background-color: transparent; text-decoration: none; } a:hover { font-size: 15px; margin-left: 75px; color: #C0C0C0; background-color: transparent; text-decoration: none; } </style>
    </html>
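Regarding point 3 above: instead of silencing libxml with LIBXML_NOERROR, its diagnostics can be collected and inspected. A sketch using libxml_use_internal_errors() and libxml_get_errors(). Whether a warning for <header> actually appears depends on the libxml2 version PHP is built against, so the messages are only printed here. The helper name parseHtmlQuietly() is hypothetical:

```php
<?php
// Sketch: load HTML5 markup with DOMDocument while buffering libxml
// diagnostics (such as a warning about <header>) instead of printing them.
function parseHtmlQuietly(string $html): array
{
    $previous = libxml_use_internal_errors(true); // collect errors internally
    $dom = new DOMDocument();
    $ok = $dom->loadHTML($html);
    $messages = array_map(
        fn(LibXMLError $e) => trim($e->message),
        libxml_get_errors()
    );
    libxml_clear_errors();                 // empty the buffer for the next parse
    libxml_use_internal_errors($previous); // restore the caller's setting
    return [$ok, $dom, $messages];
}

// Made-up sample using an HTML5 <header> element.
$sample = '<!DOCTYPE html><html><body>'
        . '<header><div class="headertext">Menu</div></header>'
        . '<table><tr><td>1</td><td>Filip</td></tr></table>'
        . '</body></html>';

[$ok, $dom, $messages] = parseHtmlQuietly($sample);
echo $ok ? "loaded\n" : "failed\n";
echo $dom->getElementsByTagName('tr')->length, " row(s)\n";
foreach ($messages as $message) {
    echo "libxml: $message\n";
}
```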
    

Answer 1 (score: 1)

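You are mixing commands from the third-party Simple HTML Dom class (going by your question title) with commands from the built-in DOMDocument class, which is why your code doesn't work.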

file_get_html() is a Simple HTML Dom function; replace it with file_get_contents():

$html = file_get_contents( '/Users/sam/Downloads/trash.html' );

$dom = new DOMDocument();
libxml_use_internal_errors( true );   // <-- add this line to buffer libxml parse errors instead of printing warnings
$dom->loadHTML( $html );

$elements = $dom->getElementsByTagName('tr');

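Now, initialize an array ($rows) to hold the cell values and an integer ($cols) for the number of columns. Your HTML is not well formed, and this variable will help you generate a well-formed table: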

$rows = array();
$cols = 0;

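There is another error in your code: you put the <tr> elements into $elements, but then refer to them as $rows in the foreach(). Next, you call the ->children() method to iterate over all children, but DOMElement has no such method; use the ->childNodes property instead. First, though, check the row's column count and update the previously declared $cols variable. Inside the nested foreach(), add each cell value to $rows so you can display it later. To retrieve the value of a DOMNode, use ->nodeValue instead of ->plaintext. I've wrapped $cell->nodeValue in trim() to remove extra whitespace at the start and end of the string: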

foreach ($elements as $key => $row)
{
    if( $row->childNodes->length > $cols ) $cols = $row->childNodes->length;
    foreach( $row->childNodes as $cell )
    {
        $rows[$key][] = trim( $cell->nodeValue );
    }
}
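With indented markup like the HTML in the first answer, childNodes can also contain whitespace text nodes (DOMText), so childNodes->length may overcount the columns and $rows may pick up empty strings. A sketch of the same loop that keeps only element children. The helper name rowCells() is hypothetical, and the sample markup is made up:

```php
<?php
// Hypothetical helper: return the trimmed text of the element children
// (<td>/<th>) of a <tr>, skipping DOMText nodes such as indentation.
function rowCells(DOMElement $row): array
{
    $cells = [];
    foreach ($row->childNodes as $node) {
        if ($node instanceof DOMElement) {
            $cells[] = trim($node->nodeValue);
        }
    }
    return $cells;
}

// Indented sample markup, similar in shape to the table in the first answer.
$html = "<table>\n  <tr>\n    <th>Id</th>\n    <th>Email Address</th>\n  </tr>\n"
      . "  <tr>\n    <td>1</td>\n    <td>filip@example.com</td>\n  </tr>\n</table>";

$dom = new DOMDocument();
$dom->loadHTML($html);

$rows = [];
$cols = 0;
foreach ($dom->getElementsByTagName('tr') as $tr) {
    $cells = rowCells($tr);
    $cols = max($cols, count($cells)); // column count from real cells only
    $rows[] = $cells;
}

echo "childNodes of first row: ", $dom->getElementsByTagName('tr')->item(0)->childNodes->length, "\n";
echo "columns: $cols\n"; // columns: 2
```

The exact childNodes count depends on how libxml2 treats whitespace, which is why the column count is derived from element nodes only.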

Now you have the cell values in the multidimensional array $rows.

Displaying the table

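The code you use to display the table isn't your own; it is copy-pasted from the web. It has nothing to do with your problem, so you can ignore it.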

Use simple code like this:

echo "<table>\n";
echo "    <tr>\n";
for( $j = 0; $j < $cols; $j++ ) echo "        <th>{$rows[0][$j]}</th>\n";
echo "    </tr>\n";
for( $i = 1; $i < count($rows); $i++ )
{
    echo "    <tr>\n";
    for( $j = 0; $j < $cols; $j++ )
    {
        if( isset( $rows[$i][$j] ) ) echo "        <td>{$rows[$i][$j]}</td>\n";
        else                         echo "        <td></td>\n";
    }
    echo "    </tr>\n";
}
echo "</table>\n";

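This is just a working example; you can modify the HTML as needed and also change the order of the cells. Note the difference between the code that prints the table header and the code that prints the table rows (that for() loop starts at 1). Also note the use of $cols: if a row has fewer cells than $cols, an empty <td> is output for each missing cell.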
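The loop above prints cell values verbatim, so a value such as "Waxing - Full Leg & Bikini" would reintroduce the unencoded & mentioned in the first answer. A sketch of the same loop that escapes every value with htmlspecialchars() and returns a string instead of echoing. The function name renderTable() is hypothetical, and the sample rows are made up:

```php
<?php
// Sketch: render the $rows array (header row first) as an HTML table,
// escaping every value so characters like & and < are emitted as entities.
function renderTable(array $rows, int $cols): string
{
    $e = fn($v) => htmlspecialchars((string) $v, ENT_QUOTES, 'UTF-8');

    $out = "<table>\n    <tr>\n";
    for ($j = 0; $j < $cols; $j++) {
        $out .= '        <th>' . $e($rows[0][$j] ?? '') . "</th>\n";
    }
    $out .= "    </tr>\n";

    for ($i = 1; $i < count($rows); $i++) {
        $out .= "    <tr>\n";
        for ($j = 0; $j < $cols; $j++) {
            // Missing cells become an empty <td></td>, as in the loop above.
            $out .= '        <td>' . $e($rows[$i][$j] ?? '') . "</td>\n";
        }
        $out .= "    </tr>\n";
    }
    return $out . "</table>\n";
}

echo renderTable([['Id', 'Treatment'], ['1', 'Waxing - Full Leg & Bikini'], ['2']], 2);
```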

Answer 2 (score: 0)

Your HTML should have a proper document structure, not just a table:

<!DOCTYPE html>
<html>
<body>
    <?php
        echo "<table style='border: solid 1px black;'>";
        /* etc */
    ?>
</body>
</html>

Also, make sure the tags in your PHP output are properly closed.

*EDIT*

I just looked into Simple HTML DOM.

Make sure you include the library file in your code: include("/path/to/simple_html_dom.php");

Also, with Simple HTML DOM you don't need to load $html into a DOMDocument. Simply use:

$html = file_get_html('http://itecdigital.org.uk/2015/430926/BeautyFactoryBooking/admin.php');

$elements = $html->find('tr');

Please read the PHP Simple HTML DOM Parser manual for more information.