Best way to parse HTML tags in JavaScript

Time: 2010-05-12 12:02:09

Tags: php javascript html html-parsing

Can anyone offer help or advice on parsing the HTML tags that appear beside the <body>...</body> tag?

3 answers:

Answer 0: (score: 2)

I assume you want to parse the HTML document with PHP. I suggest reading http://www.php.net/manual/en/book.dom.php

Here is an example provided by PHP Pro:
<?php

$html = '
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" dir="ltr">
<head>
<title>PHPRO.ORG</title>
</head>
<body>
<h2>Forecast for Saturday</h2>
<!-- Issued at 0828 UTC Friday 23 May 2008 -->
<table border="0" summary="Capital Cities Precis Forecast">
   <tbody>
      <tr>
         <td><a href="/products/IDN10064.shtml" title="Link to Sydney forecast">Sydney</a></td>
         <td title="Maximum temperature in degrees Celsius" class="max alignright">19&deg;</td>
         <td>Fine. Mostly sunny.</td>
      </tr>

      <tr>
         <td><a href="/products/IDV10450.shtml" title="Link to Melbourne forecast">Melbourne</a></td>
         <td title="Maximum temperature in degrees Celsius" class="max alignright">16&deg;</td>
         <td>Fog then fine.</td>
      </tr>

      <tr>
         <td><a href="/products/IDQ10095.shtml" title="Link to Brisbane forecast">Brisbane</a></td>
         <td title="Maximum temperature in degrees Celsius" class="max alignright">24&deg;</td>
         <td>Mostly fine</td>
      </tr>

      <tr>
         <td><a href="/products/IDW12300.shtml" title="Link to Perth forecast">Perth</a></td>
         <td title="Maximum temperature in degrees Celsius" class="max alignright">21&deg;</td>
         <td>Few showers, increasing later.</td>
      </tr>

      <tr>
         <td><a href="/products/IDS10034.shtml" title="Link to Adelaide forecast">Adelaide</a></td>
         <td title="Maximum temperature in degrees Celsius" class="max alignright">20&deg;</td>
         <td>Fine. Mostly sunny.</td>
      </tr>

      <tr>
         <td><a href="/products/IDT65061.shtml" title="Link to Hobart forecast">Hobart</a></td>
         <td title="Maximum temperature in degrees Celsius" class="max alignright">13&deg;</td>
         <td>Mainly fine.</td>
      </tr>

      <tr>
         <td><a href="/products/IDN10035.shtml" title="Link to Canberra forecast">Canberra</a></td>
         <td title="Maximum temperature in degrees Celsius" class="max alignright">15&deg;</td>
         <td>Fine, mostly sunny.</td>
      </tr>

      <tr>
         <td><a href="/products/IDD10150.shtml" title="Link to Darwin forecast">Darwin</a></td>
         <td title="Maximum temperature in degrees Celsius" class="max alignright">32&deg;</td>
         <td>Fine and sunny.</td>
      </tr>

   </tbody>
</table>

</body>
</html>
';

    /*** a new dom object ***/
    $dom = new DOMDocument();

    /*** discard white space (set before loading; it has no effect on an already loaded document) ***/
    $dom->preserveWhiteSpace = false;

    /*** load the html into the object ***/
    $dom->loadHTML($html);

    /*** the table by its tag name ***/
    $tables = $dom->getElementsByTagName('table');

    /*** get all rows from the table ***/
    $rows = $tables->item(0)->getElementsByTagName('tr');

    /*** loop over the table rows ***/
    foreach ($rows as $row)
    {
        /*** get each column by tag name ***/
        $cols = $row->getElementsByTagName('td');
        /*** echo the values ***/
        echo $cols->item(0)->nodeValue.'<br />';
        echo $cols->item(1)->nodeValue.'<br />';
        echo $cols->item(2)->nodeValue;
        echo '<hr />';
    }
?> 

Answer 1: (score: 2)

Do you mean John Resig's HTML parser?

Answer 2: (score: 0)

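You can load the entire HTML document from another page via ajax and parse it with jQuery selectors, provided it is XHTML. I'm not sure whether this works if the document is not well-formed.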
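
For the client-side case, here is a minimal sketch of listing the tags inside <body> in JavaScript. It swaps in the browser's built-in DOMParser for jQuery, so it is not the approach described above, only a related one. The function names collectTagNames and tagsInBody are made up for illustration, and the HTML string is assumed to be already available (for example from an ajax response).

```javascript
// Collect the tag names of all elements nested under `root`, in document order.
// `root` is any DOM Element (for example `doc.body`); only `children` and
// `tagName` are read, so there is no dependency on jQuery.
function collectTagNames(root) {
  const names = [];
  for (const child of Array.from(root.children)) {
    names.push(child.tagName.toLowerCase());
    // Recurse so that nested elements are listed right after their parent.
    names.push(...collectTagNames(child));
  }
  return names;
}

// Parse an HTML string and list the tags found inside <body>.
// DOMParser is a browser API; it is not available in plain Node.js.
function tagsInBody(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return collectTagNames(doc.body);
}

// Browser-only usage; skipped where DOMParser is unavailable.
if (typeof DOMParser !== 'undefined') {
  const sample = '<body><h2>Forecast</h2><table><tr><td>Sydney</td></tr></table></body>';
  console.log(tagsInBody(sample));
}
```

Because the 'text/html' mode uses the browser's regular HTML parser, the input does not have to be well-formed XHTML, though the parser may repair or add elements while building the tree.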