Scraping data from an external table

Time: 2015-07-23 09:32:26

Tags: php web-scraping simple-html-dom

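I have an external vendor who provides me with data through an iframe. His iframe is not responsive, while my website is, so there is a mismatch when the iframe is displayed on my site.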

So I looked at Simple HTML DOM Parser to fetch the content from the remote URL and then display it in a Bootstrap table.

The table inside the iframe looks like this:

<table cellspacing="0" cellpadding="0" width="100%">
        <tr>
            <td>
                <table cellpadding="0" cellspacing="0" width="100%">
                    <tr>
                        <td id="ctl00_ContentPlaceHolder1_InnerTable"><table CellSpacing=0 cellpadding=0 Width=100% ><tr class='GridHeaderclass'><td class=GridHeadLeft>Particulars  </td><td class=GridHeadCenter>Dec&nbsp;2014</td><td class=GridHeadCenter>Dec&nbsp;2013</td><td class=GridHeadCenter>Dec&nbsp;2012</td><td class=GridHeadCenter>Dec&nbsp;2011</td><td class=GridHeadCenter>Dec&nbsp;2010</td></tr><tr class='GridRow_Default'id="R3" ><td class=GridDataLeft>Operational & Financial Ratios</td><td class=GridDataRight>&nbsp;&nbsp;</td><td class=GridDataRight_Alt>&nbsp;&nbsp;</td><td class=GridDataRight>&nbsp;&nbsp;</td><td class=GridDataRight_Alt>&nbsp;&nbsp;</td><td class=GridDataRight>&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R5" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;CEPS(Rs)</td><td class=GridDataRight>16.11&nbsp;&nbsp;</td><td class=GridDataRight_Alt>13.22&nbsp;&nbsp;</td><td class=GridDataRight>10.92&nbsp;&nbsp;</td><td class=GridDataRight_Alt>12.46&nbsp;&nbsp;</td><td class=GridDataRight>5.42&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R7" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;Book NAV/Share(Rs)</td><td class=GridDataRight>132.70&nbsp;&nbsp;</td><td class=GridDataRight_Alt>126.36&nbsp;&nbsp;</td><td class=GridDataRight>122.61&nbsp;&nbsp;</td><td class=GridDataRight_Alt>119.61&nbsp;&nbsp;</td><td class=GridDataRight>114.38&nbsp;&nbsp;</td></tr></tr><tr class='GridRow_Default'id="R9" ><td class=GridDataLeft>Margin Ratios</td><td class=GridDataRight>&nbsp;&nbsp;</td><td class=GridDataRight_Alt>&nbsp;&nbsp;</td><td class=GridDataRight>&nbsp;&nbsp;</td><td class=GridDataRight_Alt>&nbsp;&nbsp;</td><td class=GridDataRight>&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R11" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;EBIT Margin(%)</td><td class=GridDataRight>5.85&nbsp;&nbsp;</td><td class=GridDataRight_Alt>4.74&nbsp;&nbsp;</td><td class=GridDataRight>3.28&nbsp;&nbsp;</td><td 
class=GridDataRight_Alt>4.22&nbsp;&nbsp;</td><td class=GridDataRight>2.05&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R13" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;PAT Margin (%)</td><td class=GridDataRight>2.80&nbsp;&nbsp;</td><td class=GridDataRight_Alt>2.16&nbsp;&nbsp;</td><td class=GridDataRight>1.71&nbsp;&nbsp;</td><td class=GridDataRight_Alt>2.48&nbsp;&nbsp;</td><td class=GridDataRight>0.96&nbsp;&nbsp;</td></tr></tr><tr class='GridRow_Default'id="R15" ><td class=GridDataLeft>Performance Ratios</td><td class=GridDataRight>&nbsp;&nbsp;</td><td class=GridDataRight_Alt>&nbsp;&nbsp;</td><td class=GridDataRight>&nbsp;&nbsp;</td><td class=GridDataRight_Alt>&nbsp;&nbsp;</td><td class=GridDataRight>&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R17" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;ROE(%)</td><td class=GridDataRight>8.33&nbsp;&nbsp;</td><td class=GridDataRight_Alt>6.71&nbsp;&nbsp;</td><td class=GridDataRight>5.35&nbsp;&nbsp;</td><td class=GridDataRight_Alt>7.44&nbsp;&nbsp;</td><td class=GridDataRight>2.62&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R19" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;Asset Turnover(x)</td><td class=GridDataRight>0.78&nbsp;&nbsp;</td><td class=GridDataRight_Alt>0.88&nbsp;&nbsp;</td><td class=GridDataRight>1.12&nbsp;&nbsp;</td><td class=GridDataRight_Alt>1.24&nbsp;&nbsp;</td><td class=GridDataRight>1.17&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R21" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;Working Capital/Sales(x)</td><td class=GridDataRight>8.31&nbsp;&nbsp;</td><td class=GridDataRight_Alt>9.56&nbsp;&nbsp;</td><td class=GridDataRight>8.21&nbsp;&nbsp;</td><td class=GridDataRight_Alt>7.06&nbsp;&nbsp;</td><td class=GridDataRight>4.19&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R23" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;Fixed Capital/Sales(x)</td><td class=GridDataRight>0.24&nbsp;&nbsp;</td><td class=GridDataRight_Alt>0.21&nbsp;&nbsp;</td><td 
class=GridDataRight>0.19&nbsp;&nbsp;</td><td class=GridDataRight_Alt>0.17&nbsp;&nbsp;</td><td class=GridDataRight>0.14&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R25" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;Inventory Days</td><td class=GridDataRight>42.12&nbsp;&nbsp;</td><td class=GridDataRight_Alt>42.51&nbsp;&nbsp;</td><td class=GridDataRight>41.97&nbsp;&nbsp;</td><td class=GridDataRight_Alt>39.77&nbsp;&nbsp;</td><td class=GridDataRight>39.41&nbsp;&nbsp;</td></tr></tr><tr class='GridRow_Default'id="R27" ><td class=GridDataLeft>Valuation Parameters</td><td class=GridDataRight>&nbsp;&nbsp;</td><td class=GridDataRight_Alt>&nbsp;&nbsp;</td><td class=GridDataRight>&nbsp;&nbsp;</td><td class=GridDataRight_Alt>&nbsp;&nbsp;</td><td class=GridDataRight>&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R29" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;PCE(x)</td><td class=GridDataRight>79.85&nbsp;&nbsp;</td><td class=GridDataRight_Alt>52.39&nbsp;&nbsp;</td><td class=GridDataRight>64.11&nbsp;&nbsp;</td><td class=GridDataRight_Alt>46.83&nbsp;&nbsp;</td><td class=GridDataRight>146.12&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R31" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;Yield(%)</td><td class=GridDataRight>0.29&nbsp;&nbsp;</td><td class=GridDataRight_Alt>0.43&nbsp;&nbsp;</td><td class=GridDataRight>0.43&nbsp;&nbsp;</td><td class=GridDataRight_Alt>0.51&nbsp;&nbsp;</td><td class=GridDataRight>0.25&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R33" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;EV/Core EBITDA(x)</td><td class=GridDataRight>46.44&nbsp;&nbsp;</td><td class=GridDataRight_Alt>30.47&nbsp;&nbsp;</td><td class=GridDataRight>42.25&nbsp;&nbsp;</td><td class=GridDataRight_Alt>30.76&nbsp;&nbsp;</td><td class=GridDataRight>86.70&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R35" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;EV/CE(x)</td><td class=GridDataRight>2.64&nbsp;&nbsp;</td><td 
class=GridDataRight_Alt>1.42&nbsp;&nbsp;</td><td class=GridDataRight>1.86&nbsp;&nbsp;</td><td class=GridDataRight_Alt>1.93&nbsp;&nbsp;</td><td class=GridDataRight>2.81&nbsp;&nbsp;</td></tr></tr><tr class='GridRow_Default'id="R37" ><td class=GridDataLeft>Growth Ratio</td><td class=GridDataRight>&nbsp;&nbsp;</td><td class=GridDataRight_Alt>&nbsp;&nbsp;</td><td class=GridDataRight>&nbsp;&nbsp;</td><td class=GridDataRight_Alt>&nbsp;&nbsp;</td><td class=GridDataRight>&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R39" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;Core EBITDA Growth(%)</td><td class=GridDataRight>19.97&nbsp;&nbsp;</td><td class=GridDataRight_Alt>37.67&nbsp;&nbsp;</td><td class=GridDataRight>-9.25&nbsp;&nbsp;</td><td class=GridDataRight_Alt>110.66&nbsp;&nbsp;</td><td class=GridDataRight>-69.87&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R41" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;PAT Growth(%)</td><td class=GridDataRight>29.18&nbsp;&nbsp;</td><td class=GridDataRight_Alt>28.73&nbsp;&nbsp;</td><td class=GridDataRight>-25.54&nbsp;&nbsp;</td><td class=GridDataRight_Alt>191.85&nbsp;&nbsp;</td><td class=GridDataRight>-82.17&nbsp;&nbsp;</td></tr></tr><tr class='GridRow_Default'id="R43" ><td class=GridDataLeft>Financial Stability Ratios</td><td class=GridDataRight>&nbsp;&nbsp;</td><td class=GridDataRight_Alt>&nbsp;&nbsp;</td><td class=GridDataRight>&nbsp;&nbsp;</td><td class=GridDataRight_Alt>&nbsp;&nbsp;</td><td class=GridDataRight>&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R45" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;Current Ratio(x)</td><td class=GridDataRight>1.17&nbsp;&nbsp;</td><td class=GridDataRight_Alt>1.14&nbsp;&nbsp;</td><td class=GridDataRight>1.25&nbsp;&nbsp;</td><td class=GridDataRight_Alt>1.28&nbsp;&nbsp;</td><td class=GridDataRight>1.47&nbsp;&nbsp;</td></tr></tr><tr class='GridAltRow_Default'id="R47" ><td class=GridDataLeft>&nbsp;&nbsp;&nbsp;Interest Cover(x)</td><td 
class=GridDataRight>3.91&nbsp;&nbsp;</td><td class=GridDataRight_Alt>3.35&nbsp;&nbsp;</td><td class=GridDataRight>4.62&nbsp;&nbsp;</td><td class=GridDataRight_Alt>6.77&nbsp;&nbsp;</td><td class=GridDataRight>3.86&nbsp;&nbsp;</td></tr></tr></table></td>

                    </tr>
                    <tr>
                        
                    </tr>
                </table>
            </td>
        </tr>
    </table>

Can anyone guide me on using Simple HTML DOM Parser so that I can display the values from the table above in my own Bootstrap table?

The code I tried does return the values, but they are repeated over and over and are not formatted as a table.

<?php
require('simple_html_dom.php');

$table = array();

$html = file_get_html('www.iframeprovidorURL.com');
foreach($html->find('table') as $e){
    foreach($e->find('td') as $f){
        echo strip_tags($f->innertext) . '<br>';
    }
}

?>
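
A likely reason for the repetition: the vendor page nests three tables, and `find('table')` matches each of them, so the cells of the innermost table are reached once per enclosing table. The sketch below reproduces the effect with PHP's built-in DOM extension rather than simple_html_dom, so that it runs without the library. The cut-down sample markup, the `inner` id, and the variable names are made up for illustration.

```php
<?php
// Hypothetical, cut-down stand-in for the vendor markup: three nested tables.
$sample = '<table><tr><td><table><tr><td id="inner">'
        . '<table><tr><td>CEPS(Rs)</td><td>16.11</td></tr></table>'
        . '</td></tr></table></td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate sloppy HTML without warnings
$dom->loadHTML($sample);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Same shape as the nested foreach: every table, then every td below it.
$visits = 0;
foreach ($xpath->query('//table') as $table) {
    $visits += $xpath->query('.//td', $table)->length;
}

// Scoped to the container cell: each data cell is reached exactly once.
$scoped = $xpath->query('//td[@id="inner"]//td')->length;

echo "nested loop visits: $visits, scoped query: $scoped\n";
```

With the two data cells in this sample, the nested loop counts far more visits than there are cells, while the scoped query counts each cell once. Narrowing the selector to the innermost container is probably enough to stop the repeats.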

I also tried:

<?php

$html_string = file_get_contents('http://iframeURL.com');
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html_string);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
$values = array();
$row = $xpath->query('//td[@id="ctl00_ContentPlaceHolder1_InnerTable"]');
foreach($row as $value) {
    $values[] = trim($value->textContent);
}

echo '<pre>';
print_r($values);
?>
The above outputs:

Array
(
    [0] => Particulars  Dec 2014Dec 2013Dec 2012Dec 2011Dec 2010Operational & Financial Ratios             CEPS(Rs)16.11  13.22  10.92  12.46  5.42     Book NAV/Share(Rs)132.70  126.36  122.61  119.61  114.38  Margin Ratios             EBIT Margin(%)5.85  4.74  3.28  4.22  2.05     PAT Margin (%)2.80  2.16  1.71  2.48  0.96  Performance Ratios             ROE(%)8.33  6.71  5.35  7.44  2.62     Asset Turnover(x)0.78  0.88  1.12  1.24  1.17     Working Capital/Sales(x)8.31  9.56  8.21  7.06  4.19     Fixed Capital/Sales(x)0.24  0.21  0.19  0.17  0.14     Inventory Days42.12  42.51  41.97  39.77  39.41  Valuation Parameters             PCE(x)79.85  52.39  64.11  46.83  146.12     Yield(%)0.29  0.43  0.43  0.51  0.25     EV/Core EBITDA(x)46.44  30.47  42.25  30.76  86.70     EV/CE(x)2.64  1.42  1.86  1.93  2.81  Growth Ratio             Core EBITDA Growth(%)19.97  37.67  -9.25  110.66  -69.87     PAT Growth(%)29.18  28.73  -25.54  191.85  -82.17  Financial Stability Ratios             Current Ratio(x)1.17  1.14  1.25  1.28  1.47     Interest Cover(x)3.91  3.35  4.62  6.77  3.86  
)

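This is much better than the previous attempt, but how can this output be arranged neatly in a Bootstrap table? I don't know where to go from this output.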
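
One way to keep the row structure, sketched under assumptions: query the `tr` elements below the `ctl00_ContentPlaceHolder1_InnerTable` cell rather than the cell itself, collect each row's cells, and print them inside a `table table-striped` element. The shortened sample markup stands in for the fetched page, and `cleanCell()` and `renderBootstrapTable()` are hypothetical helper names, not part of any library.

```php
<?php
// Stand-in for the fetched page (assumption: the real markup keeps this id).
$html_string = '<table><tr><td id="ctl00_ContentPlaceHolder1_InnerTable"><table>'
    . '<tr class="GridHeaderclass"><td>Particulars  </td><td>Dec&nbsp;2014</td><td>Dec&nbsp;2013</td></tr>'
    . '<tr class="GridRow_Default"><td>Margin Ratios</td><td>&nbsp;&nbsp;</td><td>&nbsp;&nbsp;</td></tr>'
    . '<tr class="GridAltRow_Default"><td>&nbsp;&nbsp;&nbsp;EBIT Margin(%)</td><td>5.85&nbsp;&nbsp;</td><td>4.74&nbsp;&nbsp;</td></tr>'
    . '</table></td></tr></table>';

// Hypothetical helper: turn non-breaking spaces into plain spaces, then trim.
function cleanCell(string $text): string
{
    return trim(str_replace("\u{00A0}", ' ', $text));
}

// Hypothetical helper: first row becomes the header, the rest the body.
function renderBootstrapTable(array $rows): string
{
    $header = array_shift($rows) ?? [];
    $out = '<table class="table table-striped"><thead><tr>';
    foreach ($header as $cell) {
        $out .= '<th>' . htmlspecialchars($cell) . '</th>';
    }
    $out .= '</tr></thead><tbody>';
    foreach ($rows as $row) {
        $out .= '<tr>';
        foreach ($row as $cell) {
            $out .= '<td>' . htmlspecialchars($cell) . '</td>';
        }
        $out .= '</tr>';
    }
    return $out . '</tbody></table>';
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // the vendor HTML has stray closing tags
$dom->loadHTML($html_string);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// One array of cell texts per <tr> inside the container cell.
$rows = [];
foreach ($xpath->query('//td[@id="ctl00_ContentPlaceHolder1_InnerTable"]//tr') as $tr) {
    $cells = [];
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = cleanCell($td->textContent);
    }
    if ($cells) {
        $rows[] = $cells;
    }
}

echo renderBootstrapTable($rows);
```

Section rows such as "Margin Ratios" come through with empty value cells. They could be styled differently by checking the `class` attribute of the `tr`, if that is wanted.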

I would also like to know: will DOM parsing be faster than displaying the iframe? At the moment the iframe page on my site is very slow.
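
On speed: parsing on the server still has to download the vendor page on every request, so by itself it may not be faster than the iframe. Caching the fetched HTML for a while is one common way around that. The sketch below is a minimal file-based cache. `cachedFetch()`, the cache file name, and the one-hour lifetime are assumptions for illustration, and a fake fetcher replaces the real `file_get_contents()` call so the example runs without network access.

```php
<?php
// Hypothetical helper: return the cached copy if it is younger than $ttl seconds,
// otherwise call $fetch (e.g. a wrapper around file_get_contents) and store the result.
function cachedFetch(string $cacheFile, int $ttl, callable $fetch): string
{
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
        return file_get_contents($cacheFile);
    }
    $content = $fetch();
    file_put_contents($cacheFile, $content, LOCK_EX);
    return $content;
}

// Fake fetcher that counts how often the "remote" page is requested.
$calls = 0;
$fakeFetch = function () use (&$calls): string {
    $calls++;
    return '<table><tr><td>demo</td></tr></table>';
};

$cacheFile = sys_get_temp_dir() . '/vendor_table_' . uniqid() . '.html';
$first  = cachedFetch($cacheFile, 3600, $fakeFetch); // fetches and writes the cache
$second = cachedFetch($cacheFile, 3600, $fakeFetch); // served from the cache file
unlink($cacheFile);

echo "fetches performed: $calls\n";
```

In real use, `$fetch` would wrap the call that downloads the vendor page, and the lifetime would depend on how often the vendor's figures change.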

Regards

0 Answers:

No answers