HtmlAgilityPack: loop through a table with nested divs inside

Time: 2017-07-24 20:55:57

Tags: c# selenium-webdriver foreach web-scraping html-agility-pack

I have an odd table that I want to loop through to scrape its data.

The table HTML is:

<table class="footable footable-loaded default breakpoint">
<thead>
    <tr>
        <th data-class="expand" data-type="numeric" class="footable-sortable">
            Container

            <span class="footable-sort-indicator"></span>
        </th>
        <th data-sort-initial="true" class="footable-sortable">
            Cnt Type

            <span class="footable-sort-indicator"></span>
        </th>
        <th data-hide="all" class="footable-sortable" style="display: none;">
            <span class="footable-sort-indicator"></span>
        </th>
    </tr>
</thead>
<tbody id="ContentPlaceHolder1_con_mov">
    <tr class="koyu footable-detail-show">
        <td data-value="0" class="expand">CAIU2181527</td>
        <td>20'DC</td>
        <td style="display: none;">
            <div style="display:inline-block;width:100%" id="mHeader">
                <div style="float:left;width:33%" class="mov_div">LOCATION</div>
                <div style="float:left;width:33%" class="mov_div">DATE</div>
                <div style="float:left;width:33%" class="mov_div">MOVEMENT</div>
            </div>
            <div style="display:inline-block;width:100%">
                <div style="float:left;width:33%" class="mov_div">KAAN KALKAVAN, IE1729W</div>
                <div style="float:left;width:33%" class="mov_div">07.20.2017</div>
                <div style="float:left;width:33%" class="mov_div">LOADED TO VESSEL </div>
                <div>
                    <div style="display:inline-block;width:100%">
                        <div style="float:left;width:33%" class="mov_div">TR, IZMIR</div>
                        <div style="float:left;width:33%" class="mov_div">07.17.2017</div>
                        <div style="float:left;width:33%" class="mov_div">GATE IN FULL </div>
                        <div>
                            <div style="display:inline-block;width:100%">
                                <div style="float:left;width:33%" class="mov_div">TR, IZMIR</div>
                                <div style="float:left;width:33%" class="mov_div">07.17.2017</div>
                                <div style="float:left;width:33%" class="mov_div">DISPATCHED EMPTY TO SHIPPER </div>
                                <div>
                                    <div style="display:inline-block;width:100%">
                                        <div style="float:left;width:33%" class="mov_div">TR, IZMIR</div>
                                        <div style="float:left;width:33%" class="mov_div">07.17.2017</div>
                                        <div style="float:left;width:33%" class="mov_div">BOOKED </div>
                                        <div></div>
                                    </div>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </td>
    </tr>
    <tr class="footable-row-detail">
        <td class="footable-cell-detail" colspan="2">
            <div class="footable-row-detail-inner">
                <div>
                    <strong></strong>
                    <div style="display:inline-block;width:100%" id="mHeader">
                        <div style="float:left;width:33%" class="mov_div">LOCATION</div>
                        <div style="float:left;width:33%" class="mov_div">DATE</div>
                        <div style="float:left;width:33%" class="mov_div">MOVEMENT</div>
                    </div>
                    <div style="display:inline-block;width:100%">
                        <div style="float:left;width:33%" class="mov_div">KAAN KALKAVAN, IE1729W</div>
                        <div style="float:left;width:33%" class="mov_div">07.20.2017</div>
                        <div style="float:left;width:33%" class="mov_div">LOADED TO VESSEL </div>
                        <div>
                            <div style="display:inline-block;width:100%">
                                <div style="float:left;width:33%" class="mov_div">TR, IZMIR</div>
                                <div style="float:left;width:33%" class="mov_div">07.17.2017</div>
                                <div style="float:left;width:33%" class="mov_div">GATE IN FULL </div>
                                <div>
                                    <div style="display:inline-block;width:100%">
                                        <div style="float:left;width:33%" class="mov_div">TR, IZMIR</div>
                                        <div style="float:left;width:33%" class="mov_div">07.17.2017</div>
                                        <div style="float:left;width:33%" class="mov_div">DISPATCHED EMPTY TO SHIPPER </div>
                                        <div>
                                            <div style="display:inline-block;width:100%">
                                                <div style="float:left;width:33%" class="mov_div">TR, IZMIR</div>
                                                <div style="float:left;width:33%" class="mov_div">07.17.2017</div>
                                                <div style="float:left;width:33%" class="mov_div">BOOKED </div>
                                                <div></div>
                                            </div>
                                        </div>
                                    </div>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </td>
    </tr>
</tbody>
<tfoot id="ContentPlaceHolder1_con_footer"></tfoot>
</table>

I usually loop through a regular table like this:

IWebElement element1 = driver.FindElement(By.XPath("something"));
// Pull the element's markup out of the browser and hand it to HtmlAgilityPack.
String contents = (String)((IJavaScriptExecutor)driver).ExecuteScript("return arguments[0].outerHTML;", element1);
var node = HtmlNode.CreateNode(contents);
foreach (var eachNode in node.SelectNodes("//something/tr"))
{
    // One <td> collection per table row.
    var cells = eachNode.SelectNodes(".//td");
    cd = new TableDetail();

    for (int i = 0; i < cells.Count; i++)
    {
        // Getting data from table
    }
}

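Any ideas on how to loop through the table above? Because the data is nested inside divs, the conventional row-and-cell approach doesn't work.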
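To make the nesting problem concrete: each movement is a div whose direct children are three `mov_div` cells, and the next movement sits inside a further wrapper div, so iterating over `tr`/`td` never reaches them. A minimal sketch of one way to flatten this with plain HtmlAgilityPack XPath follows. The `MovementScraper` class, the `Movement` type, and its property names are hypothetical and not part of the original post; the sketch also assumes the markup of a single row (for example the `outerHTML` of the first `tr`) is passed in, since the detail row repeats the same data.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class MovementScraper
{
    // Hypothetical container for one movement entry (illustration only).
    public sealed class Movement
    {
        public string Location { get; set; }
        public string Date { get; set; }
        public string Event { get; set; }
    }

    public static List<Movement> Parse(string rowHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(rowHtml);

        // Match every div that has mov_div cells as direct children,
        // at any nesting depth, and skip the header row (id="mHeader").
        var rows = doc.DocumentNode.SelectNodes(
            "//div[div[@class='mov_div'] and not(@id='mHeader')]");

        var result = new List<Movement>();
        if (rows == null)
        {
            return result; // SelectNodes returns null when nothing matches
        }

        foreach (var row in rows)
        {
            // Relative XPath: only this row's own cells, not those nested deeper.
            var cells = row.SelectNodes("./div[@class='mov_div']");
            if (cells == null || cells.Count < 3)
            {
                continue;
            }

            result.Add(new Movement
            {
                Location = HtmlEntity.DeEntitize(cells[0].InnerText).Trim(),
                Date = HtmlEntity.DeEntitize(cells[1].InnerText).Trim(),
                Event = HtmlEntity.DeEntitize(cells[2].InnerText).Trim()
            });
        }

        return result;
    }
}
```

With Selenium, the string returned by the `outerHTML` script call (the `contents` variable in the snippet above) could be passed straight to `MovementScraper.Parse(contents)`.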

1 Answer:

Answer 0: (score: 0)

I managed to do it by using the HAP CSS Selector:

IWebElement element1 = driver.FindElement(By.XPath("//*[@id=\"ContentPlaceHolder1_con_mov\"]/tr[1]"));
String contents = (String)((IJavaScriptExecutor)driver).ExecuteScript("return arguments[0].outerHTML;", element1);
var node = HtmlNode.CreateNode(contents);
foreach (var eachNode in node.QuerySelectorAll("div[style=display:inline-block;width:100%]"))
{
    var cells = eachNode.SelectNodes("div[@class=\"mov_div\"]");