Excluding certain elements in XPath (C#)

Date: 2017-07-27 06:26:39

Tags: c# xpath web-scraping html-agility-pack

My HTML structure is as follows:

    <table id = "searchResultsTable">
       <tbody class="searchResultsRowClass">
          <tr>
            <td>....</td>
            <td>....</td>
         </tr>     

         <tr>
            <td>....</td>
            <td>....</td>
        </tr>     
        <!--it repeats 21 times in every page -->

     </tbody>
  </table>

My C# code:

private void button1_Click(object sender, EventArgs e)
{

    var url = "url";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    StreamReader sr = new StreamReader(response.GetResponseStream());
    string sourceCode = sr.ReadToEnd();

    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.LoadHtml(sourceCode);



    var rows = document.DocumentNode.SelectNodes("//*[@id='searchResultsTable']/tbody/tr");

    foreach (var row in rows)
    {
        if (row.ChildNodes.Count > 0)
        {

            var name = row.SelectSingleNode("td[2]/a[1]").InnerText;
            var year = row.SelectSingleNode("td[3]").InnerText;
            var km = row.SelectSingleNode("td[4]").InnerText;
            var color = row.SelectSingleNode("td[5]").InnerText;
            var price = row.SelectSingleNode("td[6]").InnerText;
            var date = row.SelectSingleNode("td[7]").InnerText;
            var location = row.SelectSingleNode("td[8]").InnerText;

            Console.WriteLine("name" + name + "\nyear" + year + "\nkm" + km + "\ncolor" + color + "\nprice" + price + "\ndate" + date + "\nlocation" + location);
        }
    }
}

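In my HTML, tr[5] is empty, so I have to exclude it. I tried /root/*[not(self::a)] and /tr/*[not(self::tr[5])], but neither worked.

Right now I only get the first 4 tr elements; after that, a NullReferenceException is thrown.

How can I exclude one of the tr elements in XPath?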

<table id="searchResultsTable" class="">
    <thead>
    <tr>
        <td class="searchResultsFirstColumn">&nbsp;</td>
        <td class="">İlan Başlığı</td>
                    <td>
                            <a href="/otomobil?sorting=a5_asc&amp;price_min=40000&amp;price_max=40000">
    Yıl</a>
</td>
                    <td>
                            <a href="/otomobil?sorting=a4_asc&amp;price_min=40000&amp;price_max=40000">
    Km</a>
</td>
                    <td>Renk</td>
                    <td class="searchResultsPriceHeader">
                            <a href="/otomobil?sorting=price_asc&amp;price_min=40000&amp;price_max=40000">
    Fiyat</a>
</td>
                    <td class="searchResultsDateHeader">
                            <a href="/otomobil?sorting=date_desc&amp;price_min=40000&amp;price_max=40000">
    İlan Tarihi</a>
</td>
                    <td class="searchResultsLastColumn searchResultsLocationHeader">
                            <a href="/otomobil?sorting=address_desc&amp;price_min=40000&amp;price_max=40000">
    İl / İlçe</a>
</td>
                    <td class="searchResultsIgnoredColumn"></td>
    </tr>
    </thead>
    <tbody class="searchResultsRowClass">
    <tr data-id="464336919" class="searchResultsItem     ">
    <td class="searchResultsLargeThumbnail">
            <a href="/ilan/vasita-otomobil-fiat-automobilworld-den-2015-linea-pop-89.000km-degisensiz-faturali-464336919/detay">

    <img src="https://image5.sahibinden.com/photos/33/69/19/thmb_464336919waj.jpg" alt="AUTOMOBİLWORLD'DEN 2015 LINEA POP 89.000KM DEĞİŞENSİZ FATURALI #464336919" title="AUTOMOBİLWORLD'DEN 2015 LINEA POP 89.000KM DEĞİŞENSİZ FATURALI">
    </a></td>
    <td class="searchResultsTitleValue ">
                    <input id="favoriteClassifiedsVisibility" type="hidden" value="true">
<div class="action-wrapper" data-classified-id="464336919">
                        <div class="add-to-favorites last favorite">
        <a href="#" class="action classifiedAddFavorite trackClick trackId_favorite hidden">
            Favorilerime Ekle</a>
        <a href="#" class="action classifiedRemoveFavorite trackClick trackId_favorite disable">
            Favorilerimde</a>
    </div>
<div class="compare hidden">
    <a class="facetedCheckbox action compare-classified">
        <i></i>Karşılaştır</a>
</div>
</div>
                <a class="classifiedTitle" href="/ilan/vasita-otomobil-fiat-automobilworld-den-2015-linea-pop-89.000km-degisensiz-faturali-464336919/detay">
    AUTOMOBİLWORLD'DEN 2015 LINEA POP 89.000KM DEĞİŞENSİZ FATURALI</a>

<a class="titleIcon store-icon" href="https://automobilworld.sahibinden.com/" title="AUTOMOBIL WORLD" style="visibility: visible;">
        <img class="titleIcon" src="https://s0.shbdn.com/assets/images/iconStore:e98c183976843a1e5b3d4e580d614009.png" alt="AUTOMOBIL WORLD" title="AUTOMOBIL WORLD" style="visibility: visible;">
    </a>
<img class="titleIcon" alt="Haritalı İlan" title="Haritalı İlan" src="https://s0.shbdn.com/assets/images/iconHasMap:1f5f8f9b79e391584fe00304345baa05.png" style="visibility: visible;">
<br>

    <div class="classifiedSubtitle " style="visibility: visible;">
        Fiat &gt; Linea &gt; 1.3 Multijet Pop</div>
</td>
            <td class="searchResultsAttributeValue">
                    2015</td>
            <td class="searchResultsAttributeValue">
                    89.000</td>
            <td class="searchResultsAttributeValue">
                    Beyaz</td>
            <td class="searchResultsPriceValue">
                        <div> 40.000 TL</div></td>
                <td class="searchResultsDateValue">
                        <span>21 Temmuz</span>
                        <br>
                        <span>2017</span>
                    </td>
                <td class="searchResultsLocationValue">
                        İstanbul<br>Büyükçekmece</td>
                <td class="ignore-me">
    <a href="#" class="mark-as-ignored" title="Bu ilanla ilgilenmiyorum, gizle."></a>
    <a href="#" class="mark-as-not-ignored disable">
        Göster</a>
</td>
</tr>
<tr data-id="460187522" class="searchResultsItem     ">
    <td class="searchResultsLargeThumbnail">
            <a href="/ilan/vasita-otomobil-volkswagen-orjinal-kilometre-124800km-lpg-460187522/detay">

    <img src="https://image5.sahibinden.com/photos/18/75/22/thmb_460187522s6x.jpg" alt="ORJİNAL KİLOMETRE 124800KM LPG #460187522" title="ORJİNAL KİLOMETRE 124800KM LPG">
    </a></td>
    <td class="searchResultsTitleValue ">
                    <input id="favoriteClassifiedsVisibility" type="hidden" value="true">
<div class="action-wrapper" data-classified-id="460187522">
                        <div class="add-to-favorites last favorite">
        <a href="#" class="action classifiedAddFavorite trackClick trackId_favorite hidden">
            Favorilerime Ekle</a>
        <a href="#" class="action classifiedRemoveFavorite trackClick trackId_favorite disable">
            Favorilerimde</a>
    </div>
<div class="compare hidden">
    <a class="facetedCheckbox action compare-classified">
        <i></i>Karşılaştır</a>
</div>
</div>
                <a class="classifiedTitle" href="/ilan/vasita-otomobil-volkswagen-orjinal-kilometre-124800km-lpg-460187522/detay">
    ORJİNAL KİLOMETRE 124800KM LPG</a>

<a class="titleIcon store-icon" href="https://42.sahibinden.com/" title="HÜSEYİN ÖRNEK" style="visibility: visible;">
        <img class="titleIcon" src="https://s0.shbdn.com/assets/images/iconStore:e98c183976843a1e5b3d4e580d614009.png" alt="HÜSEYİN ÖRNEK" title="HÜSEYİN ÖRNEK" style="visibility: visible;">
    </a>
<img class="titleIcon" alt="Haritalı İlan" title="Haritalı İlan" src="https://s0.shbdn.com/assets/images/iconHasMap:1f5f8f9b79e391584fe00304345baa05.png" style="visibility: visible;">
<br>

    <div class="classifiedSubtitle " style="visibility: visible;">
        Volkswagen &gt; Passat &gt; 1.6 Comfortline</div>
</td>
            <td class="searchResultsAttributeValue">
                    2002</td>
            <td class="searchResultsAttributeValue">
                    124.800</td>
            <td class="searchResultsAttributeValue">
                    Beyaz</td>
            <td class="searchResultsPriceValue">
                        <div> 40.000 TL</div></td>
                <td class="searchResultsDateValue">
                        <span>10 Temmuz</span>
                        <br>
                        <span>2017</span>
                    </td>
                <td class="searchResultsLocationValue">
                        Konya<br>Selçuklu</td>
                <td class="ignore-me">
    <a href="#" class="mark-as-ignored" title="Bu ilanla ilgilenmiyorum, gizle."></a>
    <a href="#" class="mark-as-not-ignored disable">
        Göster</a>
</td>
</tr>
<tr data-id="397435322" class="searchResultsItem     ">
    <td class="searchResultsLargeThumbnail">
            <a href="/ilan/vasita-otomobil-renault-2013-renault-clio-4-touch-paket-boyasizzzzzzzzzzzzzzzzzz-397435322/detay">

    <img src="https://image5.sahibinden.com/photos/43/53/22/thmb_397435322ao8.jpg" alt="2013 RENAULT CLİO 4 TOUCH PAKET ___BOYASIZZZZZZZZZZZZZZZZZZ___ #397435322" title="2013 RENAULT CLİO 4 TOUCH PAKET ___BOYASIZZZZZZZZZZZZZZZZZZ___">
    </a></td>
    <td class="searchResultsTitleValue ">
                    <input id="favoriteClassifiedsVisibility" type="hidden" value="true">
<div class="action-wrapper" data-classified-id="397435322">
                        <div class="add-to-favorites last favorite">
        <a href="#" class="action classifiedAddFavorite trackClick trackId_favorite hidden">
            Favorilerime Ekle</a>
        <a href="#" class="action classifiedRemoveFavorite trackClick trackId_favorite disable">
            Favorilerimde</a>
    </div>
<div class="compare hidden">
    <a class="facetedCheckbox action compare-classified">
        <i></i>Karşılaştır</a>
</div>
</div>
                <a class="classifiedTitle" href="/ilan/vasita-otomobil-renault-2013-renault-clio-4-touch-paket-boyasizzzzzzzzzzzzzzzzzz-397435322/detay">
    2013 RENAULT CLİO 4 TOUCH PAKET ___BOYASIZZZZZZ­ZZZZZZZZZZZZ___</a>

<a class="titleIcon store-icon" href="https://guvenototarsus.sahibinden.com/" title="GÜVEN OTOMOTİV" style="visibility: visible;">
        <img class="titleIcon" src="https://s0.shbdn.com/assets/images/iconStore:e98c183976843a1e5b3d4e580d614009.png" alt="GÜVEN OTOMOTİV" title="GÜVEN OTOMOTİV" style="visibility: visible;">
    </a>
<img class="titleIcon" alt="Haritalı İlan" title="Haritalı İlan" src="https://s0.shbdn.com/assets/images/iconHasMap:1f5f8f9b79e391584fe00304345baa05.png" style="visibility: visible;">
<br>

    <div class="classifiedSubtitle " style="visibility: visible;">
        Renault &gt; Clio &gt; 1.2 Touch</div>
</td>
            <td class="searchResultsAttributeValue">
                    2013</td>
            <td class="searchResultsAttributeValue">
                    74.000</td>
            <td class="searchResultsAttributeValue">
                    Siyah</td>
            <td class="searchResultsPriceValue">
                        <div> 40.000 TL</div></td>
                <td class="searchResultsDateValue">
                        <span>08 Temmuz</span>
                        <br>
                        <span>2017</span>
                    </td>
                <td class="searchResultsLocationValue">
                        Mersin<br>Tarsus</td>
                <td class="ignore-me">
    <a href="#" class="mark-as-ignored" title="Bu ilanla ilgilenmiyorum, gizle."></a>
    <a href="#" class="mark-as-not-ignored disable">
        Göster</a>
</td>
</tr>


<tr data-id="458875511" class="searchResultsItem     ">
    <td class="searchResultsLargeThumbnail">
            <a href="/ilan/vasita-otomobil-ford-2011-model-40.binde-otomatik-458875511/detay">

    <img src="https://image5.sahibinden.com/photos/87/55/11/thmb_458875511431.jpg" alt="2011 MODEL 40.binde OTOMATİK #458875511" title="2011 MODEL 40.binde OTOMATİK">
    </a></td>
    <td class="searchResultsTitleValue ">
                    <input id="favoriteClassifiedsVisibility" type="hidden" value="true">
<div class="action-wrapper" data-classified-id="458875511">
                        <div class="add-to-favorites last favorite">
        <a href="#" class="action classifiedAddFavorite trackClick trackId_favorite hidden">
            Favorilerime Ekle</a>
        <a href="#" class="action classifiedRemoveFavorite trackClick trackId_favorite disable">
            Favorilerimde</a>
    </div>
<div class="compare hidden">
    <a class="facetedCheckbox action compare-classified">
        <i></i>Karşılaştır</a>
</div>
</div>
                <a class="classifiedTitle" href="/ilan/vasita-otomobil-ford-2011-model-40.binde-otomatik-458875511/detay">
    2011 MODEL 40.binde OTOMATİK</a>

<a class="titleIcon store-icon" href="https://mackaotomotiv.sahibinden.com/" title="MAÇKA OTOMOTİV" style="visibility: visible;">
        <img class="titleIcon" src="https://s0.shbdn.com/assets/images/iconStore:e98c183976843a1e5b3d4e580d614009.png" alt="MAÇKA OTOMOTİV" title="MAÇKA OTOMOTİV" style="visibility: visible;">
    </a>
<img class="titleIcon" alt="Haritalı İlan" title="Haritalı İlan" src="https://s0.shbdn.com/assets/images/iconHasMap:1f5f8f9b79e391584fe00304345baa05.png" style="visibility: visible;">
<br>

    <div class="classifiedSubtitle " style="visibility: visible;">
        Ford &gt; Fiesta &gt; 1.4 Titanium</div>
</td>
            <td class="searchResultsAttributeValue">
                    2011</td>
            <td class="searchResultsAttributeValue">
                    40.000</td>
            <td class="searchResultsAttributeValue">
                    Gümüş Gri</td>
            <td class="searchResultsPriceValue">
                        <div> 40.000 TL</div></td>
                <td class="searchResultsDateValue">
                        <span>07 Temmuz</span>
                        <br>
                        <span>2017</span>
                    </td>
                <td class="searchResultsLocationValue">
                        Düzce<br>Merkez</td>
                <td class="ignore-me">
    <a href="#" class="mark-as-ignored" title="Bu ilanla ilgilenmiyorum, gizle."></a>
    <a href="#" class="mark-as-not-ignored disable">
        Göster</a>
</td>
</tr>
<tr class="searchResultsPromoToplist">
                <td colspan="12">
                    <div><a href="/doping-tanitim/#doping-5" target="_blank"><strong>Siz de ilanınızın yukarıda yer almasını istiyorsanız <u>tıklayın</u>.</strong></a></div>
                </td>
            </tr>
        <tr data-id="465252780" class="searchResultsItem     ">
    <td class="searchResultsLargeThumbnail">
            <a href="/ilan/vasita-otomobil-peugeot-multimedia-sistemli-temiz-2014-active-paket-301-465252780/detay">

    <img class="searchResultThumbnailPlaceholder otherNoImage" src="https://s0.shbdn.com/assets/images/iconHasMegaPhoto:1d086aab554fd92d49d3762a0542888a.png" alt="Multimedia Sistemli Temiz 2014 Active Paket 301 #465252780" title="Megafotolu ilan">
    </a></td>
    <td class="searchResultsTitleValue ">
                    <input id="favoriteClassifiedsVisibility" type="hidden" value="true">
<div class="action-wrapper" data-classified-id="465252780">
                        <div class="add-to-favorites last favorite">
        <a href="#" class="action classifiedAddFavorite trackClick trackId_favorite  hidden">
            Favorilerime Ekle</a>
        <a href="#" class="action classifiedRemoveFavorite trackClick trackId_favorite disable">
            Favorilerimde</a>
    </div>
<div class="compare hidden">
    <a class="facetedCheckbox action compare-classified">
        <i></i>Karşılaştır</a>
</div>
</div>
                <img class="titleIcon" src="https://s0.shbdn.com/assets/images/iconNew:c9b443de96056beb84b4cdc03ca5051c.png" alt="Yeni İlan" title="Yeni İlan" style="visibility: visible;">
<a class="classifiedTitle" href="/ilan/vasita-otomobil-peugeot-multimedia-sistemli-temiz-2014-active-paket-301-465252780/detay">
    Multimedia Sistemli Temiz 2014 Active Paket 301</a>

<img class="titleIcon" alt="Haritalı İlan" title="Haritalı İlan" src="https://s0.shbdn.com/assets/images/iconHasMap:1f5f8f9b79e391584fe00304345baa05.png" style="visibility: visible;">
<br>

    <div class="classifiedSubtitle " style="visibility: visible;">
        Peugeot &gt; 301 &gt; 1.6 HDi Active</div>
</td>
            <td class="searchResultsAttributeValue">
                    2014</td>
            <td class="searchResultsAttributeValue">
                    95.500</td>
            <td class="searchResultsAttributeValue">
                    Beyaz</td>
            <td class="searchResultsPriceValue">
                        <div> 40.000 TL</div></td>
                <td class="searchResultsDateValue">
                        <span>25 Temmuz</span>
                        <br>
                        <span>2017</span>
                    </td>
                <td class="searchResultsLocationValue">
                        İstanbul<br>Kadıköy</td>
                <td class="ignore-me">
    <a href="#" class="mark-as-ignored" title="Bu ilanla ilgilenmiyorum, gizle."></a>
    <a href="#" class="mark-as-not-ignored disable">
        Göster</a>
</td>
</tr>

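The HTML above is part of the page. I am trying to exclude <tr class="searchResultsPromoToplist">.

2 answers:

Answer 0 (score: 1)

If you only want to match <tr> elements that have <td> child elements, change your XPath from: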

//*[@id='searchResultsTable']/tbody/tr

to:

//*[@id='searchResultsTable']/tbody/tr[td]

Edit

Based on the HTML you posted, the <tr> elements you are interested in all seem to have a data-id attribute. If that is the case, change the XPath to:

//*[@id='searchResultsTable']/tbody/tr[@data-id]
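
To see how these predicates differ, here is a minimal sketch. It is not the asker's code: it assumes a simplified, well-formed stand-in for the posted table and uses the built-in System.Xml XPath engine instead of HtmlAgilityPack so that it runs without extra packages. The class name RowFilterDemo, the Count helper, and the sample rows are hypothetical. The XPath strings themselves are plain XPath 1.0, so they should carry over unchanged to HtmlAgilityPack's SelectNodes.

```csharp
using System;
using System.Xml;

public static class RowFilterDemo
{
    // Simplified, well-formed stand-in for the table in the question (assumption):
    // two listing rows with data-id and one promo row without it.
    private const string Markup = @"
<table id='searchResultsTable'>
  <tbody class='searchResultsRowClass'>
    <tr data-id='1' class='searchResultsItem'><td>thumb</td><td><a>Car A</a></td></tr>
    <tr class='searchResultsPromoToplist'><td colspan='12'>promo</td></tr>
    <tr data-id='2' class='searchResultsItem'><td>thumb</td><td><a>Car B</a></td></tr>
  </tbody>
</table>";

    // Returns how many nodes the given XPath expression selects in the sample markup.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        const string rows = "//*[@id='searchResultsTable']/tbody/tr";

        Console.WriteLine(Count(rows));                 // every row, promo included
        Console.WriteLine(Count(rows + "[td]"));        // rows that have a td child
        Console.WriteLine(Count(rows + "[@data-id]"));  // rows that carry data-id
        // Alternative: exclude the promo row by its class.
        Console.WriteLine(Count(rows + "[not(@class='searchResultsPromoToplist')]"));
    }
}
```

In this sample the promo row still contains a <td> (the colspan cell), so tr[td] keeps it, while tr[@data-id] and the not(@class=...) form both drop it. That matches the posted HTML, where the promo row has a single <td colspan="12"> and no data-id.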

Answer 1 (score: 0)

Try

string.IsNullOrEmpty(yourrowelement) ? string.Empty : yourrowelement
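
A string check like this only helps once a string exists. The NullReferenceException in the question most likely comes earlier: SelectSingleNode returns null when nothing matches, and reading .InnerText on that null throws. One possible way to guard against it is the null-conditional operator, sketched below. The helper names SafeText, TextOrEmpty, and ParseRow are hypothetical, and the sketch uses System.Xml on small well-formed rows rather than HtmlAgilityPack so it runs on its own. The same ?. pattern should apply to an HtmlNode.

```csharp
using System;
using System.Xml;

public static class SafeText
{
    // Returns the trimmed text of the first node matching xpath under row,
    // or an empty string when nothing matches (instead of throwing).
    public static string TextOrEmpty(XmlNode row, string xpath)
    {
        return row.SelectSingleNode(xpath)?.InnerText.Trim() ?? string.Empty;
    }

    // Parses a single well-formed <tr>...</tr> fragment and returns its root element.
    public static XmlNode ParseRow(string rowMarkup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(rowMarkup);
        return doc.DocumentElement;
    }

    public static void Main()
    {
        // Hypothetical rows: a normal listing row and a promo row with one cell.
        var item = ParseRow("<tr><td>thumb</td><td><a> Car A </a></td><td>2015</td></tr>");
        var promo = ParseRow("<tr><td colspan='12'>promo</td></tr>");

        Console.WriteLine("[" + TextOrEmpty(item, "td[2]/a[1]") + "]");
        Console.WriteLine("[" + TextOrEmpty(promo, "td[2]/a[1]") + "]"); // no td[2] here
    }
}
```

This keeps the loop from crashing on unexpected rows, but filtering them out in the XPath is usually the cleaner fix, since an empty record would otherwise still be printed.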