HtmlAgilityPack - SelectNodes

Time: 2018-01-10 16:27:51

Tags: vb.net html-agility-pack

I'm trying to retrieve the text of the <p class> element below.

<div class="thread-plate__details">
    <h3 class="thread-plate__title">(S) HexHunter BOW</h3>
    <p class="thread-plate__summary">created by Aazoth</p>  <!-- (THIS ONE) -->
</div>

But I've had no luck.

The code I'm using is as follows:

' the example url to scrape
            Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
            Dim source As String = GetSource(url)

            If source IsNot Nothing Then
                ' create a new html document and load the pages source
                Dim htmlDocument As New HtmlDocument
                htmlDocument.LoadHtml(source)

                ' Create a new collection of all href tags
                Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")

                ' Using LINQ get all href values that start with http://
                ' of course there are others such as www.
                Dim links =
                    (
                        From node
                        In nodes
                        Let attribute = node.Attributes("class")
                        Where attribute.Value.StartsWith("created by ")
                        Select attribute.Value
                    )

                Me.ListBox1a.Items.AddRange(links.ToArray)
                Dim o, j As Long
                For o = 0 To ListBox1a.Items.Count - 1
                    For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
                        If ListBox1a.Items(o) = ListBox1a.Items(j) Then
                            ListBox1a.Items.Remove(ListBox1a.Items((j)))
                        End If
                    Next
                Next
                For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
                    Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")

                Next

                For Each s As String In ListBox1a.Items
                    Dim lvi As New NetSeal.NSListView
                    lvi.Text = s
                    NsListView1.Items.Add(lvi.Text)

                Next

It runs, but I can't get the 'created by XXX' text. I've tried many approaches with no luck, so a hand would be appreciated.

Thanks in advance, everyone.

1 answer:

Answer 0 (score: 0)

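It looks like you're testing the wrong string in attribute.Value. The class attribute holds the CSS class name ("thread-plate__summary"), not the paragraph text, so attribute.Value.StartsWith("created by ") never matches. Change it to attribute.Value.StartsWith("thread-plate__summary").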

To get the node's inner content instead of the attribute value, select node.InnerText: Select node.InnerText

' the example url to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)

If source IsNot Nothing Then
    ' create a new html document and load the pages source
    Dim htmlDocument As New HtmlDocument
    htmlDocument.LoadHtml(source)

    ' Select every <p> element that has a class attribute
    Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")

    ' Using LINQ, keep the paragraphs whose class starts with "thread-plate__summary"
    ' and take their inner text (the "created by ..." string)
    Dim links =
        (
            From node
            In nodes
            Let attribute = node.Attributes("class")
            Where attribute.Value.StartsWith("thread-plate__summary")
            Select node.InnerText
        )

    Me.ListBox1a.Items.AddRange(links.ToArray)
    Dim o, j As Long
    For o = 0 To ListBox1a.Items.Count - 1
        For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
            If ListBox1a.Items(o) = ListBox1a.Items(j) Then
                ListBox1a.Items.Remove(ListBox1a.Items((j)))
            End If
        Next
    Next
    For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
        Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")

    Next

    For Each s As String In ListBox1a.Items
        Dim lvi As New NetSeal.NSListView
        lvi.Text = s
        NsListView1.Items.Add(lvi.Text)

    Next
End If
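
For comparison, here is a minimal sketch of the same idea that can be run without the forum page or the form controls. It is only an illustration under stated assumptions: ExtractAuthors and the SummaryScraperSketch module are hypothetical names that are not part of the original code, the HTML is the snippet from the question, and the XPath assumes the class attribute is exactly "thread-plate__summary" (if the element carries several classes, a contains() test would be needed instead). It also guards against SelectNodes returning Nothing, which it does when no node matches, and uses Distinct in place of the nested de-duplication loops.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module SummaryScraperSketch
    ' Hypothetical helper (not from the original post): returns the distinct
    ' author names found in <p class="thread-plate__summary"> elements.
    Function ExtractAuthors(source As String) As String()
        Dim doc As New HtmlDocument()
        doc.LoadHtml(source)

        ' Assumption: the class attribute is exactly "thread-plate__summary",
        ' so the class can be matched directly in the XPath expression.
        Dim nodes As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//p[@class='thread-plate__summary']")

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If nodes Is Nothing Then Return New String() {}

        Const prefix As String = "created by "

        ' Read the element text, keep only "created by ..." entries,
        ' strip the prefix, and drop duplicates.
        Dim authors =
            From node In nodes
            Let innerText = HtmlEntity.DeEntitize(node.InnerText).Trim()
            Where innerText.StartsWith(prefix, StringComparison.Ordinal)
            Select innerText.Substring(prefix.Length)
            Distinct

        Return authors.ToArray()
    End Function

    Sub Main()
        ' The HTML fragment from the question, used here as sample input.
        Dim sample As String =
            "<div class=""thread-plate__details"">" &
            "<h3 class=""thread-plate__title"">(S) HexHunter BOW</h3>" &
            "<p class=""thread-plate__summary"">created by Aazoth</p>" &
            "</div>"

        For Each author As String In ExtractAuthors(sample)
            Console.WriteLine(author)
        Next
    End Sub
End Module
```

With the sample fragment this should print Aazoth. In the original form code, the returned array could be passed to ListBox1a.Items.AddRange in place of the loops that remove duplicates and strip the "created by " prefix.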

I hope this works for you.