Getting a URL from an HTML string

Date: 2014-02-23 01:24:49

Tags: html regex vb.net url

I have the following code to grab the div elements:

    For Each ele As HtmlElement In WebBrowser1.Document.GetElementsByTagName("div")
        If ele.GetAttribute("className").Contains("description") Then
            Dim content As String = ele.InnerHtml
            If content.Contains("http://myserver.com/image/check.png") Then
                ' Do stuff if image exists
            Else
                ' Do stuff if image doesn't exist
            End If
        End If
    Next

The div element looks like this:

<DIV class=headline><SPAN class=blue-title-lg>TITLE_HERE
</SPAN>&nbsp;&nbsp;&nbsp;&nbsp;LOCATION1_HERE,&nbsp;LOCATION2_HERE</DIV>DESCRIPTION_HERE<BR>
<DIV class=about><A class=link href="viewprofile.aspx?
profile_id=00000000">USERNAME</A>&nbsp;20&nbsp;&nbsp;&nbsp;&nbsp;FSM - 
Friends&nbsp;&nbsp;&nbsp;<FONT color=green>Online Today</FONT></DIV>

When the check-mark image is not present, I want to grab the URL inside this link:

<a class=link href="viewprofile.aspx?profile_id=00000000"></a>

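and put it into a string. This is where I have hit a brick wall and need some help. I think a regex could solve my problem, but regex is one of my weak points. Can someone put me out of my misery?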
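For reference, since the question asks about a regex: the following is a minimal sketch of one way to pull the first href value out of an HTML fragment such as the InnerHtml shown above. The module name, the ExtractFirstHref helper, and the base address "http://www.myserver.com/" are assumptions made for illustration. A regex like this is only reasonable for small, predictable fragments and is not a general HTML parser.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HrefRegexSketch
    ' Hypothetical helper: returns the first href value found in an HTML
    ' fragment, or Nothing if there is none. It accepts double-quoted,
    ' single-quoted, and unquoted attribute values.
    Function ExtractFirstHref(html As String) As String
        Dim pattern As String = "<a\b[^>]*?\bhref\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s>]+))"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        ' Exactly one of the three capture groups holds the value.
        For g As Integer = 1 To 3
            If m.Groups(g).Success Then Return m.Groups(g).Value
        Next
        Return Nothing
    End Function

    Sub Main()
        ' Sample fragment modelled on the div content in the question.
        Dim content As String = "<DIV class=about><A class=link href=""viewprofile.aspx?profile_id=00000000"">USERNAME</A></DIV>"
        Dim href As String = ExtractFirstHref(content)
        If href IsNot Nothing Then
            ' Assumed base address; replace with the real site root.
            Console.WriteLine("http://www.myserver.com/" & href)
        End If
    End Sub
End Module
```

In the original loop, ExtractFirstHref(content) could be called in the Else branch, where the check image is absent.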

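1 Answer:

Answer 0: (score: 0)

Solved!

I slept on it and came up with a very simple way to solve it. My application's UI looks like a mess now, but I will sort that out later. I have the information I need.

Here is how I did it: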

    ' Collect every distinct profile link on the page into ListBox1.
    Dim PageElement As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("a")
    For Each CurElement As HtmlElement In PageElement
        Dim linkunverified As String = CurElement.GetAttribute("href")
        If linkunverified.Contains("viewprofile.aspx") AndAlso
           Not ListBox1.Items.Contains(linkunverified) Then
            ListBox1.Items.Add(linkunverified)
        End If
    Next

    ' For each description div without the check image, find which collected
    ' link it contains and store the full URL in ListBox2.
    For Each ele As HtmlElement In WebBrowser1.Document.GetElementsByTagName("div")
        If ele.GetAttribute("className").Contains("description") Then
            Dim content As String = ele.InnerHtml
            If Not content.Contains("http://pics.myserver.com/image/check.png") Then
                For i As Integer = 0 To ListBox1.Items.Count - 1
                    ' Remove(0, 24) drops the first 24 characters, presumably the
                    ' "http://www.myserver.com/" prefix, leaving the relative link.
                    Dim relativeLink As String = CStr(ListBox1.Items(i)).Remove(0, 24)
                    If content.Contains(relativeLink) Then
                        ListBox2.Items.Add("http://www.myserver.com/" & relativeLink)
                    End If
                Next
            End If
        End If
    Next
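
The Remove(0, 24) call above depends on the prefix being exactly 24 characters long. As a hedged alternative, the sketch below derives the relative link with System.Uri, so it does not depend on the length of the host name. The module name, the ToRelativeLink helper, and the sample URL are assumptions for illustration and are not part of the original answer.

```vbnet
Imports System

Module RelativeLinkSketch
    ' Hypothetical helper: returns the path-and-query part of an absolute URL
    ' without its leading slash.
    Function ToRelativeLink(absoluteUrl As String) As String
        Dim parsed As New Uri(absoluteUrl)
        Return parsed.PathAndQuery.TrimStart("/"c)
    End Function

    Sub Main()
        ' Sample value in the form the answer's code stores in ListBox1.
        Dim link As String = "http://www.myserver.com/viewprofile.aspx?profile_id=00000000"
        Dim relative As String = ToRelativeLink(link)
        Console.WriteLine(relative)

        ' Rebuild the absolute URL from an assumed base address.
        Dim baseUri As New Uri("http://www.myserver.com/")
        Console.WriteLine(New Uri(baseUri, relative).AbsoluteUri)
    End Sub
End Module
```

With this helper, the loop body could compare content against ToRelativeLink(CStr(ListBox1.Items(i))) instead of a fixed-length Remove.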