Question

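I'm using a StreamReader to fetch the HTML of some pages, but there are lines I want to ignore, for example any line that starts with <span>.

Any suggestions? Here is my function: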

Public Function GetPageHTMLReaderNoPrx(ByVal address As Uri) As StreamReader
  Dim request As HttpWebRequest
  Dim response As HttpWebResponse = Nothing
  Dim reader As StreamReader = Nothing 'stays Nothing if the request fails or the status is not 200

  Try
    request = DirectCast(WebRequest.Create(address), HttpWebRequest)
    response = DirectCast(request.GetResponse(), HttpWebResponse)

    'response is already an HttpWebResponse, so no further cast is needed
    Select Case response.StatusCode
      Case HttpStatusCode.OK
        reader = New StreamReader(response.GetResponseStream(), Encoding.Default)

      Case Else
        MsgBox(response.StatusCode)
    End Select
  Catch
    If Not response Is Nothing Then response.Close()
  End Try
  Return reader
End Function

This is what the HTML looks like:

<tr>Text
<span>show all</span>
</tr>

Answer 1

If you insist on working with strings, you can do something like this:

Do
  Dim line As String = reader.ReadLine()
  If line Is Nothing Then Exit Do 'end of stream
  If line.StartsWith("<span>") Then Continue Do 'ignore this line and move on to the next one
  'otherwise do some processing here
  '...
Loop

But this approach is fragile: any small change in the input HTML can break your processing.
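As a rough illustration of how the string-based loop could be made slightly more tolerant, the sketch below skips leading whitespace and compares case-insensitively before deciding whether to drop a line. The module name LineFilter and the function ReadLinesExcept are hypothetical names introduced only for this example, and it assumes the whole page can be collected into a list in memory. It remains a line-based heuristic, not an HTML parser.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module LineFilter
    ' Hypothetical helper (not part of the original code): returns every line
    ' from the reader except those whose first non-whitespace text starts
    ' with the given prefix, ignoring case.
    Public Function ReadLinesExcept(ByVal reader As TextReader, ByVal prefix As String) As List(Of String)
        Dim kept As New List(Of String)
        Dim line As String = reader.ReadLine()
        While line IsNot Nothing
            ' TrimStart tolerates indentation; OrdinalIgnoreCase tolerates <SPAN> vs <span>.
            If Not line.TrimStart().StartsWith(prefix, StringComparison.OrdinalIgnoreCase) Then
                kept.Add(line)
            End If
            line = reader.ReadLine()
        End While
        Return kept
    End Function
End Module
```

It could be called as ReadLinesExcept(reader, "<span"), passing the StreamReader returned by GetPageHTMLReaderNoPrx. Using "<span" rather than "<span>" as the prefix also catches tags that carry attributes, at the cost of matching any other tag name that begins with "span".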

A more elegant solution is to use XElement:

Dim xml = <tr>Text
            <span>show all</span>
          </tr>
xml.<span>.Remove()
MsgBox(xml.Value.Trim)

VB.NET: how do I make a StreamReader ignore certain lines?
