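I'm using a StreamReader to get the HTML of some pages, but there are some lines I want to ignore, for example any line that starts with <span>.
Any suggestions? Here is my function: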
Imports System.IO
Imports System.Net
Imports System.Text

Public Function GetPageHTMLReaderNoPrx(ByVal address As Uri) As StreamReader
    Dim request As HttpWebRequest
    Dim response As HttpWebResponse = Nothing
    Dim reader As StreamReader = Nothing   ' stays Nothing if the request fails
    Try
        request = DirectCast(WebRequest.Create(address), HttpWebRequest)
        response = DirectCast(request.GetResponse(), HttpWebResponse)
        Select Case response.StatusCode
            Case HttpStatusCode.OK
                ' The caller owns the reader; closing it also closes the response stream.
                reader = New StreamReader(response.GetResponseStream(), Encoding.Default)
            Case Else
                MsgBox(response.StatusCode)
                response.Close()
        End Select
    Catch
        ' GetResponse throws a WebException for 4xx/5xx status codes,
        ' so those errors end up here rather than in Case Else.
        If Not response Is Nothing Then response.Close()
    End Try
    Return reader
End Function
This is what the HTML looks like:
<tr>Text
<span>show all</span>
</tr>
Answer 0 (score: 1)
If you insist on working with plain strings, you can do this:
Do
    Dim line As String = reader.ReadLine()
    If line Is Nothing Then Exit Do                      ' end of stream
    If line.StartsWith("<span>") Then Continue Do        ' ignore this line, keep reading
    ' otherwise do some processing here
    ' ...
Loop
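The same loop can be wrapped in a small reusable function. This is a sketch, not the answerer's code: the names `LineFilter` and `ReadLinesSkipping` are hypothetical, and trimming leading whitespace before the prefix check is an added assumption, since real pages are often indented. Taking a `TextReader` means it accepts the `StreamReader` returned by `GetPageHTMLReaderNoPrx` as well as a `StringReader` for testing.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module LineFilter
    ' Hypothetical helper: reads every line from the reader and returns only
    ' the lines that do not start with the given prefix. Leading whitespace is
    ' ignored for the check (an assumption), and the comparison is case-insensitive.
    Public Function ReadLinesSkipping(reader As TextReader, prefix As String) As List(Of String)
        Dim kept As New List(Of String)()
        Do
            Dim line As String = reader.ReadLine()
            If line Is Nothing Then Exit Do          ' end of stream
            If line.TrimStart().StartsWith(prefix, StringComparison.OrdinalIgnoreCase) Then
                Continue Do                          ' skip this line, keep reading
            End If
            kept.Add(line)
        Loop
        Return kept
    End Function
End Module
```

Note that `Continue Do` skips only the current line, whereas `Exit Do` would stop reading the rest of the page.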
But this approach is not robust: any small change in the input HTML can break your processing.
A more elegant solution is to use XElement:
Dim xml = <tr>Text
<span>show all</span>
</tr>
xml.<span>.Remove()       ' remove the <span> child elements
MsgBox(xml.Value.Trim)    ' displays the remaining text, "Text"
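The XML literal above only covers markup typed into the source file. For HTML read at run time (for example from the StreamReader), a minimal sketch, assuming the fragment is well-formed XML, could parse the string with `XElement.Parse` and remove `<span>` elements at any depth via `Descendants`. The names `SpanStripper` and `TextWithoutSpans` are hypothetical. Real-world HTML is often not valid XML (unclosed `<br>`, entities such as `&nbsp;`), in which case `XElement.Parse` throws an `XmlException`.

```vbnet
Imports System
Imports System.Xml.Linq

Module SpanStripper
    ' Hypothetical helper: parses an HTML fragment as XML, removes every <span>
    ' element at any depth, and returns the remaining text, trimmed.
    ' Throws System.Xml.XmlException if the fragment is not well-formed XML.
    Public Function TextWithoutSpans(fragment As String) As String
        Dim root As XElement = XElement.Parse(fragment)
        root.Descendants("span").Remove()   ' System.Xml.Linq.Extensions.Remove
        Return root.Value.Trim()
    End Function
End Module
```

Unlike `xml.<span>`, which only selects direct children, `Descendants("span")` also finds nested spans.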