How to get the text within <pre> using XPath?

Time: 2016-02-07 10:49:47

Tags: c# xpath

I'm trying to get the text within <pre> tags in C# using XPath. The webpage consists only of the following:

<body>
    <pre>
        The text I am trying to select
    </pre>
</body>

I can't seem to select just that text, and I'm not sure how to put it into a string. This is the code I'm using:

using System.Text.RegularExpressions;
using HtmlAgilityPack; // HtmlWeb and HtmlNode come from the HtmlAgilityPack NuGet package

var WebgetME2_ = new HtmlWeb();
var docME2_ = WebgetME2_.Load(webpage); //load the webpage
HtmlNode NODEME2_ = docME2_.DocumentNode.SelectSingleNode("//*/pre"); //select the <pre> node
string innerME_ = NODEME2_.InnerText; //put the node's inner text in a string
// After getting the text within the <pre> tags I want to select part of it using RegEx, which is why I need it in a string
string imagineME2_ = Regex.Match(innerME_, "(?=http)(.+?)(?<=.jpg)").ToString();
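
Once the <pre> content is in a string, the regex step no longer depends on how the page was fetched. The following is a minimal sketch using a made-up sample string (the URL http://example.com/images/photo.jpg and the helper names FromQuestion and Stricter are hypothetical). It compares the pattern from the question with a stricter variant, which is an editorial assumption and not part of the original post, that escapes the dot before "jpg":

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample of what InnerText might return for the <pre> element.
string preText = "\n        Image: http://example.com/images/photo.jpg and more text\n    ";

// Pattern from the question: starts where "http" begins and stops lazily at the
// first point preceded by any character plus "jpg" (the '.' is unescaped).
string FromQuestion(string text) =>
    Regex.Match(text, "(?=http)(.+?)(?<=.jpg)").Value;

// Stricter variant (assumption): require "http://" or "https://",
// no whitespace inside the URL, and a literal ".jpg".
string Stricter(string text) =>
    Regex.Match(text, @"https?://\S+?\.jpg").Value;

Console.WriteLine(FromQuestion(preText)); // http://example.com/images/photo.jpg
Console.WriteLine(Stricter(preText));     // http://example.com/images/photo.jpg
```

Match.Value returns an empty string when nothing matches, so callers can check for "" instead of catching an exception.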

2 Answers:

Answer 0 (score: 1)

I found a way around the problem.

using System.Text.RegularExpressions;

System.Net.WebClient WebclientME_ = new System.Net.WebClient();
byte[] rawME_ = WebclientME_.DownloadData(webpage); //download the page as raw bytes
string innerME_ = System.Text.Encoding.UTF8.GetString(rawME_); //decode the bytes into a string
string imagineME2_ = Regex.Match(innerME_, "(?=http)(.+?)(?<=.jpg)").ToString();

It downloads the whole page, which I don't like because it's slower, but it works.

Answer 1 (score: 1)

Please try the XPath below:

/body/pre/text()

The text() function selects the text nodes under the element at the path you give in the XPath.
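
To see what text() returns, here is a minimal sketch that runs /body/pre/text() against the sample page from the question. As an assumption, it uses System.Xml's XmlDocument instead of HtmlAgilityPack, because the sample markup happens to be well-formed XML. Real-world HTML usually is not, and HtmlAgilityPack's SelectSingleNode should accept the same expression. The helper name PreText is hypothetical.

```csharp
using System;
using System.Xml;

// The sample page from the question. It is well-formed XML, so System.Xml can
// stand in for an HTML parser here (an assumption that won't hold for most pages).
const string page = @"<body>
    <pre>
        The text I am trying to select
    </pre>
</body>";

string PreText(string markup, string xpath)
{
    var doc = new XmlDocument();
    doc.LoadXml(markup);
    // text() selects the text node inside <pre>, not the <pre> element itself.
    XmlNode? textNode = doc.SelectSingleNode(xpath);
    // The text node keeps the surrounding newlines and indentation, so trim it.
    return textNode?.Value?.Trim() ?? string.Empty;
}

Console.WriteLine(PreText(page, "/body/pre/text()")); // The text I am trying to select
```

The Trim() call matters because the <pre> element preserves the line breaks and indentation around the text, and those end up in the text node's value.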

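This assumes pre is a direct child of body. If that is not your real structure, use a double slash //, which finds the pre node anywhere below that point in the DOM instead of only among direct children.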

You can also try the following XPath:

/body//pre/text()
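
The difference between / and // can be checked with a small sketch. The nested markup below is a hypothetical variant of the page in which <pre> sits inside a <div>. As before, System.Xml's XmlDocument stands in for HtmlAgilityPack (an assumption), and the helper name Select is made up for illustration.

```csharp
using System;
using System.Xml;

// Hypothetical variant of the page where <pre> is NOT a direct child of <body>.
const string nested = @"<body>
    <div>
        <pre>The text I am trying to select</pre>
    </div>
</body>";

string? Select(string markup, string xpath)
{
    var doc = new XmlDocument();
    doc.LoadXml(markup);
    // Returns null when the XPath matches nothing.
    return doc.SelectSingleNode(xpath)?.Value?.Trim();
}

// Single slash: <pre> must be a direct child of <body>, so nothing matches here.
Console.WriteLine(Select(nested, "/body/pre/text()") ?? "(no match)");
// Double slash: <pre> may be any descendant of <body>.
Console.WriteLine(Select(nested, "/body//pre/text()") ?? "(no match)");
```

The first call prints "(no match)" and the second prints the text. A leading //pre/text() would also match, because it searches the whole document.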

Hope it helps :)