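Get text between 2 HTML tags in C#

Time: 2012-06-25 16:14:34

Tags: c# html string tags

I am trying to get the data (in this case, 31) between the provided HTML (span) tags.

Here is the original code (from Inspect Element in Chrome):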

<span id="point_total" class="tooltip" oldtitle="Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again." aria-describedby="ui-tooltip-0">31</span>

I have a rich text box containing the page source. Here is the same code as it appears on line 51 of the rich text box:

<DIV id=point_display>You have<BR><SPAN id=point_total class=tooltip jQuery16207621750175125325="23" oldtitle="Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again.">17</SPAN><BR>Points </DIV><IMG style="FLOAT: right" title="Gain subscribers" border=0 alt="When people subscribe to you, you lose a point" src="http://static.subxcess.com/images/page/decoration/remove-1-point.png"> </DIV>

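How would I go about doing this? I have tried several approaches, but none of them seem to work for me.

I am trying to retrieve the point value from this page: http://www.subxcess.com/sub4sub.php The number varies depending on who you are.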

3 Answers:

Answer 0: (score: 10)

You'll need to use HtmlAgilityPack to do this, and it is quite simple:

using HtmlAgilityPack; // third-party package, installed separately (e.g. via NuGet)

HtmlDocument doc = new HtmlDocument();
doc.Load("filepath"); // or doc.LoadHtml(html) if the page source is already in a string

// "//span" alone would select the first span on the page. To target a specific span, test its attributes,
// combining several tests with "and", e.g. "//span[@id='point_total' and @class='tooltip']".
HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[@id='point_total']");

// SelectSingleNode returns null when nothing matches, so guard before reading InnerText.
string value = node != null ? node.InnerText : null; // the text inside the span, i.e. <span>***value***</span>

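Regex, while a viable option, is something you generally want to avoid for parsing HTML where possible (see Here).

In terms of sustainability, you'll want to make sure you understand the page source: refresh it a few times, check whether the target span is nested inside the same parents after each refresh, and make sure the page keeps the same general format. Then use the principle above to navigate to the span.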

Answer 1: (score: 9)

You could be very specific:

using System.Text.RegularExpressions;

// Match the span exactly as Chrome's inspector shows it and capture its inner text.
var regex = new Regex(@"<span id=""point_total"" class=""tooltip"" oldtitle="".*?"" aria-describedby=""ui-tooltip-0"">(.*?)</span>");

var match = regex.Match(@"<span id=""point_total"" class=""tooltip"" oldtitle=""Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again."" aria-describedby=""ui-tooltip-0"">31</span>");

// Groups[1] is the (.*?) capture; check match.Success before relying on it.
var result = match.Groups[1].Value;
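
Note that an exact pattern like this one is tied to the markup as Chrome displays it. The rich text box version in the question uses uppercase tag names, unquoted attribute values, and an extra jQuery attribute, so it would likely not match. Below is a looser sketch, not a definitive solution. It assumes only that the span carries the id point_total (quoted or not), that no attribute value contains a '>' character, and that the span holds plain text. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PointTotalRegex
{
    // Matches <span ... id=point_total ...>TEXT</span>, with or without quotes
    // around the id value, in any letter case.
    private static readonly Regex SpanPattern = new Regex(
        @"<span\b[^>]*\bid\s*=\s*[""']?point_total[""']?(?=[\s>])[^>]*>(.*?)</span>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the span's inner text, or null when the span is not found.
    public static string Extract(string html)
    {
        Match match = SpanPattern.Match(html);
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }

    public static void Main()
    {
        // Shortened version of the rich text box markup from the question.
        string html = "<DIV id=point_display>You have<BR><SPAN id=point_total class=tooltip oldtitle=\"Note\">17</SPAN><BR>Points </DIV>";
        Console.WriteLine(Extract(html));
    }
}
```

This remains a regex over HTML, so it shares the fragility mentioned in the first answer: if the site changes the id or nests tags inside the span, the pattern needs updating.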

Answer 2: (score: 1)

There are several possibilities.

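  1. Regex
  2. Parse the HTML as XML and get the value via XPath
  3. Iterate through all elements. When you reach a span tag, skip all characters until you find the closing '>'. The value you need is then everything before the next opening '<'
  4. Also have a look at System.Windows.Forms.HtmlDocument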
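
The manual scanning approach from the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not a robust parser: it assumes no attribute value contains a '>' character, that the span contains plain text only, and that the target span can be recognised by a marker string (here its id) somewhere inside the opening tag. The names SpanScanner and SpanText are hypothetical.

```csharp
using System;

public static class SpanScanner
{
    // Walks through every "<span" occurrence, skips to the '>' that closes the
    // opening tag, and returns the text up to the next '<'. Only spans whose
    // opening tag contains `marker` are considered. Returns null if none match.
    public static string SpanText(string html, string marker)
    {
        int pos = 0;
        while (true)
        {
            int tagStart = html.IndexOf("<span", pos, StringComparison.OrdinalIgnoreCase);
            if (tagStart < 0) return null;

            int tagEnd = html.IndexOf('>', tagStart);
            if (tagEnd < 0) return null;

            string tag = html.Substring(tagStart, tagEnd - tagStart + 1);
            if (tag.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                int nextTag = html.IndexOf('<', tagEnd + 1);
                if (nextTag < 0) return null;
                return html.Substring(tagEnd + 1, nextTag - tagEnd - 1).Trim();
            }

            // Not the span we want; keep scanning after this tag.
            pos = tagEnd + 1;
        }
    }

    public static void Main()
    {
        // Shortened version of the rich text box markup from the question.
        string html = "<DIV id=point_display>You have<BR><SPAN id=point_total class=tooltip>17</SPAN><BR>Points </DIV>";
        Console.WriteLine(SpanText(html, "point_total"));
    }
}
```

Because it only looks for the literal characters, this copes with the uppercase, unquoted markup from the rich text box, but it will misbehave on anything the assumptions above rule out.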