Extracting substrings from a string using HTML tags

Time: 2014-05-07 05:46:02

Tags: c# html asp.net string substring

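I need to take the following HTML string from a database, extract its different elements, and put them into properties. In other words, I need to extract the "pProductDetailsVendorDescription" paragraph and put it into one property, then extract every "pProductDetailsProductDescription" paragraph and put all of those instances into another property. The string may also contain other P tags, and the tags can appear in any order.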

Here is the HTML string:

<p class="pProductDetailsVendorDescription">PowerDrive has the largest selection of products of any Sheave Manufacturer assurance to have the best product for specific application and most economical drive design.  All sheaves are balanced & accurately machined to minimize vibration.</p><p class="pProductDetailsProductDescription">All Bushings Must be Ordered Separately</p><p class="pProductDetailsProductDescription">Sheaves are machined from Gray Cast iron, statically balanced & painted.  Cast Iron Sheaves may NOT exceed 6500 RPM.</p>

What is an efficient way to do this?

1 answer:

Answer 0 (score: 1)

Use a regular expression:

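// Requires: using System.Text.RegularExpressions;
// Group 1 captures the class name, group 2 the text inside the <p> element.
string pattern = @"<p\sclass=""([a-zA-Z]*)"">(.*?)</p>";
Regex r = new Regex(pattern, RegexOptions.None);
string s = @"...";

foreach (Match m in r.Matches(s))
{
    // m.Groups[1].Value is the class name; m.Groups[2].Value is the paragraph text.
    ...
}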

Demo: http://dotnetfiddle.net/FDs7tn
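
To connect the loop above to the two properties the question asks for, here is a minimal sketch that reuses the answer's pattern. The `ProductDetails` and `ProductDetailsParser` names are hypothetical and not part of the original answer. `RegexOptions.Singleline` is also an assumption, added so that a paragraph containing line breaks still matches.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container for the two properties mentioned in the question.
public sealed class ProductDetails
{
    public string VendorDescription { get; set; } = "";
    public List<string> ProductDescriptions { get; } = new List<string>();
}

public static class ProductDetailsParser
{
    // Same pattern as in the answer: group 1 = class name, group 2 = inner text.
    // Assumption: Singleline lets '.' match newlines inside a paragraph.
    private static readonly Regex ParagraphPattern = new Regex(
        @"<p\sclass=""([a-zA-Z]*)"">(.*?)</p>",
        RegexOptions.Singleline);

    public static ProductDetails Parse(string html)
    {
        var details = new ProductDetails();

        foreach (Match m in ParagraphPattern.Matches(html))
        {
            string cssClass = m.Groups[1].Value;
            string text = m.Groups[2].Value.Trim();

            switch (cssClass)
            {
                case "pProductDetailsVendorDescription":
                    // If several vendor paragraphs exist, the last one wins.
                    details.VendorDescription = text;
                    break;
                case "pProductDetailsProductDescription":
                    // Collect every product description in document order.
                    details.ProductDescriptions.Add(text);
                    break;
                // Paragraphs with any other class are ignored.
            }
        }

        return details;
    }
}
```

Calling `ProductDetailsParser.Parse(s)` on the string from the question should leave the vendor paragraph in `VendorDescription` and both product paragraphs in `ProductDescriptions`, regardless of their order. The pattern only matches `<p>` tags whose sole attribute is a double-quoted `class` made up of letters, so markup that deviates from that shape would probably need an HTML parser instead.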
