PowerShell Regex - Extract the string between two given characters

Time: 2015-03-09 17:11:20

Tags: regex powershell

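I am a regex/PowerShell beginner and am struggling to get this working. I am working with some HTML data and need to extract the string between two given characters. In the examples below, I need to extract the text between > and <, but only when that text contains my search string. I have provided several examples here and hope the question is clear. Any help is greatly appreciated.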

For example -

$string1 = '<P><STRONG><SPAN style="COLOR: rgb(255,0,0)">ILOM 2.6.1.6.a <BR>BIOS vers. 0CDAN860 <BR>LSI MPT SAS firmware MPT BIOS 1.6.00</SPAN></STRONG></P></DIV></TD>'

$string2 = '<P><A id=T5220 name=T5220></A><A href="http://mywebserver/index.html">Enterprise T5120 Server</A> <BR><A href="http://mywebserver/index.html">Enterprise T5220 Server</A></P></DIV></TD>'


$searchstring = "ILOM"
$regex = ".+>(.*$searchstring.+)<" # Tried this
$string1 -match $regex
# expected result in $matches[x]: ILOM 2.6.1.6.a

Similarly -

$searchstring = "BIOS"
$regex = ".+>(.*$searchstring.+)<" # Tried this
$string1 -match $regex
# expected result in $matches[x]: BIOS vers. 0CDAN860

$searchstring = "T5120"
$regex = ".+>(.*$searchstring.+)<" # Tried this
$string2 -match $regex
# expected result in $matches[x]: Enterprise T5120 Server

$searchstring = "T5220"
$regex = ".+>(.*$searchstring.+)<" # Tried this
$string2 -match $regex
# expected result in $matches[x]: Enterprise T5220 Server

1 Answer:

Answer 0 (score: 1)

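You need to make the "wildcard" after your search string lazy by appending ? to its quantifier (.+ becomes .+?), so that it stops at the first occurrence of <.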

.*< = as many characters as possible, up to the last <

.*?< = as few characters as possible, up to the first <
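The difference is easiest to see on a small input. The following is a minimal sketch; the `$sample` string and the `$greedy`/`$lazy` variable names are invented for illustration and do not come from the question.

```powershell
# Toy input, invented for illustration only
$sample = '<b>one</b><i>two</i>'

# Greedy: .* runs to the end of the string and backs off only as far as the last <
if ($sample -match '>(.*)<')  { $greedy = $matches[1] }

# Lazy: .*? grows one character at a time and stops at the first <
if ($sample -match '>(.*?)<') { $lazy = $matches[1] }

$greedy   # one</b><i>two
$lazy     # one
```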

I would also make the "wildcard" before your search string lazy, just to be safe, even though it is not necessary in this particular case.

Minimum required modification:

".+>(.*$searchstring.+?)<"

What I suggest:

".+>(.*?$searchstring.+?)<"
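Because the search string is interpolated straight into the pattern, any regex metacharacter in it (for example the dot in "vers.") is treated as pattern syntax. One possible way to guard against that is to wrap the suggested pattern in a small function that escapes the search string first. This is only a sketch: the function name `Get-TextBetweenTags` and its parameter names are hypothetical and not part of the answer.

```powershell
# Hypothetical helper (name and parameters are assumptions, not from the answer)
function Get-TextBetweenTags {
    param(
        [Parameter(Mandatory = $true)] [string] $InputString,
        [Parameter(Mandatory = $true)] [string] $SearchString
    )
    # Escape regex metacharacters so the search text is matched literally
    $escaped = [regex]::Escape($SearchString)
    $pattern = ".+>(.*?${escaped}.+?)<"
    if ($InputString -match $pattern) {
        $matches[1]   # first capture group: the text between > and <
    }
}

$string2 = '<P><A id=T5220 name=T5220></A><A href="http://mywebserver/index.html">Enterprise T5120 Server</A> <BR><A href="http://mywebserver/index.html">Enterprise T5220 Server</A></P></DIV></TD>'

$result = Get-TextBetweenTags -InputString $string2 -SearchString 'T5120'
$result   # Enterprise T5120 Server
```

The function returns nothing when the pattern does not match, so callers can test the result with a plain `if`.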

Sample:

$string1 = '<P><STRONG><SPAN style="COLOR: rgb(255,0,0)">ILOM 2.6.1.6.a <BR>BIOS vers. 0CDAN860 <BR>LSI MPT SAS firmware MPT BIOS 1.6.00</SPAN></STRONG></P></DIV></TD>'

$string2 = '<P><A id=T5220 name=T5220></A><A href="http://mywebserver/index.html">Enterprise T5120 Server</A> <BR><A href="http://mywebserver/index.html">Enterprise T5220 Server</A></P></DIV></TD>'


$searchstring = "ILOM"
$regex = ".+>(.*?$searchstring.+?)<"
if($string1 -match $regex) { $matches[1] }

# Custom regex: the search string must directly follow >, so the later "MPT BIOS 1.6.00" text is not picked up
$searchstring = "BIOS"
$regex = ".+>($searchstring.+?)<"
if($string1 -match $regex) { $matches[1] }

# Or the suggested regex with a more specific search string
$searchstring = "BIOS vers"
$regex = ".+>(.*?$searchstring.+?)<"
if($string1 -match $regex) { $matches[1] }

$searchstring = "T5120"
$regex = ".+>(.*?$searchstring.+?)<"
if($string2 -match $regex) { $matches[1] }

$searchstring = "T5220"
$regex = ".+>(.*?$searchstring.+?)<"
if($string2 -match $regex) { $matches[1] }

Output:

ILOM 2.6.1.6.a 
BIOS vers. 0CDAN860 
BIOS vers. 0CDAN860 
Enterprise T5120 Server
Enterprise T5220 Server
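The "BIOS" case needs a custom regex because the leading `.+` is greedy: when the search string occurs in more than one text run, the suggested pattern settles on the last one. The sketch below shows that behaviour and one possible alternative that returns every matching run; it swaps in a negated character class (`[^<>]*`) and `[regex]::Matches`, neither of which is used in the answer, and the variable names `$last`, `$escaped` and `$all` are assumptions.

```powershell
$string1 = '<P><STRONG><SPAN style="COLOR: rgb(255,0,0)">ILOM 2.6.1.6.a <BR>BIOS vers. 0CDAN860 <BR>LSI MPT SAS firmware MPT BIOS 1.6.00</SPAN></STRONG></P></DIV></TD>'
$searchstring = 'BIOS'

# Suggested pattern: the greedy leading .+ pushes the match to the last possible >
if ($string1 -match ".+>(.*?$searchstring.+?)<") { $last = $matches[1] }
$last   # LSI MPT SAS firmware MPT BIOS 1.6.00

# Alternative sketch: [^<>]* cannot cross a tag boundary, so each text run
# containing the search string is returned as a separate match
$escaped = [regex]::Escape($searchstring)
$all = [regex]::Matches($string1, ">([^<>]*${escaped}[^<>]*)<") |
    ForEach-Object { $_.Groups[1].Value.Trim() }
$all
# BIOS vers. 0CDAN860
# LSI MPT SAS firmware MPT BIOS 1.6.00
```

With all runs available, the caller can choose the first, the last, or filter further, instead of encoding that choice in the pattern.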