Regex to find a specific URL, including the enclosing <a href...> HTML tag

Posted: 2016-08-22 02:48:29

Tags: regex parsing text jenkins

Here is the Console log:

10:16:02 2016-08-10 10:16:01.087 [INFO] (1): DEVICE_DAILY_SKIPS_SUBSCRIBER=60
10:16:02 2016-08-10 10:16:01.087 [INFO] (1): DEVICE_DAILY_SKIPS_REGISTERED=48
10:16:02 2016-08-10 10:16:01.088 [INFO] (1): DEVICE_HOURLY_STATION_SKIPS_SUBSCRIBER=6
10:16:02 2016-08-10 10:16:01.284 [INFO] (1): Post results =true
10:16:02 2016-08-10 10:16:01.290 [INFO] (1): Calling Api......
10:16:05 2016-08-10 10:16:04.289 [INFO] (1): Run URL = <a href="https://sv5.ad.mobile.com/index.php?/runs/view/2435" target="_blank">Run = R2435</a>
10:16:05 2016-08-10 10:16:04.293 [INFO] (1): [CONFIGURATION BeforeSuite] AbstractBaseTest#setUpBeforeSuite
10:16:05 2016-08-10 10:16:04.307 [INFO] (1): SHORT_TIMEOUT: 1000

The above is a Jenkins build console log. I am parsing it to find a specific URL together with its enclosing <a href...> HTML tag. For example, from the log above I want to extract this with a regular expression:

<a href="https://sv5.ad.mobile.com/index.php?/runs/view/2435" target="_blank">Run = R2435</a>

Here is what I have tried:

<a href="https://sv5.ad.mobile.com/index.php?/runs/view/.*">

but it does not seem to work. Also, is there a way to write a more compact regular expression for this kind of search? How can I find such URLs in the logs with a regex?

Answer (score: 1):

What you have should already work once you escape the "." and "?" characters. You also need to allow for other attributes, such as target="_blank":
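To make the escaping point concrete, here is a small Python sketch (Python is only an illustration choice; the question does not name a language). It runs the pattern from the question against the log line: the unescaped "?" turns "p?" into "optional p", so the literal "php?/" in the URL can never match. With the dots and "?" escaped, the same pattern does match, although its greedy ".*" also swallows the target attribute.

```python
import re

# A single line from the console log shown in the question.
LOG_LINE = ('10:16:05 2016-08-10 10:16:04.289 [INFO] (1): Run URL = '
            '<a href="https://sv5.ad.mobile.com/index.php?/runs/view/2435" '
            'target="_blank">Run = R2435</a>')

# The pattern from the question, unchanged. The bare "?" makes the preceding
# "p" optional instead of matching a literal "?", so "php?/" never matches.
attempt = r'<a href="https://sv5.ad.mobile.com/index.php?/runs/view/.*">'
print(re.search(attempt, LOG_LINE))  # None

# Escaping the dots and the "?" fixes that. The greedy ".*" then runs on to
# the last '">' on the line, absorbing ' target="_blank' along the way.
escaped = r'<a href="https://sv5\.ad\.mobile\.com/index\.php\?/runs/view/.*">'
print(re.search(escaped, LOG_LINE).group(0))
```

Relying on ".*" to absorb the other attributes is fragile, which is why narrower character classes are a better fit here.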

<a href="https://sv5\.ad\.mobile\.com/index\.php\?/runs/view/[^"]*"[^>]*>
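A minimal Python sketch of applying this pattern to a chunk of the console log. CONSOLE_LOG is a hypothetical stand-in for however the Jenkins log is actually read (file, API, etc.); the lines in it are copied from the question.

```python
import re

# Hypothetical variable holding an excerpt of the log from the question.
CONSOLE_LOG = """\
10:16:02 2016-08-10 10:16:01.284 [INFO] (1): Post results =true
10:16:02 2016-08-10 10:16:01.290 [INFO] (1): Calling Api......
10:16:05 2016-08-10 10:16:04.289 [INFO] (1): Run URL = <a href="https://sv5.ad.mobile.com/index.php?/runs/view/2435" target="_blank">Run = R2435</a>
10:16:05 2016-08-10 10:16:04.307 [INFO] (1): SHORT_TIMEOUT: 1000
"""

# The answer's pattern: literal "." and "?" escaped, [^"]* for the rest of
# the run URL, [^>]* for any remaining attributes such as target="_blank".
RUN_TAG = re.compile(
    r'<a href="https://sv5\.ad\.mobile\.com/index\.php\?/runs/view/[^"]*"[^>]*>'
)

# findall() returns every non-overlapping match; the pattern has no groups,
# so each item is the full matched opening tag.
for tag in RUN_TAG.findall(CONSOLE_LOG):
    print(tag)
```

Note that this matches only the opening <a ...> tag, not the link text or the closing </a>.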

[^"]* means "any number of characters that are not a double quote", and [^>]* similarly means "any number of characters that are not a right angle bracket."
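The practical difference from ".*" shows up when a line holds more than one link. The Python sketch below uses a hypothetical line with two run links (not from the real log): the greedy ".*" version produces one match that runs into the second tag, while the negated-class version stops at the first closing quote and angle bracket and yields one match per tag.

```python
import re

# Hypothetical line with two run links, made up only for this comparison.
line = ('<a href="https://sv5.ad.mobile.com/index.php?/runs/view/1" target="_blank">R1</a> '
        '<a href="https://sv5.ad.mobile.com/index.php?/runs/view/2" target="_blank">R2</a>')

prefix = r'<a href="https://sv5\.ad\.mobile\.com/index\.php\?/runs/view/'

# ".*" is greedy: it runs to the LAST '">' on the line.
greedy = re.compile(prefix + r'.*">')
# [^"]* cannot cross a quote and [^>]* cannot cross ">", so each match ends
# at the end of its own opening tag.
negated = re.compile(prefix + r'[^"]*"[^>]*>')

print(greedy.findall(line))   # a single match spanning both tags
print(negated.findall(line))  # two matches, one per opening tag
```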

You may want to be even more flexible by allowing attributes to appear before href:

<a [^>]*href="https://sv5\.ad\.mobile\.com/index\.php\?/runs/view/[^"]*"[^>]*>
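A short Python check of what the extra [^>]* after "<a " buys. The reordered tag below is hypothetical (the real log puts href first); it is only there to show that the stricter pattern misses it while the flexible one matches both orderings.

```python
import re

strict = re.compile(
    r'<a href="https://sv5\.ad\.mobile\.com/index\.php\?/runs/view/[^"]*"[^>]*>')
# [^>]* right after "<a " lets any attributes precede href, but cannot run
# past the end of the opening tag.
flexible = re.compile(
    r'<a [^>]*href="https://sv5\.ad\.mobile\.com/index\.php\?/runs/view/[^"]*"[^>]*>')

# Tag as it appears in the log, and a hypothetical variant with target first.
original = '<a href="https://sv5.ad.mobile.com/index.php?/runs/view/2435" target="_blank">Run = R2435</a>'
reordered = '<a target="_blank" href="https://sv5.ad.mobile.com/index.php?/runs/view/2435">Run = R2435</a>'

print(strict.search(reordered))  # None: href must come right after "<a "
print(flexible.search(reordered).group(0))
print(flexible.search(original).group(0))
```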

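As for whether it can be more compact, that depends on what you are trying to find. You have only given us one example, so it is hard for us to speculate.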
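One possible direction, sketched in Python under assumptions drawn only from the single example in the question: run IDs are digits, and each link sits on one line. Letting re.escape() escape the base URL avoids writing the backslashes by hand. The pattern also extends to the closing </a> so the whole element the question asked for is returned, and it captures the run ID in a named group. RUN_BASE and RUN_LINK are illustrative names, not anything from Jenkins.

```python
import re

# Base URL copied from the log; re.escape() adds the backslashes before
# "." and "?" so they match literally.
RUN_BASE = "https://sv5.ad.mobile.com/index.php?/runs/view/"

# Whole element from "<a" through "</a>". Assumes numeric run IDs and a
# single-line tag, as in the question's log. ".*?" is lazy, so it stops at
# the first "</a>" after the opening tag.
RUN_LINK = re.compile(
    r'<a [^>]*href="' + re.escape(RUN_BASE) + r'(?P<run_id>\d+)"[^>]*>.*?</a>'
)

line = ('10:16:05 2016-08-10 10:16:04.289 [INFO] (1): Run URL = '
        '<a href="https://sv5.ad.mobile.com/index.php?/runs/view/2435" '
        'target="_blank">Run = R2435</a>')

m = RUN_LINK.search(line)
if m:
    print(m.group(0))         # the full <a ...>Run = R2435</a> element
    print(m.group("run_id"))  # 2435
```

If other run URLs in your logs differ from this one example (different host, non-numeric IDs), the \d+ and the fixed base URL would need adjusting.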
