Match all <a> tags except for domains a, b and c

Time: 2018-05-18 13:46:37

Tags: regex

My WordPress website was hacked, and I have since increased security and changed all info. The hackers inserted random links to spam websites throughout the text of my posts.

I want to find literally all URLs in all my posts, but exclude URLs containing certain domains. That way I can use a plugin to bulk-remove every link that does not point to one of the few domains I normally link to.

The plugin requires the pattern to be written a certain way. This works to find all links in the HTML:

[<a.*</a>]

I played around and tried a few versions, but I can't quite figure it out.

[<a.*</a>(?!domain1.com|domain2.com)]

[<a.*(?!domain1.com|domain2.com).*</a>]

Can anyone give me a push in the right direction?


Addition: I'm using this Wordpress plugin: https://www.wordpress.org/plugins/search-regex/

It is outdated, but otherwise works fine.

1 answer:

Answer 0: (score: 1)

You can use a regex with a negative lookahead:

<a (?:(?!(?:domain1|domain2)\.).)+?</a>


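Here (?:(?!(?:domain1|domain2)\.).)+? consumes the link one character at a time. The negative lookahead (?!(?:domain1|domain2)\.) is asserted before each character to ensure that neither domain1. nor domain2. starts at that position. A link containing one of those domains therefore fails to match, and only the remaining links are found.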
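
To see the effect outside the plugin, here is a minimal sketch in Python. It assumes Python's re module treats this pattern the same way the plugin's PHP engine does, which should hold for the constructs used here. The sample HTML and the domain spam.example are made up for illustration.

```python
import re

# Pattern from the answer: "<a ", then one or more characters, each guarded by
# a negative lookahead so that "domain1." or "domain2." never starts at that
# position, up to the nearest "</a>".
UNWANTED_LINK = re.compile(r'<a (?:(?!(?:domain1|domain2)\.).)+?</a>')

# Hypothetical post content: two links to allowed domains and one spam link.
html = (
    '<p>See <a href="https://domain1.com/page">my site</a> and '
    '<a href="http://spam.example/buy">cheap pills</a> or '
    '<a href="https://domain2.com/">partner</a>.</p>'
)

# Only the link that mentions neither allowed domain is matched.
unwanted = UNWANTED_LINK.findall(html)
print(unwanted)  # ['<a href="http://spam.example/buy">cheap pills</a>']

# Replacing each match with an empty string removes the whole link,
# including its link text.
cleaned = UNWANTED_LINK.sub('', html)
print(cleaned)
```

Two caveats with this approach: the dot does not match newlines by default, so a link that spans several lines is not found, and a spam link whose visible text happens to contain domain1. or domain2. is skipped as well, because the lookahead checks the whole tag and not just the href.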