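Regex to match all hrefs in a string unless they contain a word

Time: 2018-10-27 20:23:28

Tags: javascript regex

I'm trying to match all hrefs in a string, but exclude any href that contains specific text such as login (I believe this calls for a negative lookahead). For example: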

const str = `This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>`

const match = str.match(/href="(.*?)"/g)

console.log(match)

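This matches every href, but it does not exclude the ones that contain login. I've tried a few different variations but haven't gotten anywhere. Any help would be greatly appreciated!

4 answers:

Answer 0 (score: 1)

You can use this regex, which relies on a negative lookbehind: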

href="(.*?)(?<!login)"

Demo

https://regex101.com/r/15DwZE/1
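
To see what the lookbehind does and does not check, here is a small sketch. The sample tags and the `capture` helper are made up for illustration and are not part of the original answer. The lookbehind only inspects the five characters just before a closing quote, so it rejects an href that ends in login, but not one that merely contains it, and the lazy `.*?` can run past the intended closing quote when another attribute follows.

```javascript
// Same pattern as above, without the g flag so one tag is tested at a time.
const lookbehind = /href="(.*?)(?<!login)"/;

// Hypothetical helper: returns the captured group, or null when nothing matches.
function capture(html) {
  const m = html.match(lookbehind);
  return m ? m[1] : null;
}

// href ends with "login" and no later quote exists: no match at all.
const endsWithLogin = capture('<a href="http://example.com/login">x</a>');

// "login" sits in the middle of the URL: the lookbehind does not notice it.
const loginInMiddle = capture('<a href="http://example.com/login/reset">x</a>');

// Another attribute follows: the match runs on to the next quote.
const overshoot = capture('<a href="http://example.com/login" class="nav">x</a>');

console.log(endsWithLogin); // null
console.log(loginInMiddle); // http://example.com/login/reset
console.log(overshoot);     // http://example.com/login" class=
```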

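Edit 1: As The fourth bird pointed out, the regex above may not work correctly. Rather than devising a complex regex that covers every way login can appear in a URL that should be rejected, here is a JavaScript solution.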

var myString = 'This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>';
var myRegexp = /href="(.*?)"/g;
// With the g flag, exec() resumes from lastIndex, so each call returns the next href
var match = myRegexp.exec(myString);
while (match !== null) {
    // match[1] is the captured URL; print it only if it does not contain 'login'
    if (match[1].indexOf('login') === -1) {
        console.log(match[1]);
    }
    match = myRegexp.exec(myString);
}
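
The same match-then-filter idea can also be wrapped in a function. This is only a sketch: `hrefsWithout` is a hypothetical helper name, and it assumes an environment where `String.prototype.matchAll` is available and that href values are double-quoted, as in the question.

```javascript
// Hypothetical helper: collect href values that do not contain `word`.
function hrefsWithout(html, word) {
  // matchAll requires the g flag; index 1 of each match is the captured URL.
  const all = Array.from(html.matchAll(/href="(.*?)"/g), m => m[1]);
  return all.filter(href => !href.includes(word));
}

const str = 'This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>';
const kept = hrefsWithout(str, 'login');
console.log(kept); // [ 'http://www.google.com' ]
```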

Answer 1 (score: 1)

Instead of a regex, you can use DOMParser and check whether each href contains your string, for example with includes.

let parser = new DOMParser();
let html = `This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>`;
let doc = parser.parseFromString(html, "text/html");
let anchors = doc.querySelectorAll("a");
anchors.forEach(a => {
  if (!a.href.includes("login")) {
    console.log(a.href);
  }
});

Answer 2 (score: 0)

You can create a temporary HTML node and collect all the <a> tags from it, then filter them by href. Sample code:

const str = `This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>`;
const d = document.createElement('div');
d.innerHTML = str;
// Keep only the anchors whose href does not contain "login"
const anchors = Array.from(d.getElementsByTagName("a")).filter(a => !/login/.test(a.href));

Answer 3 (score: 0)

You can do this with the following regex:

/<[\w:]+(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*(?:(['"])(?:(?!\1|login)[\S\s])*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>/

https://regex101.com/r/LEQL7h/1
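
As a rough usage sketch, the pattern can be applied to the string from the question. The g flag and the variable names are additions for illustration. Note that the pattern matches the whole opening tag, not just the URL, so extracting the href value would still need a further step.

```javascript
// The pattern from this answer, with the g flag added to collect every matching tag.
const tagWithoutLogin = /<[\w:]+(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*(?:(['"])(?:(?!\1|login)[\S\s])*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>/g;

const html = 'This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>';

// With the g flag, match() returns the full text of each match (or null if none).
const tags = html.match(tagWithoutLogin) || [];
console.log(tags); // [ '<a href="http://www.google.com">' ]
```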

More info

 < [\w:]+               # Any tag
 (?= \s )
 (?=                    # Assertion (a pseudo atomic group)
      (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
      \s href \s* = \s*      # href attribute
      (?:
           ( ['"] )               # (1), Quote
           (?:
                (?! \1 | login )       # href cannot contain login
                [\S\s] 
           )*
           \1 
      )
 )
                        # Have href that does not contain login, match the rest of tag
 \s+ 
 (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+

 >
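
The part of this pattern that does the excluding is the `(?: (?! \1 | login ) [\S\s] )*` group: it consumes one character at a time and refuses to step onto any position where login starts. A pared-down sketch of the same idea, assuming only double-quoted href values, a case-sensitive check, and no tag parsing (the variable names are illustrative):

```javascript
// Simplified sketch: capture a double-quoted href value only if "login"
// does not start at any position inside it.
const hrefWithoutLogin = /href="((?:(?!login)[^"])*)"/g;

const str = 'This is some a string <a href="http://www.google.com">google</a> and this is another that should not be found <a href="https://www.google.com/login">login</a>';

// Index 1 of each match is the captured URL.
const urls = Array.from(str.matchAll(hrefWithoutLogin), m => m[1]);
console.log(urls); // [ 'http://www.google.com' ]
```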