Question

How can I use a regular expression to select the text after a closing </h2> tag, up to the next opening <h2> tag?

<h2>my title here</h2>
Lorem ipsum dolor sit amet <b>with more tags</b>
<h2>my title here</h2>
consectetur adipisicing elit quod tempora

In this case, I want to select this text: Lorem ipsum dolor sit amet <b>with more tags</b>

Answer 1

Try this: /<\/h2>(.*?)<h2/g

This finds a closing </h2> tag and then lazily captures everything up to the next opening <h2> tag.

In JS, you can grab just that text like this:

substr = str.match(/<\/h2>(.*?)<h2/)[1];

Regex101

var str = '<h2>my title here</h2>Lorem ipsum <b>dolor</b> sit amet<h2>my title here</h2>consectetur adipisicing elit quod tempora';

var substr = str.match(/<\/h2>(.*?)<h2/)[1].replace(/<.*?>/g, '');

console.log(substr);
//returns: Lorem ipsum dolor sit amet
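
One caveat with the pattern above: `.` does not match line breaks unless the `s` flag is set, so on the multi-line input from the question `match()` returns null and the `[1]` lookup throws a TypeError. A minimal sketch of a newline-tolerant variant follows; the helper name `betweenH2` is made up for illustration and is not part of the original answer.

```javascript
// Hypothetical helper: returns the markup between the first </h2> and the
// next <h2>, or null when no such pair exists.
function betweenH2(html) {
  // [\s\S] matches any character, including newlines; *? keeps the match lazy.
  const m = html.match(/<\/h2>([\s\S]*?)<h2\b/);
  return m ? m[1].trim() : null;
}

// The question's sample, with real line breaks between the lines.
const input =
  '<h2>my title here</h2>\n' +
  'Lorem ipsum dolor sit amet <b>with more tags</b>\n' +
  '<h2>my title here</h2>\n' +
  'consectetur adipisicing elit quod tempora';

const between = betweenH2(input);
console.log(between); // Lorem ipsum dolor sit amet <b>with more tags</b>
```

Returning null instead of indexing the match directly lets the caller decide what to do when the page has fewer than two headings.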

Answer 2

Try:

/<\/h2>((?:\s|.)*)<h2/

You can see it in action on this regex tester.

You can also see it in the example below.

(function() {
  "use strict";

  var inString, regEx, res, outEl;

  outEl = document.getElementById("output");

  inString = "<h2>my title here</h2>\n" +
    "Lorem ipsum dolor sit amet <b>with more tags</b>\n" +
    "<h2> my title here </h2>\n" +
    "consectetur adipisicing elit quod tempora";

  regEx = /<\/h2>((?:\s|.)*)<h2/;

  res = regEx.exec(inString);

  console.log(res);
  res.slice(1).forEach(function(match) {
    var newEl = document.createElement("pre");
    newEl.innerHTML = match.replace(/</g, "&lt;").replace(/>/g, "&gt;");
    outEl.appendChild(newEl);
  });
}());

<main>
  <div id="output"></div>
</main>

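I added \n to your example to simulate line breaks. I am not sure why you don't simply select the <h2> elements with {{1}} and get the text that way.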
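
Worth noting: the `*` in this pattern is greedy, so with three or more headings the capture runs to the last `<h2` in the string, not the next one. The sketch below compares it with a lazy `*?` variant on a made-up three-heading string (the variable names and sample text are illustrative only).

```javascript
// Made-up sample with three headings.
const threeSections =
  '<h2>one</h2>\nfirst\n<h2>two</h2>\nsecond\n<h2>three</h2>\nthird';

// Greedy, as in the answer: runs to the LAST "<h2" in the string.
const greedy = /<\/h2>((?:\s|.)*)<h2/.exec(threeSections)[1];

// Lazy variant: stops at the FIRST "<h2" after the closing tag.
const lazy = /<\/h2>((?:\s|.)*?)<h2/.exec(threeSections)[1];

console.log(JSON.stringify(greedy)); // "\nfirst\n<h2>two</h2>\nsecond\n"
console.log(JSON.stringify(lazy));   // "\nfirst\n"
```

With exactly two headings, as in the question, both versions capture the same text.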

Answer 3

Use the string replace() function to match the tags and remove them. This proposed solution also removes any self-closing tags such as <br/>, <hr/>, etc.

var htmlToParse = document.getElementsByClassName('input')[0].innerHTML;

htmlToParse = htmlToParse.replace(/[\r\n]+/g, ""); // collapse the multi-line HTML string into a single line

var selectedRangeString = htmlToParse.match(/(<h2>.+<h2>)/g); // greedy match from the first <h2> to the last <h2>

var parsedString = selectedRangeString[0].replace(/((<\w+>(.*?)<\/\w+>)|<.*?>)/g, ""); // removes paired tags together with their contents, plus single tags such as <br/> and <hr/>

document.getElementsByClassName('output')[0].innerHTML += parsedString;

<div class='input'>
    <i>Input</i>

  <h2>my title here</h2>
  Lorem ipsum dolor sit amet <br/> <b>with more tags</b>
<hr/>
  <h2>my title here</h2>
  consectetur adipisicing elit quod tempora
</div>

<hr/>
<div class='output'>
  <i>Output</i>
  <br/>
</div>
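
Note that the replace() pattern in this answer removes paired tags together with their contents, so "with more tags" disappears from the output along with the <b> element. If the inner text should survive, one possible variation is to strip only the tags themselves. This is a sketch under that assumption; `stripTags` is a hypothetical helper, not something from the answer.

```javascript
// Hypothetical helper: removes tags but keeps the text they wrap.
function stripTags(fragment) {
  return fragment
    .replace(/<[^>]*>/g, '') // drop anything that looks like a tag
    .replace(/\s+/g, ' ')    // collapse the whitespace left behind
    .trim();
}

const fragment = 'Lorem ipsum dolor sit amet <br/> <b>with more tags</b>\n<hr/>';
const plain = stripTags(fragment);
console.log(plain); // Lorem ipsum dolor sit amet with more tags
```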

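Things to keep in mind about the code:

htmlToParse.match(/(<h2>.+<h2>)/g); returns an array of strings, namely every substring matched by this regular expression.

selectedRangeString[0]: I use the first match for the demonstration. If you want to process every match, you can loop over the array with the same logic.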
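
As a rough illustration of looping over every match, the sketch below collects each chunk that sits between a closing </h2> and the following <h2>. It swaps in a lazy pattern with `String.prototype.matchAll` in place of the answer's greedy `.+`; `sectionsBetweenH2` and the sample string are invented for the example.

```javascript
// Hypothetical helper: collects the markup between each </h2> and the
// <h2> that follows it.
function sectionsBetweenH2(html) {
  // The g flag lets matchAll walk through every closing/opening pair;
  // [\s\S]*? matches lazily across line breaks.
  const pattern = /<\/h2>([\s\S]*?)<h2\b/g;
  return Array.from(html.matchAll(pattern), (m) => m[1].trim());
}

const page =
  '<h2>a</h2>\nfirst <b>bold</b>\n<h2>b</h2>\nsecond<br/>\n<h2>c</h2>\nlast';
const sections = sectionsBetweenH2(page);
console.log(sections); // [ 'first <b>bold</b>', 'second<br/>' ]
```

Text after the final heading ("last" here) is not returned, because no further <h2> follows it.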

JavaScript regexp between a closing tag and an opening tag
