Regex to delete everything except XML

Time: 2017-03-30 10:56:50

Tags: regex xml notepad++ negate

I need help with a regex for Notepad++ that matches everything except XML.

The regex I am using:

(!?\<.*\>) <- I want the opposite of this (on the first three lines)

Sample text (the expected result is the same text with everything except the XML removed):

[20173003] This text is what I want to delete [<Person><Name>Foo</Name><Surname>Bar</Surname></Person>], and this text too.
[20173003] This is another text to delete [<Person><Name>Bar</Name><Surname>Foo</Surname></Person>]
[20173003] This text too... [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], delete me!
[20173003] But things like this make the regex to fail < [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>

Thanks in advance!

1 answer:

Answer 0: (score: 2)

This is not perfect, but it should work with your input, which looks quite simple and well structured.

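If you only need to handle a single, non-nested <Person> tag, you can use the simple regex (<Person>.*?</Person>)|. (it matches any <Person> element and captures it into Group 1, and otherwise matches any other single character) and replace with the conditional replacement pattern (?{1}$1\n:) (it reinserts the <Person> element followed by a newline, or replaces the match with an empty string).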
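Outside Notepad++, the same idea can be sketched in Python. This is only an illustration, not part of the original answer: Python's re module has no (?{1}...:) conditional replacement syntax, so a callback stands in for it, and the function name keep_person_elements is made up for this example.

```python
import re

# Same pattern as above: group 1 captures a <Person> element,
# the "." alternative consumes any other single character.
# re.DOTALL plays the role of the ". matches newline" option.
PATTERN = re.compile(r"(<Person>.*?</Person>)|.", re.DOTALL)


def keep_person_elements(text: str) -> str:
    """Emulate the conditional replacement (?{1}$1\\n:) with a callback."""
    def replace(match: re.Match) -> str:
        if match.group(1) is not None:
            # Group 1 matched: keep the element and add a newline.
            return match.group(1) + "\n"
        # Only "." matched: replace the character with nothing.
        return ""

    return PATTERN.sub(replace, text)


sample = (
    "[20173003] This text is what I want to delete "
    "[<Person><Name>Foo</Name><Surname>Bar</Surname></Person>], and this text too.\n"
    "[20173003] But things like this make the regex to fail < "
    "[<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>\n"
)
print(keep_person_elements(sample), end="")
```

Because the stray < and > characters never begin a complete <Person>...</Person> element, they fall through to the "." branch and are dropped like any other character.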


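To make it more generic, you can capture an opening XML tag and its corresponding closing tag with a recursion-based Boost regex, together with the same conditional replacement pattern: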

Find what: (<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>)|.
Replace with: (?{1}$1\n:)
. matches newline: ON


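Regex details

  • (<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>) - Capturing group 1 (recursed later via the (?1) subroutine call), matching:
    • <(\w+)[^>]*> - any opening tag, with its name captured into Group 2
    • (?:(?!</?\2\b).|(?1))* - zero or more occurrences of:
      • (?!</?\2\b). - any character (.) that does not start a < + tag name sequence as a whole word, with an optional / in front of the name
      • | - or
      • (?1) - the whole Group 1 subpattern, recursed (repeated)
    • </\2> - the corresponding closing tag
  • | - or
  • . - any single character.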

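Replacement pattern

  • (?{1} - if Group 1 matched:
    • $1\n - replace with its contents plus a newline
    • : - otherwise, replace with an empty string
  • ) - end of the replacement pattern.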
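For readers who want to try the generic version outside Notepad++, here is a rough Python sketch. Python's standard re module supports neither the (?1) recursion nor the conditional replacement, so this swaps in a different technique: a hand-written scan that counts same-name opening and closing tags. The function names extract_elements and find_element_end are invented for this example, and behavior on malformed or self-closing tags may differ from the Boost regex.

```python
import re

# Mirrors <(\w+)[^>]*> from the pattern above: an opening tag whose name is group 1.
OPEN_TAG = re.compile(r"<(\w+)[^>]*>")


def find_element_end(text, opening):
    """Return the index just past the closing tag that balances `opening`, or None."""
    name = re.escape(opening.group(1))
    # Any opening or closing tag with the same name; group 1 is "/" for a closing tag.
    same_tag = re.compile(rf"<(/?){name}\b[^>]*>")
    depth = 1  # stands in for the (?1) recursion: count nested same-name tags
    pos = opening.end()
    while True:
        match = same_tag.search(text, pos)
        if match is None:
            return None  # never balanced, so this is not a complete element
        depth += -1 if match.group(1) else 1
        if depth == 0:
            return match.end()
        pos = match.end()


def extract_elements(text):
    """Keep each top-level element on its own line and drop everything else."""
    kept = []
    pos = 0
    while pos < len(text):
        opening = OPEN_TAG.match(text, pos)
        end = find_element_end(text, opening) if opening else None
        if end is None:
            pos += 1  # the "." branch: discard one character
        else:
            kept.append(text[pos:end] + "\n")  # the "$1\n" branch
            pos = end
    return "".join(kept)


print(extract_elements("x<a><a>1</a><b>2</b></a>y<c>3</c>"), end="")
```

The depth counter only tracks tags with the same name as the outermost element, which is the case the \2 backreference is guarding against in the regex; differently named children are simply carried along inside the kept span.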