Question

I need help with a regex for Notepad++ that matches everything except XML.

The regex I am using: (!?\<.*\>) <- I want the opposite of this (for the first three lines)

Sample text (the expected result keeps only the XML in each line):

[20173003] This text is what I want to delete [<Person><Name>Foo</Name><Surname>Bar</Surname></Person>], and this text too.
[20173003] This is another text to delete [<Person><Name>Bar</Name><Surname>Foo</Surname></Person>]
[20173003] This text too... [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], delete me!
[20173003] But things like this make the regex to fail < [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>

Thanks in advance!

Answer 1

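This isn't perfect, but it should work with your input, which looks very simple and well structured.

If you only need to handle a single, non-nested <Person> tag, you can use the simple regex (<Person>.*?</Person>)|. (it matches and captures any <Person> element into Group 1, and matches any other character) and replace with the conditional replacement pattern (?{1}$1\n:) (it reinserts the <Person> element followed by a newline, or replaces the match with an empty string).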
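
For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch of the simple case. It is an illustration only, not part of the original answer: Python's re module has no (?{1}...) conditional replacement syntax, so a replacement function stands in for it, and the names keep_person and extract_person_elements are made up for this example. re.DOTALL plays the role of the ". matches newline" option.

```python
import re

# Same alternation as the Notepad++ pattern: group 1 captures a <Person>
# element, the bare dot consumes any other single character.
PATTERN = re.compile(r"(<Person>.*?</Person>)|.", re.DOTALL)


def keep_person(match: re.Match) -> str:
    """Stand-in for (?{1}$1\\n:): keep group 1 plus a newline, else drop."""
    if match.group(1) is not None:
        return match.group(1) + "\n"
    return ""


def extract_person_elements(text: str) -> str:
    """Return only the <Person> elements found in text, one per line."""
    return PATTERN.sub(keep_person, text)


sample = (
    "[20173003] This text is what I want to delete "
    "[<Person><Name>Foo</Name><Surname>Bar</Surname></Person>], and this text too.\n"
    "[20173003] But things like this make the regex to fail < "
    "[<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>\n"
)

print(extract_person_elements(sample), end="")
```

Because the first alternative is tried before the bare dot at every position, a stray < that does not start <Person> is simply consumed as an ordinary character.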

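To make it more generic, you can capture an opening XML tag and its corresponding closing tag with a recursion-based Boost regex and the matching conditional replacement pattern:

Find what: (<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>)|.
Replace with: (?{1}$1\n:)
. matches newline: ON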

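Regex details:

(<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>) - Capturing group 1 (later recursed with the (?1) subroutine call), matching:
- <(\w+)[^>]*> - any opening tag, with its name captured into Group 2
- (?:(?!</?\2\b).|(?1))* - zero or more occurrences of:
  - (?!</?\2\b). - any character (.) that does not start a < + tag name sequence (the name as a whole word, with an optional / in front)
  - | - or
  - (?1) - the whole Group 1 subpattern, recursed (repeated)
- </\2> - the corresponding closing tag
| - or
. - any single character.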

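Replacement pattern:

(?{1} - if Group 1 matched:
- $1\n - replace with its contents + a newline
- : - otherwise replace with an empty string
) - end of the replacement pattern.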
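
As a rough cross-check of what the recursive pattern and its replacement do, the following Python sketch mimics the behaviour with an explicit depth counter. This is a swapped-in technique, not the Boost regex itself: Python's standard re module does not support (?1) subroutine calls, so nesting is tracked by hand. The function names are hypothetical, and edge cases (for example self-closing tags or a closing tag with trailing whitespace) may behave differently from the regex.

```python
import re
from typing import Optional

# Counterpart of <(\w+)[^>]*> : an opening tag whose name goes into group 1.
OPEN_TAG = re.compile(r"<(\w+)[^>]*>")


def find_element_end(text: str, opening: re.Match) -> Optional[int]:
    """Return the index just past the closing tag that balances `opening`."""
    name = re.escape(opening.group(1))
    # Opening or closing tag with the same name as a whole word.
    same_name = re.compile(rf"<(/?){name}\b[^>]*>")
    depth = 1
    pos = opening.end()
    while depth:
        tag = same_name.search(text, pos)
        if tag is None:
            return None  # unbalanced: no complete element starts here
        depth += -1 if tag.group(1) else 1
        pos = tag.end()
    return pos


def extract_elements(text: str) -> str:
    """Keep balanced top-level elements (one per line), drop everything else."""
    kept = []
    pos = 0
    while pos < len(text):
        opening = OPEN_TAG.match(text, pos)
        end = find_element_end(text, opening) if opening else None
        if end is None:
            pos += 1  # the "|." branch: drop a single character
        else:
            kept.append(text[pos:end] + "\n")  # the "$1\n" branch
            pos = end
    return "".join(kept)


print(extract_elements("a<div><div>x</div></div>b<p>t</p>"), end="")
```

The depth counter does the job of the (?1) recursion: a nested tag with the same name raises the depth, and the element ends only when the matching closing tag brings it back to zero.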
