Question

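I'm trying to capture the whole string minus any occurrences of <span class="notranslate">*any text*</span>. I don't need to parse HTML or anything; I just need to ignore those entire sections. The tags have to match exactly to be removed, because I want to keep other tags. A given string will contain at least one such tag, with no upper limit (although more than one is uncommon).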

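My end goal is to match two texts: one containing variable names, and another in which the variable names have been replaced by their values. (I can't replace the variables myself; I don't have access to that database.) The variables will always be surrounded by the span tags I mentioned. I know my tag says "notranslate", but this happens before translation, so all the other text is exactly the same.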

For example, if these are my two input texts:

Dear <span class="notranslate">$customer</span>, I am sorry that you are having trouble logging in. Please follow the instructions at this URL <span class="notranslate">$article431</span> and let me know if that fixes your problem.

Dear <span class="notranslate">John Doe</span>, I am sorry that you are having trouble logging in. Please follow the instructions at this URL <span class="notranslate">http://url.for.help/article</span> and let me know if that fixes your problem.

I want the regex to return:
Dear , I am sorry that you are having trouble logging in. Please follow the instructions at this URL and let me know if that fixes your problem.
OR
Dear , I am sorry that you are having trouble logging in. Please follow the instructions at this URL and let me know if that fixes your problem.
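for both of them, so that I can simply call String.Equals() and find out whether they are equal. (I need to compare the input with variables against multiple texts in which the variables have been replaced, to find the match.)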

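It was easy to come up with a regex that tells me whether a string contains any "notranslate" sections: (<span class="notranslate">(.+?)</span>). That is how I decide whether I need to remove sections before comparing. But I'm having a lot of trouble with the task above, which I think is very similar.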
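For reference, the detection check can be sketched in C# with Regex.IsMatch, reusing the pattern from the question. The class and method names (NoTranslateDetector, HasNoTranslateSection) are hypothetical. One detail worth noting is RegexOptions.Singleline: without it, . does not match \n in .NET, so a section whose inner text contains a line break would not be detected.

```csharp
using System.Text.RegularExpressions;

public static class NoTranslateDetector
{
    // The question's detection pattern. The lazy .+? stops at the first closing </span>.
    // Singleline lets . also match \n, so multi-line inner text is still found.
    private static readonly Regex NoTranslateSpan =
        new Regex("<span class=\"notranslate\">(.+?)</span>", RegexOptions.Singleline);

    // True when the text contains at least one notranslate section,
    // i.e. when the sections must be removed before comparing.
    public static bool HasNoTranslateSection(string text) => NoTranslateSpan.IsMatch(text);
}
```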

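I'm testing with Expresso and regexstorm.net, and I've played with many variations of (?:(.+?)(?:<span class="notranslate">(?:.+?)</span>)), using ideas from other SO questions, but all of them have problems I don't understand. For example, that one seems to almost work in Expresso, but it doesn't pick up the trailing text after the last set of span tags. When I make the span tags optional, or try adding another (.+?), it doesn't grab anything at all. I've tried using lookaheads, but then I still end up grabbing the tags plus the inner text.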
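Since the goal is only to compare the two texts, one option is to delete the notranslate sections with Regex.Replace instead of capturing the text around them. The text after the last span is then kept automatically, because it is never part of a match. This is a minimal sketch, assuming the spans are never nested and always use exactly class="notranslate"; NoTranslateStripper, Strip and FindMatch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NoTranslateStripper
{
    // One exact <span class="notranslate">...</span> section. The lazy .*? stops at the
    // first </span>, so two sections in one string are removed separately.
    private static readonly Regex NoTranslateSpan =
        new Regex("<span class=\"notranslate\">.*?</span>", RegexOptions.Singleline);

    // Removes every notranslate section and keeps everything else, including other tags.
    public static string Strip(string text) => NoTranslateSpan.Replace(text, string.Empty);

    // Hypothetical helper: index of the first candidate that equals the template once
    // both are stripped, or -1 if none does.
    public static int FindMatch(string template, IReadOnlyList<string> candidates)
    {
        string key = Strip(template);
        for (int i = 0; i < candidates.Count; i++)
        {
            if (string.Equals(key, Strip(candidates[i]), StringComparison.Ordinal))
                return i;
        }
        return -1;
    }
}
```

Assuming both example texts wrap their variables or values in the same span, stripping them yields the same string ("Dear , ... URL  and let me know ..."), so an ordinal String.Equals reports a match. Note the doubled space left where each span was removed; it appears in both texts, so it does not affect the comparison.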

Answer 1

This captures everything and then drops the HTML tags matched by the Ignore group.

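using System;
using System.Linq;
using System.Text.RegularExpressions;

string data = "Dear <span class=\"notranslate\">$customer</span>, I am sorry that you\r\n  are havin" +
    "g trouble logging in. Please follow the instructions at this\r\n  URL <span class=" +
    "\"notranslate\">$article431</span> and let me know if\r\n  that fixes your problem.";

// Words: a run of text up to the next '<'.
// Ignore: an opening tag, then everything up to and including the next '>' (inner text plus closing tag).
string pattern = @"(?<Words>[^<]+)(?<Ignore><[^>]+>[^>]+>)?";

// Keep only the Words captures and join them; the seed avoids an exception when nothing matches.
string result = Regex.Matches(data, pattern)
     .OfType<Match>()
     .Select(mt => mt.Groups["Words"].Value)
     .Aggregate(string.Empty, (sentence, words) => sentence + words);

Console.WriteLine(result);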

The result is a single string that still contains the original carriage returns and line feeds:

Dear , I am sorry that you
  are having trouble logging in. Please follow the instructions at this
  URL  and let me know if
  that fixes your problem.
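Two limitations of the pattern above are worth checking against the question's requirements. Its Ignore group matches any simple tag pair, so text such as <b>bold</b> would also be dropped along with its inner text, although the question wants other tags kept. It also needs at least one character of text before each tag, so a string that starts with a span leaks part of the tag text into the result. A sketch of a capture-based variant that only skips the exact notranslate section, using alternation plus a tempered (?!...) lookahead; NoTranslateCapture and KeepWords are hypothetical names, and it assumes every opening notranslate tag has a matching </span>.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class NoTranslateCapture
{
    // Branch 1 consumes a whole notranslate section and captures nothing.
    // Branch 2 captures a run of characters, each of which does not start the exact
    // opening tag, so the run stops right before the next section and other tags survive.
    private static readonly Regex Pattern = new Regex(
        "<span class=\"notranslate\">.*?</span>|(?<Words>(?:(?!<span class=\"notranslate\">).)+)",
        RegexOptions.Singleline);

    // Joins the captured runs back together, skipping matches of the first branch.
    public static string KeepWords(string data) =>
        string.Concat(Pattern.Matches(data)
            .Cast<Match>()
            .Where(m => m.Groups["Words"].Success)
            .Select(m => m.Groups["Words"].Value));
}
```

Because the first branch is tried at every position, a span at the very start of the string is consumed like any other, and the text after the last span is captured by the second branch.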

Use regex to capture everything except a certain (possibly repeated) pattern
