Question

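I'm trying to filter some junk text out of a string with a regular expression, but I can't seem to get it to work. I'm no regex expert (not even close), and I've searched for similar examples, but none of them seem to solve my problem.

I need a regular expression that matches everything from the start of a string up to, but not including, a specific word in that string.

Here is an example: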

<p>This is the string I want to process with as you can see also contains HTML tags like <i>this</i> and <strong>this</strong></p>
<p>I want to remove everything in the string BEFORE the word "giraffe" (but not "giraffe" itself and keep everything after it.</p>

So, how do I match everything in the string before the word "giraffe"?

Thanks!

Answer 1

resultString = Regex.Replace(subjectString, 
    @"\A             # Start of string
    (?:              # Match...
     (?!""giraffe"") #  (unless we're at the start of the string ""giraffe"")
    .                #  any character (including newlines)
    )*               # zero or more times", 
    "", RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);

This should work.

Answer 2

Why use a regular expression at all?

String s = "blagiraffe";
// Keep everything from the first occurrence of "giraffe" onward.
s = s.Substring(s.IndexOf("giraffe"));
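
One caveat with the plain `IndexOf` approach: `IndexOf` returns -1 when the word is absent, and passing -1 to `Substring` throws an `ArgumentOutOfRangeException`. Below is a minimal sketch of a guarded version. The class name `GiraffeTrim` and the helper `TrimBefore` are hypothetical names used only for illustration, and returning the input unchanged when the word is missing is an assumed policy, not something the question specifies.

```csharp
using System;

public static class GiraffeTrim
{
    // Hypothetical helper: drops everything before the first occurrence of `word`.
    // Assumption: if `word` does not occur, the input is returned unchanged.
    public static string TrimBefore(string s, string word)
    {
        // Ordinal comparison: match the exact characters, ignoring culture rules.
        int index = s.IndexOf(word, StringComparison.Ordinal);
        return index < 0 ? s : s.Substring(index);
    }
}
```

With this helper, `GiraffeTrim.TrimBefore("blagiraffe", "giraffe")` yields `"giraffe"`, and a string without the word comes back as-is.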

Answer 3

Try this:

    var s =
         @"<p>This is the string I want to process with as you can see also contains HTML tags like <i>this</i> and <strong>this</strong></p>
         <p>I want to remove everything in the string BEFORE the word ""giraffe"" (but not ""giraffe"" itself and keep everything after it.</p>";
    var ex = new Regex("giraffe.*$", RegexOptions.Multiline);
    Console.WriteLine(ex.Match(s).Value);

This snippet produces the following output:

giraffe" (but not "giraffe" itself and keep everything after it.</p>

Answer 4

A look-ahead will do the trick:

^.*(?=\s+giraffe)

Answer 5

You can use a lookahead pattern like this:

^.*?(?=giraffe)

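This matches everything before a specific word in a multi-line string. In .NET, combine it with RegexOptions.Singleline so that . also matches newline characters; otherwise the match cannot extend past the first line.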
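
To show how the lazy lookahead pattern could be applied from C#, here is a minimal sketch. The class name `LookaheadTrim` and the method `RemoveBeforeGiraffe` are hypothetical; the sketch assumes the goal is to delete the matched prefix with `Regex.Replace`, as in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadTrim
{
    // Hypothetical helper: removes everything before the first "giraffe".
    public static string RemoveBeforeGiraffe(string input)
    {
        // ^ anchors at the start of the input (RegexOptions.Multiline is not set).
        // .*? is lazy, so it stops at the first position where (?=giraffe) succeeds.
        // RegexOptions.Singleline lets '.' match newline characters as well.
        return Regex.Replace(input, @"^.*?(?=giraffe)", "", RegexOptions.Singleline);
    }
}
```

Because the quantifier is lazy, the match ends at the first occurrence of the word rather than the last. If the word never occurs, the pattern does not match and the input is returned unchanged.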
