Question

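I am trying to extract a name from a string using a regular expression. The name is always immediately followed by the protocol. The protocols are: ssh, folder, http.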

Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o r John ssh 0 *
Thu May 23 22:42:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o i Jake folder 0 *
Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o t Steve http 0 *

The expected output is:

John
Jake
Steve

Answer 1

You can use the following PCRE regex (since you have not specified which language you are using):

\b[a-zA-Z]+(?=\s+(?:ssh|folder|http))

Demo: https://regex101.com/r/t62Ra7/4/

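Explanation:

\b starts the match at a word boundary
[a-zA-Z]+ matches any sequence of ASCII letters in the range a-zA-Z; you may have to generalize this to accept Unicode letters
(?= opens a lookahead that adds the constraint that the name is followed by one of the protocols
\s+ one or more whitespace characters
(?:ssh|folder|http) a non-capturing group for the protocols ssh, folder, or http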
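
As a minimal sketch of the pattern above in JavaScript (the language later answers use; the question itself names none), the snippet below applies the lookahead to the three sample lines. The helper name extractName is hypothetical, and the second pattern, which uses \p{L} with the u flag, is only one assumed way to generalize to Unicode letters as the explanation suggests.

```javascript
// Sample log lines from the question.
const lines = [
  "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o r John ssh 0 *",
  "Thu May 23 22:42:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o i Jake folder 0 *",
  "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o t Steve http 0 *",
];

// ASCII-only pattern from the answer.
const asciiName = /\b[a-zA-Z]+(?=\s+(?:ssh|folder|http))/;

// Assumed Unicode variant: \p{L} matches any letter when the u flag is set.
// A negative lookbehind stands in for \b, which only knows ASCII word characters.
const unicodeName = /(?<!\p{L})\p{L}+(?=\s+(?:ssh|folder|http))/u;

// Hypothetical helper: returns the matched name, or null when nothing matches.
function extractName(line, pattern = asciiName) {
  const match = line.match(pattern);
  return match ? match[0] : null;
}

const names = lines.map((line) => extractName(line));
console.log(names); // [ 'John', 'Jake', 'Steve' ]
```

Because the lookahead is zero-width, the protocol is checked but never becomes part of the match, so match[0] is the name itself.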

Answer 2

Here is how to code it in Java.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractName {
   public static void main(String[] args) {
      String[] str = {
            "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o r John ssh 0 *",
            "Thu May 23 22:42:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o i Jake folder 0 *",
            "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o t Steve http 0 *",
      };

      // The backslash must be doubled inside a Java string literal
      String pat = "(\\w+) (ssh|folder|http)";
      Pattern p = Pattern.compile(pat);
      for (String s : str) {
         Matcher m = p.matcher(s);
         if (m.find()) {
            // Group 1 holds the name; group 2 holds the protocol
            System.out.println(m.group(1));
         }
      }
   }
}

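The actual pattern is in the string pat and can be used with other regex engines. It simply matches a name, followed by a space, followed by one of the protocols OR'd together, and it captures the name in the first capture group.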
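
To illustrate the remark that the pattern carries over to other engines, here is a minimal JavaScript sketch of the same capture-group idea. The function name parseEntry and the shape of the returned object are hypothetical; only the pattern comes from the answer.

```javascript
// Same pattern as the Java string, written as a regex literal (single backslash).
const entryPattern = /(\w+) (ssh|folder|http)/;

// Hypothetical helper: returns the name (group 1) and protocol (group 2), or null.
function parseEntry(line) {
  const m = entryPattern.exec(line);
  return m ? { name: m[1], protocol: m[2] } : null;
}

const entry = parseEntry(
  "Thu May 23 22:42:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o i Jake folder 0 *"
);
console.log(entry); // { name: 'Jake', protocol: 'folder' }
```

Note that \w also matches digits and underscores, so this variant accepts names such as user_1, unlike the letter-only patterns in the other answers.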

Answer 3

Try:

\b[A-Za-z]+(?=\s(?=ssh|folder|http))

const regex = /\b[A-Za-z]+(?=\s(?=ssh|folder|http))/g;
let match; // declared up front so the destructuring assignments below also work in strict mode

[match] = "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o r John ssh 0 *".match(regex);
console.log(match); //John

[match] = "Thu May 23 22:42:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o i Jake folder 0 *".match(regex);
console.log(match); //Jake

[match] = "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o t Steve http 0 *".match(regex);
console.log(match); //Steve

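Regex explanation:

\b defines a word boundary at which the match starts

[A-Za-z] matches any letter, in either case

+ repeats the preceding character class one or more times, up to the next pattern

(?= opens a lookahead (its contents are not included in the match)

\s a whitespace character

(?=ssh|folder|http) a second lookahead for ssh, folder, or http

Putting it all together, the regex looks for a word followed by a whitespace character and then one of: ssh, folder, or http.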
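
The inner lookahead already sits inside a lookahead, so it could arguably be written as a plain non-capturing group. The sketch below, using hypothetical variable names, checks that both spellings return the same names when the sample lines are joined into one string and scanned with matchAll.

```javascript
// The three sample lines joined into a single multi-line log.
const log = [
  "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o r John ssh 0 *",
  "Thu May 23 22:42:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o i Jake folder 0 *",
  "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o t Steve http 0 *",
].join("\n");

// Pattern from the answer: a lookahead nested inside a lookahead.
const nested = /\b[A-Za-z]+(?=\s(?=ssh|folder|http))/g;

// Assumed simplification: the inner lookahead becomes a non-capturing group.
const simplified = /\b[A-Za-z]+(?=\s(?:ssh|folder|http))/g;

// matchAll needs the g flag and yields one match object per name found.
const collect = (pattern) => Array.from(log.matchAll(pattern), (m) => m[0]);

const nestedNames = collect(nested);
const simplifiedNames = collect(simplified);
console.log(nestedNames);     // [ 'John', 'Jake', 'Steve' ]
console.log(simplifiedNames); // [ 'John', 'Jake', 'Steve' ]
```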

Answer 4

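Another approach is to use the single letter and the whitespace before the name as a left boundary, then collect the letters of the name and save them in capture group $1, maybe something like: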

\s+[a-z]\s+([A-Z][a-z]+)

If necessary, we can also add more boundaries to it.
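
One assumed way to add such a boundary is to also require a protocol on the right-hand side, so that a capitalized word elsewhere in the line is not picked up by accident. This is a sketch rather than the answerer's expression; the variable names and the sample line with spaces in its path are made up for illustration.

```javascript
// Left boundary only, as in the answer.
const leftOnly = /\s+[a-z]\s+([A-Z][a-z]+)/;

// Assumed extension: also require one of the protocols right after the name.
const bothSides = /\s+[a-z]\s+([A-Z][a-z]+)\s+(?:ssh|folder|http)\b/;

// Made-up line whose path contains a single letter followed by a capitalized word.
const tricky =
  "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/plan b Final.txt b s o r John ssh 0 *";

const looseName = tricky.match(leftOnly)[1];
const strictName = tricky.match(bothSides)[1];
console.log(looseName);  // Final
console.log(strictName); // John
```

On the three sample lines from the question both patterns capture the same names; the extra boundary only matters for input like the made-up line above.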

RegEx

If this expression is not what you want, it can be modified or changed on regex101.com.

RegEx Circuit

jex.im visualizes regular expressions.

Test

const regex = /\s+[a-z]\s+([A-Z][a-z]+)/gm;
const str = `Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o r John ssh 0 *
Thu May 23 22:42:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o i Jake folder 0 *
Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o t Steve http 0 *`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
