How to make sure a regex matches the first occurrence of a match

Date: 2016-06-15 23:51:42

Tags: regex

I am using PCRE (PHP) regular expressions and have developed the following regex:

(?:-?)(?:[A-Z’\s\.-]{8}.*)(?:NY\s?|ON\s?|FL\s){1}([A-Z].*)(?:M\s\d{1,2}\s.*|F\s\d{1,2}.*)

I am trying to apply it to the following strings. Below each target string I give the desired match and the actual match:

SUNDAY GEISHA-SUNDAY BREAK-JP NYHIT IT ONCE MORE M 13 1116 Race 1
Desired Match: HIT IT ONCE MORE
Actual Match: CE MORE  

LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6
Desired Match: SUMMATION TIME
Actual Match: TIME  

TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9
Desired Match: DONWORTH
Actual Match: WORTH

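In each case, the regex does not stop at the first occurrence of the state/province code; it consumes more of the string and matches at a later occurrence instead.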
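The behaviour described above can be reproduced outside PHP. The sketch below uses Python's `re` module rather than `preg_match`, on the assumption that the pattern only uses syntax both engines share. The greedy `.*` before the code alternation runs to the end of the line and backtracks, so the rightmost `ON` (inside `ONCE`, `SUMMATION` and `DONWORTH`) wins. `ADJUSTED` is a hypothetical variant, not taken from the post: it makes the prefix lazy, requires a word boundary before the code and whitespace after it, and adds `KY`. The first sample line uses the `NY HIT` spacing.

```python
import re

# The asker's pattern, copied verbatim from the question.
ORIGINAL = re.compile(
    r"(?:-?)(?:[A-Z’\s\.-]{8}.*)(?:NY\s?|ON\s?|FL\s){1}([A-Z].*)"
    r"(?:M\s\d{1,2}\s.*|F\s\d{1,2}.*)"
)

# Hypothetical adjustment (an assumption, not from the post): lazy prefix,
# word boundary before the code, mandatory whitespace after it, KY added.
ADJUSTED = re.compile(r"^.*?\b(?:NY|ON|FL|KY)\s+([A-Z].*?)\s+[MF]\s\d{1,2}\b")

LINES = [
    "SUNDAY GEISHA-SUNDAY BREAK-JP NY HIT IT ONCE MORE M 13 1116 Race 1",
    "LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6",
    "TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9",
]


def first_group(pattern, line):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    for line in LINES:
        print(repr(first_group(ORIGINAL, line)), repr(first_group(ADJUSTED, line)))
```

The word boundary is what keeps `ON` inside a longer word from being treated as a code; laziness alone would still pick `ON` in `DONWORTH`, because `KY` is missing from the original alternation.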

You can see a working example on regex101.com: WORKING EXAMPLE

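How can I get my regex to stop at the first match so that I get the output I want? I would also welcome any pointers on how to improve the expression.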

Thanks for your input.

2 Answers:

Answer 0: (score: 1)

Description

^(?:[^ \n]* +){4}(.*?) +[a-z] +[0-9]+ [0-9a-z]+ Race [0-9]+$


Example

Live demo

https://regex101.com/r/kF9cU8/2

Sample text

SUNDAY GEISHA-SUNDAY BREAK-JP NY HIT IT ONCE MORE M 13 1116 Race 1
Desired Match: HIT IT ONCE MORE
Actual Match: CE MORE  

LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6
Desired Match: SUMMATION TIME
Actual Match: TIME  

TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9
Desired Match: DONWORTH
Actual Match: WORTH

Sample matches

MATCH 1
1.  [33-49] `HIT IT ONCE MORE`

MATCH 2
1.  [145-159]   `SUMMATION TIME`

MATCH 3
1.  [258-266]   `DONWORTH`

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (?:                      group, but do not capture (4 times):
----------------------------------------------------------------------
    [^ \n]*                  any character except: ' ', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
     +                       ' ' (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  ){4}                     end of grouping
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
   +                       ' ' (1 or more times (matching the most
                           amount possible))
----------------------------------------------------------------------
  [a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
   +                       ' ' (1 or more times (matching the most
                           amount possible))
----------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  [0-9a-z]+                any character of: '0' to '9', 'a' to 'z'
                           (1 or more times (matching the most amount
                           possible))
----------------------------------------------------------------------
   Race                    ' Race '
----------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"
----------------------------------------------------------------------
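As a rough check of the breakdown above, the pattern can be run against the three race lines. This is a sketch in Python's `re` rather than PHP. It assumes the case-insensitive and multiline flags are set, since the pattern uses lowercase classes such as `[a-z]` against uppercase text and anchors each line with `^` and `$`. The name `extract_names` is made up for illustration.

```python
import re

# Pattern from this answer; the flags are an assumption (see the lead-in).
PATTERN = re.compile(
    r"^(?:[^ \n]* +){4}(.*?) +[a-z] +[0-9]+ [0-9a-z]+ Race [0-9]+$",
    re.MULTILINE | re.IGNORECASE,
)

SAMPLE = "\n".join([
    "SUNDAY GEISHA-SUNDAY BREAK-JP NY HIT IT ONCE MORE M 13 1116 Race 1",
    "LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6",
    "TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9",
])


def extract_names(text):
    """Return capture group 1 for every line that matches the pattern."""
    return PATTERN.findall(text)


if __name__ == "__main__":
    for name in extract_names(SAMPLE):
        print(name)
```

The pattern never looks for a state code at all. It skips exactly four space-separated fields and anchors the tail on the sex, age and `Race N` columns, so it depends on every line having that field layout.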

Answer 1: (score: 1)

Well, here is a simpler (but not more efficient) approach:

/^.+(?:NY|FL|KY)\s?(.+?)(?: M.*)$/gmi

This yields:

  1. " HIT IT ONCE"
  2. " SUMMATION TIME"
  3. " DONWORTH"
Try it: https://regex101.com/r/yX2bI1/4
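A sketch of this answer's pattern in Python's `re`, with the `m` and `i` flags mapped to `re.MULTILINE` and `re.IGNORECASE` and `findall` standing in for `g`. The first result is cut short at `HIT IT ONCE` because the lazy group stops at the first ` M`, which is the start of ` MORE`. `TIGHTENED` is a hypothetical variant, not part of the answer, that additionally requires a space and a digit after the sex letter.

```python
import re

FLAGS = re.MULTILINE | re.IGNORECASE

# Pattern from this answer (delimiters and flags translated to Python).
SIMPLE = re.compile(r"^.+(?:NY|FL|KY)\s?(.+?)(?: M.*)$", FLAGS)

# Hypothetical tightening (an assumption): the tail must look like " M 13".
TIGHTENED = re.compile(r"^.+(?:NY|FL|KY)\s?(.+?)(?: [MF] \d.*)$", FLAGS)

SAMPLE = "\n".join([
    "SUNDAY GEISHA-SUNDAY BREAK-JP NY HIT IT ONCE MORE M 13 1116 Race 1",
    "LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6",
    "TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9",
])


def captures(pattern, text):
    """Return capture group 1 for every matching line."""
    return pattern.findall(text)


if __name__ == "__main__":
    print(captures(SIMPLE, SAMPLE))
    print(captures(TIGHTENED, SAMPLE))
```

Both variants still depend on the greedy `.+` backtracking to the rightmost state code, so a horse name that itself contains `NY`, `FL` or `KY` would shift the capture.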