Consistent grouping in a regular expression

Time: 2019-06-03 16:20:08

Tags: regex regex-group regex-greedy

I adapted this rather messy regular expression from "RegEx for capturing a repeating pattern": https://regex101.com/r/Trdwks/1

(([0-9]{1,2}h)[ ]*([0-9]{1,2}min):\s*|([0-9]{1,2}h)():\s*|()([0-9]{1,2}min):\s*)((?:.(?!(\dh\s\d{1,2}min|\dh|\d{1,2}min)))+)

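The idea is that it matches the following string and captures the hours, the minutes, and the description in separate groups: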

1h 30min: Title 
- Description Line 1
3h: SECOND TITLE
- Description Line 1
- Description Line 2
- Description Line 3


1h 14min: Title 
- another Great one 42min: Title - Great Movie
- Description Line 2
- Description Line 3

and produces the following result:

Match 1:
  "1h 30min: Title 
  - Description Line 1"

      Group 1: "1h"
      Group 2: "30min"
      Group 3: "Title 
               - Description Line 1"

Match 2:
  "3h: SECOND TITLE
 - Description Line 1
 - Description Line 2
 - Description Line 3"

      Group 1: "3h"
      Group 2: ""
      Group 3: "SECOND TITLE
               - Description Line 1
               - Description Line 2
               - Description Line 3"

Match 3:
  "1h 14min: Title 
   - another Great one"

      Group 1: "1h"
      Group 2: "14min"
      Group 3: "Title 
                - another Great one"

Match 4:
  "42min: Title - Great Movie
   - Description Line 2
   - Description Line 3"

      Group 1: ""
      Group 2: "42min"
      Group 3: "Title - Great Movie
                - Description Line 2
                - Description Line 3"

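I am having a lot of trouble keeping the groups consistent, because an entry can have only hours, only minutes, or both. As a result, the regex above may put the minutes in group 3 or in group 6. Is there a way to fix the grouping in the initial alternation so that it returns the same groups in every case?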

1 answer:

Answer 0: (score: 3)

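This solution only requires support for lookahead assertions. The hours and minutes groups are both optional, and the leading lookahead requires at least one digit before the colon, so an entry with neither part cannot match. Hours are always in group 2, minutes in group 3, and the description in group 4.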

(?s)(?=[^:]*\d[^:]*:)(([0-9]{1,2}h)?[ ]*([0-9]{1,2}min)?:\s*)((?:.(?!(\dh\s\d{1,2}min|\dh|\d{1,2}min)))+)

https://regex101.com/r/gz4r9g/1

Expanded:

 (?s)
 (?= [^:]* \d [^:]* : )
 (                             # (1 start)
      ( [0-9]{1,2} h )?             # (2)
      [ ]* 
      ( [0-9]{1,2} min )?           # (3)
      : \s* 
 )                             # (1 end)
 (                             # (4 start)
      (?:
           . 
           (?!
                (                             # (5 start)
                     \d h \s \d{1,2} min
                  |  \d h
                  |  \d{1,2} min 
                )                             # (5 end)
           )
      )+
 )                             # (4 end)
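
As a rough check of the lookahead version, here is a minimal Python sketch using the standard `re` module. The helper name `parse_entries` and the `SAMPLE` constant are made up for this illustration, the sample text omits the trailing spaces after "Title", and unmatched optional groups (which Python reports as `None`) are mapped to empty strings to mirror the desired output in the question.

```python
import re

# Lookahead-based pattern from the answer, split across two string literals.
PATTERN = re.compile(
    r"(?s)(?=[^:]*\d[^:]*:)(([0-9]{1,2}h)?[ ]*([0-9]{1,2}min)?:\s*)"
    r"((?:.(?!(\dh\s\d{1,2}min|\dh|\d{1,2}min)))+)"
)

# Sample input from the question (assumption: trailing spaces omitted).
SAMPLE = (
    "1h 30min: Title\n"
    "- Description Line 1\n"
    "3h: SECOND TITLE\n"
    "- Description Line 1\n"
    "- Description Line 2\n"
    "- Description Line 3\n"
    "\n"
    "\n"
    "1h 14min: Title\n"
    "- another Great one 42min: Title - Great Movie\n"
    "- Description Line 2\n"
    "- Description Line 3\n"
)


def parse_entries(text):
    """Return (hours, minutes, description) tuples; hypothetical helper."""
    entries = []
    for m in PATTERN.finditer(text):
        hours = m.group(2) or ""      # group 2 is None when hours are absent
        minutes = m.group(3) or ""    # group 3 is None when minutes are absent
        description = m.group(4).strip()
        entries.append((hours, minutes, description))
    return entries


for hours, minutes, description in parse_entries(SAMPLE):
    print(repr(hours), repr(minutes), repr(description))
```

The group numbers stay the same whichever parts are present, so the calling code never has to check which alternative matched.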

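This solution only requires support for branch reset. Inside (?| ... ) the group numbering restarts with each alternative, so hours are always in group 1, minutes in group 2, and the description in group 3.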

(?s)(?|([0-9]{1,2}h)[ ]*([0-9]{1,2}min)|([0-9]{1,2}h)()|()([0-9]{1,2}min)):\s*((?:.(?!(\dh\s\d{1,2}min|\dh|\d{1,2}min)))+)

https://regex101.com/r/pyACdi/1

Expanded:

 (?s)
 (?|
      ( [0-9]{1,2} h )              # (1)
      [ ]* 
      ( [0-9]{1,2} min )            # (2)
   |  ( [0-9]{1,2} h )              # (1)
      ( )                           # (2)
   |  ( )                           # (1)
      ( [0-9]{1,2} min )            # (2)
 )
 : \s* 
 (                             # (3 start)
      (?:
           . 
           (?!
                (                             # (4 start)
                     \d h \s \d{1,2} min
                  |  \d h
                  |  \d{1,2} min 
                )                             # (4 end)
           )
      )+
 )                             # (3 end)
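
Python's standard `re` module does not support the branch reset construct `(?|...)`, so the pattern above cannot be used there unchanged. The sketch below is not the answer's method but an emulation under that assumption: each alternative gets its own named groups, and a small helper merges them into one consistent (hours, minutes, description) shape. The group names (`h_both`, `m_both`, `h_only`, `m_only`, `desc`) and the function names are invented for this illustration.

```python
import re

# Three alternatives, as in the branch reset version, but with distinct
# named groups because Python's re has no (?|...) construct.
PATTERN = re.compile(
    r"(?:(?P<h_both>[0-9]{1,2}h)[ ]*(?P<m_both>[0-9]{1,2}min)"
    r"|(?P<h_only>[0-9]{1,2}h)"
    r"|(?P<m_only>[0-9]{1,2}min))"
    r":\s*"
    r"(?P<desc>(?:.(?!\dh\s\d{1,2}min|\dh|\d{1,2}min))+)",
    re.DOTALL,
)

# Sample input from the question (assumption: trailing spaces omitted).
SAMPLE = (
    "1h 30min: Title\n"
    "- Description Line 1\n"
    "3h: SECOND TITLE\n"
    "- Description Line 1\n"
    "- Description Line 2\n"
    "- Description Line 3\n"
    "\n"
    "\n"
    "1h 14min: Title\n"
    "- another Great one 42min: Title - Great Movie\n"
    "- Description Line 2\n"
    "- Description Line 3\n"
)


def normalized_groups(m):
    """Merge the per-alternative groups into one consistent tuple."""
    hours = m.group("h_both") or m.group("h_only") or ""
    minutes = m.group("m_both") or m.group("m_only") or ""
    return hours, minutes, m.group("desc").strip()


def parse_entries(text):
    """Return (hours, minutes, description) tuples; hypothetical helper."""
    return [normalized_groups(m) for m in PATTERN.finditer(text)]


for entry in parse_entries(SAMPLE):
    print(entry)
```

In an engine with branch reset the merging step is unnecessary, because the engine itself assigns the same group numbers to every alternative.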