Best way to capture multiple matches

Date: 2016-04-24 19:18:55

Tags: ruby regex

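Within the same text message, one line appears exactly once (the item's ID) and other lines appear several times (a number of references and dimensions for each part):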

..some random text here..
ID/11000082734
REF/D14-109-0
REF/D14-209-0
REF/D14-219-0
CMT/59-40-25
CMT/38-25-28
CMT/59-40-25
CMT/37-37-20
CMT/40-40-20
CMT/37-37-20
CMT/49-41-31
CMT/44-34-53

I want to parse and store the IdCode, the References, and an Array with the dimensions.

When I apply REGEX.match(my_text), I get only a single occurrence of REF and of CMT:

REGEX = %r{
ID\/(?<IdCode> \d{10})\s 
(REF\/(?<ReferenceCode> \w{3}\-\d{3}\-\d)\s)+ 
(CMT\/(?<Length> \d+)\-(?<Width> \d+)\-(?<Height> \d+)\s)+
}x

The result is:

IdCode: "1100008273"
ReferenceCode:  "D14-219-0"
Length: "37"
Width:  "37"
Height: "20"
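The single-value result can be reproduced in a much smaller sketch. In Ruby, a capture group that sits inside a repeated group keeps only the text from its final repetition, while String#scan with the unrepeated pattern returns one entry per occurrence. The two-line sample and the method names below are hypothetical and are not part of the original message:

```ruby
# Hypothetical miniature of the message: two REF lines only.
REF_SAMPLE = "REF/D14-109-0\nREF/D14-209-0\n"

# The named group is inside a repeated group, so every repetition
# overwrites the previous capture; only the last one survives.
def last_ref(text)
  text.match(/(?:REF\/(?<ref>\w{3}-\d{3}-\d)\n)+/)[:ref]
end

# Scanning with the unrepeated pattern yields one capture per line.
def all_refs(text)
  text.scan(/REF\/(\w{3}-\d{3}-\d)/).flatten
end

p last_ref(REF_SAMPLE)  #=> "D14-209-0"
p all_refs(REF_SAMPLE)  #=> ["D14-109-0", "D14-209-0"]
```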

Is there a way to capture multiple occurrences without iterating?

2 answers:

Answer 0: (score: 1)

Suppose your string is:

str = %w| dog
          ID/11000082734
          REF/D14-109-0
          REF/D14-209-0
          CMT/49-41-31
          CMT/44-34-53
          cat
          ID/11000082735
          REF/D14-109-1
          REF/D14-209-1
          CMT/49-41-32
          CMT/44-34-54
          pig |.join("\n")

  #=> "dog\nID/11000082734\nREF/D14-109-0\nREF/D14-209-0\nCMT/49-41-31\nCMT/44-34-53\ncat\nID/11000082735\nREF/D14-109-1\nREF/D14-209-1\nCMT/49-41-32\nCMT/44-34-54\npig"

Then you can write:

r = /(ID\/\d{11})                     # match string in capture group 1
    \n                                # match newline
    ((?:REF\/[A-Z]\d{2}-\d{3}-\d\n)+) # match consecutive REF lines in capture group 2
    ((?:CMT\/\d{2}-\d{2}-\d{2}\n)+)   # match consecutive CMT lines in capture group 3
    /x                                # free-spacing regex definition mode 

arr = str.scan(r)
  #=> [["ID/11000082734", "REF/D14-109-0\nREF/D14-209-0\n",
  #     "CMT/49-41-31\nCMT/44-34-53\n"],
  #    ["ID/11000082735", "REF/D14-109-1\nREF/D14-209-1\n",
  #     "CMT/49-41-32\nCMT/44-34-54\n"]]

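This extracts the required information without iterating.

At this point you may want to convert arr into a more convenient data structure. For example: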

arr.map do |a,b,c| 
  { :id  => a[/\d+/],
    :ref => b.split("\n").map { |s| s[4..-1] },
    :cmt => c.scan(/(\d{2})-(\d{2})-(\d{2})/).map { |e|
              [:length, :width, :height].zip(e.map(&:to_i)).to_h }
  }
end
  #=> [{ :id=>"11000082734",
  #      :ref=>["D14-109-0", "D14-209-0"],
  #      :cmt=>[{ :length=>49, :width=>41, :height=>31 },
  #             { :length=>44, :width=>34, :height=>53 }
  #            ]
  #    },
  #    { :id=>"11000082735",
  #      :ref=>["D14-109-1", "D14-209-1"],
  #      :cmt=>[{ :length=>49, :width=>41, :height=>32 },
  #             { :length=>44, :width=>34, :height=>54 }
  #            ]
  #    }
  #   ] 
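As a sketch of how the same two steps might be applied to the message from the question, the version below wraps them in one method. The names MESSAGE_PATTERN, parse_message and QUESTION_MESSAGE are hypothetical. It also makes two assumptions that go beyond the answer above: digit counts are not fixed, and the last CMT line may end the string without a trailing newline.

```ruby
# Sketch only: a looser variant of the pattern above.
MESSAGE_PATTERN = /
  ID\/(\d+)\n                        # group 1: digits of the ID
  ((?:REF\/\S+\n)+)                  # group 2: consecutive REF lines
  ((?:CMT\/\d+-\d+-\d+(?:\n|\z))+)   # group 3: consecutive CMT lines
/x

# Returns one hash per ID block found in the text.
def parse_message(text)
  text.scan(MESSAGE_PATTERN).map do |id, refs, cmts|
    {
      id:  id,
      ref: refs.scan(/REF\/(\S+)/).flatten,
      cmt: cmts.scan(/(\d+)-(\d+)-(\d+)/).map do |dims|
        [:length, :width, :height].zip(dims.map(&:to_i)).to_h
      end
    }
  end
end

# The sample message from the question, joined without a final newline.
QUESTION_MESSAGE = [
  "..some random text here..",
  "ID/11000082734",
  "REF/D14-109-0", "REF/D14-209-0", "REF/D14-219-0",
  "CMT/59-40-25", "CMT/38-25-28", "CMT/59-40-25", "CMT/37-37-20",
  "CMT/40-40-20", "CMT/37-37-20", "CMT/49-41-31", "CMT/44-34-53"
].join("\n")

parsed = parse_message(QUESTION_MESSAGE)
p parsed.first[:id]         #=> "11000082734"
p parsed.first[:ref]        #=> ["D14-109-0", "D14-209-0", "D14-219-0"]
p parsed.first[:cmt].size   #=> 8
```

Loosening the digit counts trades strictness for tolerance of longer IDs or dimensions; if the message format is fixed, the exact quantifiers in the answer's pattern reject malformed lines earlier.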

Answer 1: (score: 0)

Try this

(?<IdCode>\d{10,})|REF\/(?<ReferenceCode>\w{3}\-\d{3}\-\d)|CMT\/(?<Length>\d+)\-(?<Width>\d+)\-(?<Height>\d+)

Regex demo

Explanation
( … ): capturing group
?: once or none
\: escapes a special character
|: alternation / OR operand
+: one or more

Input

..some random text here..
ID/11000082734
REF/D14-109-0
REF/D14-209-0
REF/D14-219-0
CMT/59-40-25
CMT/38-25-28
CMT/59-40-25
CMT/37-37-20
CMT/40-40-20
CMT/37-37-20
CMT/49-41-31
CMT/44-34-53

Output:

MATCH 1
IdCode  [29-40] `11000082734`
MATCH 2
ReferenceCode   [45-54] `D14-109-0`
MATCH 3
ReferenceCode   [59-68] `D14-209-0`
MATCH 4
ReferenceCode   [73-82] `D14-219-0`
MATCH 5
Length  [87-89] `59`
Width   [90-92] `40`
Height  [93-95] `25`
MATCH 6
Length  [100-102]   `38`
Width   [103-105]   `25`
Height  [106-108]   `28`
MATCH 7
Length  [113-115]   `59`
Width   [116-118]   `40`
Height  [119-121]   `25`
MATCH 8
Length  [126-128]   `37`
Width   [129-131]   `37`
Height  [132-134]   `20`
MATCH 9
Length  [139-141]   `40`
Width   [142-144]   `40`
Height  [145-147]   `20`
MATCH 10
Length  [152-154]   `37`
Width   [155-157]   `37`
Height  [158-160]   `20`
MATCH 11
Length  [165-167]   `49`
Width   [168-170]   `41`
Height  [171-173]   `31`
MATCH 12
Length  [178-180]   `44`
Width   [181-183]   `34`
Height  [184-186]   `53`
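The listing above comes from a regex tester. In Ruby, one way to collect the same matches is String#scan with a block, reading each match through Regexp.last_match. The sketch below uses this answer's pattern unchanged apart from free-spacing layout. The names TOKEN_PATTERN, collect_tokens and TOKEN_SAMPLE, the shortened sample text, and the shape of the returned hash are all assumptions. Note that the block does visit each match in turn.

```ruby
# This answer's alternation, laid out in free-spacing mode.
TOKEN_PATTERN = /
    (?<IdCode>\d{10,})
  | REF\/(?<ReferenceCode>\w{3}\-\d{3}\-\d)
  | CMT\/(?<Length>\d+)\-(?<Width>\d+)\-(?<Height>\d+)
/x

# Sorts every match into the ID, the references, or the dimensions.
def collect_tokens(text)
  parsed = { id: nil, references: [], dimensions: [] }
  text.scan(TOKEN_PATTERN) do
    m = Regexp.last_match   # MatchData for the current match
    if m[:IdCode]
      parsed[:id] = m[:IdCode]
    elsif m[:ReferenceCode]
      parsed[:references] << m[:ReferenceCode]
    else
      parsed[:dimensions] << [m[:Length], m[:Width], m[:Height]].map(&:to_i)
    end
  end
  parsed
end

# Shortened, hypothetical version of the input shown above.
TOKEN_SAMPLE = [
  "..some random text here..",
  "ID/11000082734",
  "REF/D14-109-0", "REF/D14-209-0",
  "CMT/59-40-25", "CMT/38-25-28"
].join("\n")

p collect_tokens(TOKEN_SAMPLE)
```

Because the IdCode branch has no ID/ prefix, any run of ten or more digits elsewhere in the text would also be treated as an ID; prefixing that branch with ID\/ would be a stricter alternative.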