Regex to match sloppy fractions / mixed numbers

Posted: 2008-10-29 00:13:33

Tags: regex

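I have a series of text containing mixed numbers (i.e., a whole part and a fractional part). The problem is that the text is full of human-coded sloppiness: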

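  1. The whole part may or may not be present (e.g. "10")
  2. The fractional part may or may not be present (e.g. "1/3")
  3. The two parts may be separated by spaces and/or a hyphen (e.g. "10 1/3", "10-1 / 3", "10 - 1/3")
  4. The fraction itself may or may not have spaces between the numbers and the slash (e.g. "1/3", "1 / 3", "1 /3")
  5. There may be other text after the fraction that needs to be ignored

I need a regex that can parse out these elements so that I can build a proper number from this mess.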

3 answers:

Answer 0 (score: 11)

Here is a regex that handles all of the data I can throw at it:

(\d++(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$

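This puts the numbers into the following groups:

  1. The whole part of the mixed number, if it exists
  2. The numerator, if a fraction exists
  3. The denominator, if a fraction exists

Also, here is RegexBuddy's explanation of the elements (it helped me enormously while building the regex):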

    Match the regular expression below and capture its match into backreference number 1 «(\d++(?! */))?»
       Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
       Match a single digit 0..9 «\d++»
          Between one and unlimited times, as many times as possible, without giving back (possessive) «++»
       Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?! */)»
          Match the character “ ” literally « *»
             Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
          Match the character “/” literally «/»
    Match the character “ ” literally « *»
       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    Match the character “-” literally «-?»
       Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
    Match the character “ ” literally « *»
       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    Match the regular expression below «(?:(\d+) */ *(\d+))?»
       Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
       Match the regular expression below and capture its match into backreference number 2 «(\d+)»
          Match a single digit 0..9 «\d+»
             Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
       Match the character “ ” literally « *»
          Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
       Match the character “/” literally «/»
       Match the character “ ” literally « *»
          Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
       Match the regular expression below and capture its match into backreference number 3 «(\d+)»
          Match a single digit 0..9 «\d+»
             Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    Match any single character that is not a line break character «.*»
       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
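To turn the three groups into an actual number, here is a minimal Perl sketch that applies the regex above unchanged (only wrapped in `qr{}` so the slashes need no escaping). The helper name `mixed_value`, and the choice to return `undef` when no number is found or the denominator is zero, are assumptions for illustration rather than part of the answer.

```perl
use strict;
use warnings;
use 5.010;    # \d++ (possessive) and the // operator need Perl 5.10+

# The regex from this answer, with qr{} delimiters so "/" needs no escaping.
my $mixed_re = qr{(\d++(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$};

# Hypothetical helper: numeric value of the leading mixed number,
# or undef when neither a whole part nor a fraction was found.
sub mixed_value {
    my ($text) = @_;
    my ($whole, $num, $den) = $text =~ $mixed_re;    # groups 1..3, undef if absent
    return unless defined $whole || defined $num;
    my $value = $whole // 0;
    if (defined $num) {
        return if $den == 0;                          # "1/0" is not a number
        $value += $num / $den;
    }
    return $value;
}

for my $input ("10", "1/3", "10 1/3", "10-1 / 3", "10 - 1/3 cups") {
    printf "%-14s => %s\n", $input, mixed_value($input);
}
```

Note that the possessive `\d++` is what keeps "10/3" from being split into a whole part "1" plus the fraction "0/3".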
    

Answer 1 (score: 2)

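I think it may be easier to handle the different cases (full mixed number, fraction only, whole number only) separately from one another. For example: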

use strict;
use warnings;

sub parse_mixed {
  my($mixed) = @_;

  if($mixed =~ /^ *(\d+)[- ]+(\d+) *\/ *(\d+)(\D.*)?$/) {   # whole + fraction
    return $1+$2/$3;
  } elsif($mixed =~ /^ *(\d+) *\/ *(\d+)(\D.*)?$/) {       # fraction only
    return $1/$2;
  } elsif($mixed =~ /^ *(\d+)(\D.*)?$/) {                 # whole number only
    return $1;
  }
  return;   # nothing recognizable
}

print parse_mixed("10"), "\n";
print parse_mixed("1/3"), "\n";
print parse_mixed("1 / 3"), "\n";
print parse_mixed("10 1/3"), "\n";
print parse_mixed("10-1/3"), "\n";
print parse_mixed("10 - 1/3"), "\n";
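Where the `+` sits relative to a capture group matters in patterns like these. A quantified group such as `(\d)+` is re-captured on every repetition and keeps only its last digit, while `(\d+)` captures the whole digit run. A small Perl demonstration (the input string is made up for illustration):

```perl
use strict;
use warnings;

my $text = "10 1/12";

# Quantifier outside the group: each repetition overwrites $1,
# so only the last digit survives.
my ($den_outside) = $text =~ m{/ *(\d)+};

# Quantifier inside the group: the entire run of digits is captured.
my ($den_inside) = $text =~ m{/ *(\d+)};

printf "(\\d)+ captured %s, (\\d+) captured %s\n", $den_outside, $den_inside;
```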

Answer 2 (score: 1)

If you are using Perl 5.10, this is how I would write it:

m{
  ^
  \s*       # skip leading spaces

  (?'whole'
   \d++
   (?! \s*[\/] )   # there should not be a slash immediately following a whole number
  )?              # the whole part is optional, so a bare fraction like "1/3" still matches

  \s*

  (?:    # the rest should fail or succeed as a group

    -?        # skip an optional hyphen separator, as in "10-1/3"
    \s*

    (?'numerator'
     \d+
    )

    \s*
    [\/]
    \s*

    (?'denominator'
     \d+
    )
  )?
}x

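You can then read the captured values from the %+ hash, like this (an entry is undefined when that part was absent):

my $whole       = $+{whole};
my $numerator   = $+{numerator};
my $denominator = $+{denominator};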
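Below is a self-contained Perl 5.10 sketch that puts the named-capture approach to work and turns the captures into a number. It assumes the whole part is optional (so fraction-only input such as "1/4" also matches) and that a hyphen between the parts is simply skipped; the helper name `named_mixed_value` is hypothetical.

```perl
use strict;
use warnings;
use 5.010;    # named captures, %+, \d++ and // need Perl 5.10+

# Assumption: the whole part is optional, so "1/4" alone can match.
my $mixed_re = qr{
    ^ \s*
    (?<whole> \d++ (?! \s*/ ) )?          # whole number, not followed by a slash
    \s*
    (?:
        -? \s*                            # optional hyphen between the parts
        (?<numerator> \d+ ) \s* / \s* (?<denominator> \d+ )
    )?
}x;

# Hypothetical helper: value of the leading mixed number, or undef.
sub named_mixed_value {
    my ($text) = @_;
    return unless $text =~ $mixed_re;
    my %part = %+;    # copy now: the next successful match overwrites %+
    return unless defined $part{whole} || defined $part{numerator};
    my $value = $part{whole} // 0;
    if (defined $part{numerator}) {
        return if $part{denominator} == 0;    # "1/0" is not a number
        $value += $part{numerator} / $part{denominator};
    }
    return $value;
}

printf "%s\n", named_mixed_value("10 - 1/4 cups");
```

Copying `%+` into a lexical hash right after the match is a defensive choice: any later successful match in the same scope would otherwise replace its contents.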