Can anyone help me with this regex?

Time: 2014-09-28 15:44:57

Tags: ruby regex

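This is my first question (though I have found many perfect solutions on Stack Overflow in the past; it is my first source of help).

My text string contains a month and a series of dates. Sometimes there are two months in the string.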

date1 = "January 9, 10, 15, 16, 17, 18, 22, 23, 24"
date2 = "September 19, 20, 25, 26, 27, 28, October 2, 3, 4, 10, 11"

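I wrote a very WET piece of code that extracts the month from the string and appends each date and the year. However, there are a few issues I can't figure out.

  1. Iterating over the dates: I know I should use the each method to iterate over the dates. I tried but couldn't get it to work, so I did it the hard way, concatenating the month with each date element by hand. The obvious problem is that I don't know how many dates there will be, so I would have to build for the longest possible string and use if statements to detect whether I have reached the end. I should probably use something like dates1.length = x together with do/each, but I couldn't get it to work.

  2. Concatenating the month and the days: my very bad WET code does pull the month together with the date and the year, but how do I get rid of the brackets and quotes?

  3. Multiple months: how do I pick out the second month in the string and concatenate only the dates that follow that month's name, to get MONTH / DD / YY?

  4. Below is a sample of my very bad code.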

    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'
    
    date1 = "January 9, 10, 15, 16, 17, 18, 22, 23, 24"
    date2 = "September 19, 20, 25, 26, 27, 28, October 2, 3, 4, 10, 11"
    datetext = date1.scan(/([\w\-]+)/)     #=> every word and number, each wrapped in its own array
    datetext2 = date1.scan(/(\w*)\s?/)[0]  #=> ["January"] (the month, still inside an array)
    datenumbers = date1.scan(/(\d+)/)
    firstdate = datenumbers[0]             #=> ["9"], the first date following the first month
    seconddate = datenumbers[1]
    year = "2014"
    
    mdy1 = "#{datetext2} #{firstdate} #{year}"
    mdy2 = "#{datetext2} #{seconddate} #{year}"
    
    puts date1
    puts " "
    puts datetext2 #=> this variable adds the [0] delimiter to pull the 1st month
    puts firstdate
    puts " "
    puts mdy1
    puts mdy2
    puts " "
    

1 answer:

Answer 0 (score: 0)

I suggest you do it this way.

Code

def extract_dates_by_month(str)
  str.scan(/[A-Z][a-z]+|\d+/).each_with_object([]) { |e,b|
    e[0][/[A-Z]/] ? b << [e,[]] : b.last.last << e }
end

Example

str = "September 19, 20, 25, 26, October 2, 3, 4, 10, November 3, 12, 17"
extract_dates_by_month(str)
  #=> [["September", ["19", "20", "25", "26"]],
  #    ["October", ["2", "3", "4", "10"]],
  #    ["November", ["3", "12", "17"]]]

Explanation

The first step is to extract the month names and the dates:

a = str.scan(/[A-Z][a-z]+|\d+/)
  #=> ["September", "19", "20", "25", "26", "October", "2", "3", "4", "10",
  #    "November", "3", "12", "17"]
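
As an aside on the brackets and quotes mentioned in the question: they most likely come from the capture groups in the original patterns. A minimal sketch, using a shortened version of date1 and variable names chosen here purely for illustration:

```ruby
date1 = "January 9, 10, 15"

# With a capture group, scan returns one array per match.
grouped = date1.scan(/(\d+)/)   #=> [["9"], ["10"], ["15"]]

# Without a capture group, scan returns the matched strings directly.
flat = date1.scan(/\d+/)        #=> ["9", "10", "15"]

# Interpolating an array calls to_s on it, which is where the
# brackets and quotes come from.
with_group    = "January #{grouped[0]} 2014"  #=> "January [\"9\"] 2014"
without_group = "January #{flat[0]} 2014"     #=> "January 9 2014"

puts with_group
puts without_group
```

That is why the pattern used in this answer has no parentheses: each element of a is already a plain string.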

Then we split this array up by month:

a.each_with_object([]) { |e,b| e[0][/[A-Z]/] ? b << [e,[]] : b.last.last << e }
  #=> [["September", ["19", "20", "25", "26"]],
  #    ["October", ["2", "3", "4", "10"]],
  #    ["November", ["3", "12", "17"]]]

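Enumerable#each_with_object passes the initially empty array given as its argument to the block as the block variable b, and returns that array when it finishes. Each element of a is passed into the block in turn, where it is referenced by the block variable e. The following steps are performed: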

b = []
e = "September"
e[0][/[A-Z]/] #=> "S"
b << [e,[]]   #=> [["September", []]]

e = "19"
e[0][/[A-Z]/] #=> nil
b.last.last << e
b             #=> [["September", ["19"]]]

e = "20"
e[0][/[A-Z]/] #=> nil
b.last.last << e
b             #=> [["September", ["19", "20"]]]

e = "25"
e[0][/[A-Z]/] #=> nil
b.last.last << e
b             #=> [["September", ["19", "20", "25"]]]

e = "26"
e[0][/[A-Z]/] #=> nil
b.last.last << e
b             #=> [["September", ["19", "20", "25", "26"]]]

e = "October"
e[0][/[A-Z]/] #=> "O"
b << [e,[]]   #=> [["September", ["19", "20", "25", "26"]], ["October", []]]

And so on.
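
The same grouping can also be written with Enumerable#slice_before, which starts a new chunk at every element for which the block is true. This is an alternative sketch, not the code used above, and the method name extract_dates_with_slice_before is made up for illustration:

```ruby
# Alternative sketch: start a new chunk at every month name,
# then split each chunk into its month and its list of days.
def extract_dates_with_slice_before(str)
  str.scan(/[A-Z][a-z]+|\d+/)
     .slice_before { |e| e.match?(/\A[A-Z]/) }
     .map { |month, *days| [month, days] }
end

str = "September 19, 20, 25, 26, October 2, 3, 4, 10, November 3, 12, 17"
result = extract_dates_with_slice_before(str)
p result
  #=> [["September", ["19", "20", "25", "26"]],
  #    ["October", ["2", "3", "4", "10"]],
  #    ["November", ["3", "12", "17"]]]
```

String#match? requires Ruby 2.4 or later; on older versions e =~ /\A[A-Z]/ works the same way here.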

If you want the dates to be integers, change:

b.last.last << e

to:

b.last.last << e.to_i
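
To get from the grouped result to the MONTH / DD / YY strings asked for in the question, one possible sketch follows. The helper expand_dates is hypothetical, the year 2014 is taken from the question's own code, and the "%m/%d/%y" output format is an assumption about what MONTH / DD / YY should look like:

```ruby
require 'date'

def extract_dates_by_month(str)
  str.scan(/[A-Z][a-z]+|\d+/).each_with_object([]) do |e, b|
    e[0][/[A-Z]/] ? b << [e, []] : b.last.last << e
  end
end

# Hypothetical helper: pair every day with the month that precedes it
# and turn each pair into a Date for the given year.
def expand_dates(str, year)
  extract_dates_by_month(str).flat_map do |month, days|
    days.map { |day| Date.strptime("#{month} #{day} #{year}", "%B %d %Y") }
  end
end

date2 = "September 19, 20, 25, 26, 27, 28, October 2, 3, 4, 10, 11"
dates = expand_dates(date2, 2014)

# Print one formatted date per line, starting with 09/19/14
# and ending with 10/11/14.
puts dates.map { |d| d.strftime("%m/%d/%y") }
```

Because each and map run once per element, there is no need to know in advance how many dates the string contains, and a string with a single month works the same way as one with several.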