Split an array by a repeated value

Date: 2017-08-04 21:01:57

Tags: ruby

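I have a variable-length array of arbitrary strings. The one constant is that the string "hello" recurs, and I want to split the array into groups that each start at a "hello".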

So given this:

[
 "hello\r\n",
 "I\r\n",
 "am\r\n",
 "Bob\r\n",
 "hello\r\n",
 "How\r\n",
 "are you?\r\n"
]

I want this:

[
 [
   "hello\r\n",
   "I\r\n",
   "am\r\n",
   "Bob\r\n"
 ],
 [
   "hello\r\n",
   "How\r\n",
   "are you?\r\n"
 ]
]

I tried:

partition = []
last = input.size
index = 0
input.each_with_object([]) do |line, acc|
  index += 1
  if line == "hello\r\n"
    acc << partition
    partition = []
    partition << line
  else
    partition << line
  end
  if index == last
    acc << partition
  end
  acc
end.delete_if(&:blank?)  # blank? comes from ActiveSupport (Rails); plain Ruby would use empty?
=> [["hello\r\n", "I\r\n", "am\r\n", "Bob\r\n"], ["hello\r\n", "How\r\n", "are you?\r\n"]]

The result is correct, but is it possible to do this with Ruby's array iterators? My solution seems clumsy.

2 Answers:

Answer 0 (score: 5)

You can use Enumerable#slice_before:

arr.slice_before { |i| i[/hello/] }.to_a      
 #=> [["hello\r\n", "I\r\n", "am\r\n", "Bob\r\n"],
 #    ["hello\r\n", "How\r\n", "are you?\r\n"]] 

Or, more concisely (as @tokland pointed out):

arr.slice_before(/hello/).to_a
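
As a self-contained sketch of both forms, run against the input from the question (the variable names by_block and by_pattern are only for illustration): the block form starts a new slice wherever the block returns a truthy value, while the pattern form tests each element with pattern === element. With /hello/, any element that contains "hello" anywhere starts a new slice.

```ruby
arr = ["hello\r\n", "I\r\n", "am\r\n", "Bob\r\n",
       "hello\r\n", "How\r\n", "are you?\r\n"]

# Block form: a truthy block result marks the first element of a new slice.
by_block = arr.slice_before { |s| s.include?("hello") }.to_a

# Pattern form: each element is tested with pattern === element.
by_pattern = arr.slice_before(/hello/).to_a

p by_block == by_pattern
  #=> true
p by_pattern
  #=> [["hello\r\n", "I\r\n", "am\r\n", "Bob\r\n"],
  #    ["hello\r\n", "How\r\n", "are you?\r\n"]]
```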

Answer 1 (score: 1)

Here is a method that does not use Enumerable#slice_before (available since Ruby v1.9.2). It works with v1.9+ (and with v1.8.7+ if each_with_object is replaced with reduce/inject).

Assumptions

I assume that:

  • all strings that precede the first string beginning with "hello" are discarded
  • a matching string must begin with "hello", and that "hello" cannot merely be the start of a longer word (e.g., "hellonfire")
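
To make the second assumption concrete, here is a small sketch that applies the same regex shape used in the code that follows to a few sample strings. The sample strings and the variable names are only for illustration.

```ruby
target  = "hello"
pattern = /\A#{target}(?!\p{alpha})/

samples = ["hello\r\n", "hello again\r\n", "hellonfire\r\n", "Ahem hello\r\n"]

# Map each sample to true/false depending on whether it would start a new group.
results = samples.map { |s| (s =~ pattern) ? true : false }

p samples.zip(results)
  #=> [["hello\r\n", true], ["hello again\r\n", true],
  #    ["hellonfire\r\n", false], ["Ahem hello\r\n", false]]
```

"hellonfire\r\n" fails because the letter "n" follows "hello", and "Ahem hello\r\n" fails because \A anchors the match to the start of the string.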

Code

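def group_em(arr, target)
  # A string that starts with target (not followed by a letter) opens a new group;
  # any other string joins the current group, or is dropped if no group exists yet.
  arr.each_with_object([]) { |s,a| (s =~ /\A#{target}(?!\p{alpha})/) ?
    (a << [s]) : (a.last << s unless a.empty?) }
end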
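
For the reduce/inject substitution mentioned at the top of this answer, a minimal sketch could look like the following. The name group_em_inject is hypothetical, and the input is the array from the question. The main difference from each_with_object is that the block must return the accumulator on every iteration.

```ruby
# group_em_inject is a hypothetical name for this variant.
def group_em_inject(arr, target)
  arr.inject([]) do |a, s|
    if s =~ /\A#{target}(?!\p{alpha})/
      a << [s]       # a string starting with target opens a new group
    elsif !a.empty?
      a.last << s    # otherwise extend the current group, if there is one
    end
    a                # inject uses the block's value as the next accumulator
  end
end

input = ["hello\r\n", "I\r\n", "am\r\n", "Bob\r\n",
         "hello\r\n", "How\r\n", "are you?\r\n"]

p group_em_inject(input, "hello")
  #=> [["hello\r\n", "I\r\n", "am\r\n", "Bob\r\n"],
  #    ["hello\r\n", "How\r\n", "are you?\r\n"]]
```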

Example

arr = ["Ahem\r\n", "hello\r\n", "I\r\n", "hello again\r\n", "am\r\n",
       "Bob\r\n", "hellonfire\r\n", "How\r\n", "are you?\r\n"]

group_em(arr, 'hello')
  #=> [["hello\r\n", "I\r\n"],
  #    ["hello again\r\n", "am\r\n", "Bob\r\n", "hellonfire\r\n",
  #     "How\r\n", "are you?\r\n"]]

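Note that "Ahem\r\n" is not included because it does not follow a "hello" string, and that "hellonfire\r\n" does not trigger a new slice because its "hello" is followed by a letter and so does not count as a match.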
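
For comparison, here is a sketch of what the plain slice_before(/hello/) call from the other answer returns for the same array. It keeps the leading "Ahem\r\n" as its own slice and also splits at "hellonfire\r\n", because /hello/ matches anywhere in the string.

```ruby
arr = ["Ahem\r\n", "hello\r\n", "I\r\n", "hello again\r\n", "am\r\n",
       "Bob\r\n", "hellonfire\r\n", "How\r\n", "are you?\r\n"]

slices = arr.slice_before(/hello/).to_a
p slices
  #=> [["Ahem\r\n"],
  #    ["hello\r\n", "I\r\n"],
  #    ["hello again\r\n", "am\r\n", "Bob\r\n"],
  #    ["hellonfire\r\n", "How\r\n", "are you?\r\n"]]
```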

Discussion

In the example, the regular expression evaluates to

/\Ahello(?!\p{alpha})/

It could be defined in free-spacing mode to make it self-documenting:

/
\A             # match the beginning of the string
#{target}      # match target word
(?!\p{alpha})  # do not match a letter (negative lookahead)
/x             # free-spacing regex definition mode
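
As a sketch of how the free-spacing pattern might be used inside the method: the name group_em_documented is hypothetical, and Regexp.escape is an added safeguard that is not in the answer's code. In /x mode, unescaped whitespace and # in an interpolated target would otherwise be treated as layout and comments; Regexp.escape escapes both.

```ruby
# group_em_documented is a hypothetical name; Regexp.escape is an added safeguard.
def group_em_documented(arr, target)
  pattern = /
    \A                        # match the beginning of the string
    #{Regexp.escape(target)}  # match the target word literally
    (?!\p{alpha})             # negative lookahead, the next character is not a letter
  /x
  arr.each_with_object([]) do |s, a|
    if s =~ pattern
      a << [s]                # start a new group
    elsif !a.empty?
      a.last << s             # extend the current group
    end
  end
end

arr = ["Ahem\r\n", "hello\r\n", "I\r\n", "hello again\r\n", "am\r\n",
       "Bob\r\n", "hellonfire\r\n", "How\r\n", "are you?\r\n"]

p group_em_documented(arr, "hello")
  #=> [["hello\r\n", "I\r\n"],
  #    ["hello again\r\n", "am\r\n", "Bob\r\n", "hellonfire\r\n",
  #     "How\r\n", "are you?\r\n"]]
```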