Question

I have a file with the following content:

this is test line 1 
this is testing purpose 
<public>
am inside of public
doing lot of stuffs and priting result here
</public>
<public>
am inside of another public
doing another set of stuffs and priting here
</public>

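I want to split this file into three different sections:

Lines that are not inside any section
Lines inside the first section
Lines inside the second section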

I tried using take_while and drop_while:

File.open(filename).each_line.take_while do |l|
  !l.include?('</public>')
end.drop_while do |l|
  !l.include?('<public>')
end.drop(1)

But it only extracts the first <public> ... </public> section.

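In some cases the order may change. For example, the public sections may come first, with the remaining content at the end or in the middle. If the content always came in the same order as the template above, I could use the approach below: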

File.read(filename).split(/<\/?public>/)
                   .map(&:strip)
                   .reject(&:empty?)

I got that answer from "Split lines using tags that appear multiple times in file".
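To make the limitation concrete, here is a minimal sketch. The sample string is made up for illustration. Once the tags are consumed as split delimiters, nothing in the result says which chunks were inside a <public> block, so the approach only works when the order is known in advance.

```ruby
# Hypothetical sample: one outside line, one <public> block, another outside line.
text = "outside 1\n<public>\ninside\n</public>\noutside 2\n"

chunks = text.split(/<\/?public>/).map(&:strip).reject(&:empty?)

# All three chunks come back as plain strings; the information about
# which ones were wrapped in <public> ... </public> is gone.
p chunks
```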

But I am looking for a generic approach that handles the data whatever the order.

I am looking for a better solution. Any suggestions would be appreciated.

Answer 1

Consider this:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<root>
this is test line 1 
<public>
am inside of public
</public>
<public>
am inside of another public
</public>
</root>
EOT

text_inside_public_tags = doc.search('public').map(&:text)
# => ["\n" +
#    "am inside of public\n", "\n" +
#    "am inside of another public\n"]

doc.search('public').each(&:remove)

text_outside_public_tags = doc.at('root').text
# => "\n" +
#    "this is test line 1 \n" +
#    "\n" +
#    "\n"

Answer 2

You can use the Ruby flip-flop operator here.

Code

def dissect(str)
  arr = str.lines.map(&:strip)
  grp, ungrp = [], []
  arr.each { |line| line=='<public>'..line=='</public>' ? (grp << line) : ungrp << line }
  [grp.slice_when { |s,t| s == '</public>' && t == '<public>' }.
       map { |a| a[1..-2] },
   ungrp]
end

The last statement of the method, which constructs the array the method returns, could be replaced with the following.

b = grp.count('<public>').times.with_object([]) do |_,a|
  ndx = grp.index('</public>')
  a << grp[1..ndx-1]
  grp = grp[ndx+1..-1] if ndx < grp.size-1
end
[b, ungrp]

Example

str =<<-EOS
this is test line 1
this is testing purpose
<public>
am inside of public
doing lot of stuffs and printing result here
</public>
let's stick another line here
<public>
am inside of another public
doing another set of stuffs and printing here
</public>
and another line here
EOS

grouped, ungrouped = dissect(str)
  #=> [
  #     [ ["am inside of public",
  #        "doing lot of stuffs and printing result here"],
  #       ["am inside of another public",
  #        "doing another set of stuffs and printing here"]
  #     ],
  #     ["this is test line 1",
  #      "this is testing purpose",
  #      "let's stick another line here",
  #      "and another line here"]
  #   ]
grouped
  #=> [ ["am inside of public",
  #      "doing lot of stuffs and printing result here"],
  #     ["am inside of another public",
  #      "doing another set of stuffs and printing here"]
  #   ]
ungrouped
  #=> ["this is test line 1",
  #    "this is testing purpose",
  #    "let's stick another line here",
  #    "and another line here"]

Explanation

For the example above, the steps are as follows.

arr = str.lines.map(&:strip)
  #=> ["this is test line 1", "this is testing purpose", "<public>",
  #    "am inside of public", "doing lot of stuffs and printing result here",
  #    "</public>", "let's stick another line here", "<public>",
  #    "am inside of another public", "doing another set of stuffs and printing here",
  #    "</public>", "and another line here"]
ungrp, grp = [], []
arr.each { |line| line=='<public>'..line=='</public>' ? (grp << line) : ungrp << line }

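The flip-flop returns false until line=='<public>' is true. It then returns true and continues to return true up to and including the line for which line=='</public>' is true. After that it returns false until it again encounters a line for which line=='<public>' is true, and so on.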
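The behaviour described above can be observed directly with a small sketch. The sample lines are made up for illustration; the only thing taken from the answer is the flip-flop condition itself.

```ruby
# Hypothetical sample lines, not the data from the example above.
lines = ['before', '<public>', 'inside', '</public>', 'between',
         '<public>', 'inside again', '</public>', 'after']

# In a conditional, a range of two boolean expressions is a flip-flop,
# not a Range object: it switches on at '<public>' and off after '</public>'.
flags = lines.map do |line|
  if (line == '<public>')..(line == '</public>')
    true
  else
    false
  end
end

lines.zip(flags).each { |line, flag| puts format('%-14s %s', line, flag) }
```

Both tag lines are reported as true, which is why the answer later strips the first and last element of each group with a[1..-2].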

ungrp
  #=> <returns the value of 'ungrouped' in the example>
grp
  #=> ["<public>",
  #    "am inside of public",
  #    "doing lot of stuffs and printing result here",
  #    "</public>",
  #    "<public>",
  #    "am inside of another public",
  #    "doing another set of stuffs and printing here",
  #    "</public>"]
enum = grp.slice_when { |s,t| s == '</public>' && t == '<public>' }
  #=> #<Enumerator: #<Enumerator::Generator:0x00000

See Enumerable#slice_when, which made its debut in Ruby v2.2.

We can see the elements that will be generated by this enumerator by converting it to an array.

enum.to_a
  #=> [["<public>", "am inside of public",
  #     "doing lot of stuffs and printing result here", "</public>"],
  #    ["<public>", "am inside of another public",
  #     "doing another set of stuffs and printing here", "</public>"]]

Finally,

enum.map { |a| a[1..-2] }
  #=> <returns the array 'grouped' in the example>
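Since the question works with a file in which the <public> blocks may come first, a single-pass variation may be worth sketching. This is not the code from the answer: dissect_file is a hypothetical helper, the sample file is made up, and it assumes every <public> has a matching </public> on its own line. It uses the same flip-flop but starts a new group whenever an opening tag is seen, so slice_when is not needed.

```ruby
require 'tempfile'

# Hypothetical helper (an assumption for illustration, not part of the answer).
# Returns [groups, outside]: one array of lines per <public> block, plus the
# lines that are outside every block.
def dissect_file(path)
  groups = []
  outside = []
  File.foreach(path) do |raw|
    line = raw.strip
    next if line.empty?                 # ignore blank lines
    if (line == '<public>')..(line == '</public>')
      if line == '<public>'
        groups << []                    # an opening tag starts a new group
      elsif line != '</public>'
        groups.last << line             # a line inside the current group
      end
    else
      outside << line
    end
  end
  [groups, outside]
end

# Made-up sample with a public block first and plain lines in the middle and at the end.
sample = Tempfile.new('public_sections')
sample.write(<<~EOS)
  <public>
  am inside of public
  </public>
  this is test line 1
  <public>
  am inside of another public
  </public>
  this is testing purpose
EOS
sample.close

grouped, ungrouped = dissect_file(sample.path)
sample.unlink

p grouped
p ungrouped
```

Note that with an unclosed <public> the flip-flop never switches off, so every remaining line would be treated as inside the last block.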

Ruby: split lines using selective tags that appear multiple times in a file, in any order
