How do I get the absolute node path in XML using XPath and Ruby?

Posted: 2010-12-30 00:39:19

Tags: ruby xml rexml

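Basically, I want to extract the absolute path from a node up to the root and report it to the console or to a file. Here is my current solution: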

require "rexml/document"

include REXML

def get_path(xml_doc, key)
  XPath.each(xml_doc, key) do |node|
    puts "\"#{node}\""
    XPath.each(node, 'ancestor::*') do |el|
      #  puts  el
    end
  end
end

test_doc = Document.new <<EOF
  <root>
   <level1 key="1" value="B">
     <level2 key="12" value="B" />
     <level2 key="13" value="B" />
   </level1>
  </root>
EOF

get_path test_doc, "//*[@key='12']"

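The problem is that it gives me "<level2 value='B' key='12'/>" as output. The desired output is <root><level1><level2 value='B' key='12'/> (the format can differ; the main goal is to have the full path). I only have basic knowledge of XPath and would appreciate any help or guidance on how to achieve this.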
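For reference, a minimal sketch of the XPath-only route the question was attempting, assuming REXML: the `ancestor-or-self::*` axis selects the matched element together with every element above it. In XPath terms the document root is not an element, so `*` should not match it, but the sketch still filters out `REXML::Document` defensively. Because I'm not certain every REXML version returns reverse-axis results in document order, it sorts by depth instead of trusting the axis order. The helpers `depth` and `absolute_path` are hypothetical names.

```ruby
require "rexml/document"

# Hypothetical helper: number of parents above a node (the Document has depth 0).
def depth(node)
  d = 0
  d += 1 while (node = node.parent)
  d
end

# Hypothetical helper: "/root/level1/level2"-style path for the first match of +xpath+.
def absolute_path(doc, xpath)
  node = REXML::XPath.first(doc, xpath)
  return nil unless node

  # ancestor-or-self::* = the matched element plus all of its element ancestors.
  chain = REXML::XPath.match(node, "ancestor-or-self::*")
  # Defensive: never include the owning Document in the path.
  chain = chain.reject { |n| n.is_a?(REXML::Document) }
  # Order outermost-first explicitly instead of relying on the axis order.
  "/" + chain.sort_by { |el| depth(el) }.map(&:name).join("/")
end

doc = REXML::Document.new <<EOF
<root>
  <level1 key="1" value="B">
    <level2 key="12" value="B" />
    <level2 key="13" value="B" />
  </level1>
</root>
EOF

puts absolute_path(doc, "//*[@key='12']")   # => /root/level1/level2
```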

3 answers:

Answer 0 (score: 4)

This should get you started:

require 'nokogiri'

test_doc = Nokogiri::XML <<EOF
  <root>
   <level1 key="1" value="B">
     <level2 key="12" value="B" />
     <level2 key="13" value="B" />
   </level1>
  </root>
EOF

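# First <level2> in the document, i.e. the one with key="12"
node = test_doc.at('//level2')
# ancestors runs from the parent up to the Document; reverse it, append the
# node itself, then drop the leading Document with [1..-1]
puts [*node.ancestors.reverse, node][1..-1].map{ |n| "<#{ n.name }>" }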
# >> <root>
# >> <level1>
# >> <level2>

Nokogiri is nice because it lets you use CSS accessors instead of XPath if you prefer. CSS is more intuitive for some people and can be cleaner than the equivalent XPath:

node = test_doc.at('level2')
puts [*node.ancestors.reverse, node][1..-1].map{ |n| "<#{ n.name }>" }
# >> <root>
# >> <level1>
# >> <level2>
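The question also asked for output shaped like `<root><level1><level2 value='B' key='12'/>`. A minimal sketch of that exact format, written with REXML rather than Nokogiri (an assumption, so it runs without extra gems): emit a bare opening tag for each ancestor element, then the matched element's own serialization via `to_s`. The method name `tag_path` is hypothetical, and REXML's quoting and attribute order may differ from the question's sample.

```ruby
require "rexml/document"

# Hypothetical helper: "<root><level1>" followed by the matched element itself.
def tag_path(doc, xpath)
  node = REXML::XPath.first(doc, xpath)
  return nil unless node

  # Walk up through the parent elements, stopping before the Document.
  ancestors = []
  parent = node.parent
  while parent && !parent.is_a?(REXML::Document)
    ancestors.unshift(parent)   # unshift keeps the list outermost-first
    parent = parent.parent
  end

  ancestors.map { |el| "<#{el.name}>" }.join + node.to_s
end

doc = REXML::Document.new <<EOF
<root>
  <level1 key="1" value="B">
    <level2 key="12" value="B" />
    <level2 key="13" value="B" />
  </level1>
</root>
EOF

puts tag_path(doc, "//*[@key='12']")
```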

Answer 1 (score: 3)

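First, note that your document isn't what I think you want. I suspect you didn't want <level1> to be self-closing, but rather to contain the <level2> elements as children.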
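To illustrate why the self-closing tag matters, a small REXML sketch comparing two made-up documents: when `<level1 ... />` closes itself, the `<level2>` element becomes its sibling under `<root>`, so `level1` no longer appears in its ancestor chain. The helper `element_ancestors` is hypothetical.

```ruby
require "rexml/document"

# Hypothetical helper: names of the element ancestors of the first match,
# outermost first, excluding the owning Document.
def element_ancestors(xml, xpath)
  node = REXML::XPath.first(REXML::Document.new(xml), xpath)
  names = []
  parent = node.parent
  while parent && !parent.is_a?(REXML::Document)
    names.unshift(parent.name)
    parent = parent.parent
  end
  names
end

nested       = '<root><level1 key="1"><level2 key="12"/></level1></root>'
self_closing = '<root><level1 key="1"/><level2 key="12"/></root>'

p element_ancestors(nested, "//*[@key='12']")        # => ["root", "level1"]
p element_ancestors(self_closing, "//*[@key='12']")  # => ["root"]
```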

Second, I prefer and advocate Nokogiri over REXML. It's nice that REXML ships with Ruby, but Nokogiri is faster and more convenient, IMHO. So:

require 'nokogiri'

test_doc = Nokogiri::XML <<EOF
  <root>
    <level1 key="1" value="B">
      <level2 key="12" value="B" />
      <level2 key="13" value="B" />
    </level1>
  </root>
EOF

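# Returns the matched node's ancestors, outermost first. Note that this
# includes the Nokogiri::XML::Document itself (named "document") but not
# the matched node.
def get_path(xml_doc, key)
  xml_doc.at_xpath(key).ancestors.reverse
end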

path = get_path( test_doc, "//*[@key='12']" )
p path.map{ |node| node.name }.join( '/' )
#=> "document/root/level1"

Answer 2 (score: 2)

If you're set on REXML, here's a REXML solution:

require 'rexml/document'

test_doc = REXML::Document.new <<EOF
  <root>
    <level1 key="1" value="B">
      <level2 key="12" value="B" />
      <level2 key="13" value="B" />
    </level1>
  </root>
EOF

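# Collects the matched node and each ancestor up to (but not including)
# the owning Document, then flips the list so it runs root first.
def get_path(xml_doc, key)
  node = REXML::XPath.first( xml_doc, key )
  path = []
  while node.parent   # only the Document has no parent
    path << node
    node = node.parent
  end
  path.reverse
end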

path = get_path( test_doc, "//*[@key='12']" )
p path.map{ |el| el.name }.join("/")
#=> "root/level1/level2"
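A name-only path such as `root/level1/level2` cannot tell the two `<level2>` siblings apart. A minimal sketch, still in REXML, that adds a 1-based position predicate among same-named siblings so the resulting XPath selects exactly one element; `step_for` and `indexed_path` are hypothetical names, and the `name[n]` step format is my own choice.

```ruby
require "rexml/document"

# Hypothetical helper: "name[n]", where n is the element's 1-based position
# among sibling elements with the same name.
def step_for(el)
  same_name = el.parent.children.select { |c| c.is_a?(REXML::Element) && c.name == el.name }
  position = same_name.index { |sib| sib.equal?(el) } + 1   # identity, not structural equality
  "#{el.name}[#{position}]"
end

# Hypothetical helper: absolute, position-qualified XPath for the first match.
def indexed_path(doc, xpath)
  node = REXML::XPath.first(doc, xpath)
  return nil unless node

  steps = []
  while node.parent            # stop once we reach the Document
    steps.unshift(step_for(node))
    node = node.parent
  end
  "/" + steps.join("/")
end

doc = REXML::Document.new <<EOF
<root>
  <level1 key="1" value="B">
    <level2 key="12" value="B" />
    <level2 key="13" value="B" />
  </level1>
</root>
EOF

puts indexed_path(doc, "//*[@key='12']")   # => /root[1]/level1[1]/level2[1]
puts indexed_path(doc, "//*[@key='13']")   # => /root[1]/level1[1]/level2[2]
```

Feeding one of these paths back into `REXML::XPath.first` should return the same element, which is a quick way to check that the path is unambiguous.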

Alternatively, if you want to use the same get_path implementation as in the other answers, you can monkeypatch REXML to add an ancestors method:

class REXML::Child
  # Returns this node's ancestors nearest first (the same order as
  # Nokogiri's Node#ancestors), so get_path's .ancestors.reverse
  # yields them outermost first.
  def ancestors
    result = []

    # Presumably you don't want the node included in its list of ancestors
    # If you do, change the following line to    node = self
    node = self.parent

    # Presumably you want to stop at the root node, and not its owning document
    # If you want the document included in the ancestors, change the following
    # line to just    while node
    while node && node.parent
      result << node
      node = node.parent
    end

    result
  end
end
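A self-contained usage sketch: it repeats a compact version of the patch, returning the nearest ancestor first (the order Nokogiri's `ancestors` uses, so that Answer 1's `.ancestors.reverse` comes out outermost first), and calls a `get_path` shaped like Answer 1's, with `REXML::XPath.first` standing in for Nokogiri's `at_xpath`. Unlike the Nokogiri version, the owning Document is excluded, so the result has no leading "document".

```ruby
require "rexml/document"

class REXML::Child
  # Element ancestors, nearest first, excluding self and the owning Document.
  def ancestors
    result = []
    node = parent
    while node && node.parent   # the Document is the only ancestor without a parent
      result << node
      node = node.parent
    end
    result
  end
end

# Same shape as Answer 1's get_path, with REXML::XPath.first in place of at_xpath.
def get_path(xml_doc, key)
  REXML::XPath.first(xml_doc, key).ancestors.reverse
end

test_doc = REXML::Document.new <<EOF
<root>
  <level1 key="1" value="B">
    <level2 key="12" value="B" />
    <level2 key="13" value="B" />
  </level1>
</root>
EOF

path = get_path(test_doc, "//*[@key='12']")
p path.map(&:name).join("/")   # => "root/level1"
```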