How to get descendant nodes from XML based on an attribute

Date: 2016-06-26 00:30:17

Tags: ruby xml xpath nokogiri

I am trying to get the descendant children of a node:

require 'nokogiri'

@doc = Nokogiri::XML(File.open('data/20160521RHIL0.xml'))
nom_id = @doc.xpath('//race/nomination/@id')

race_id.each do |x|
  puts race_id.traverse {|race_id| puts nom_id }
end

I am looking at two sources of information:

  1. The documentation for Nokogiri::XML::Node, which includes

    Nokogiri::XML::Node#children
    
  2. sparklemotion's cheat sheet

    node.traverse {|node| } # yields all children and self to a block, _recursively_
    
  3. This is my test XML:

    <meeting id="42977">
      <race id="215411">
        <nomination number="8" saddlecloth="8" horse="Chipanda" id="198926" />
        <nomination number="2" saddlecloth="2" horse="Chifries" id="198965" />
        <nomination number="1" saddlecloth="1" horse="Itpanda" id="199260" />
      </race>
      <race id="215412">
        <nomination number="1" saddlecloth="1" horse="Ruby" id="199634" />
        <nomination number="2" saddlecloth="2" horse="Gems" id="208926" />
        <nomination number="3" saddlecloth="3" horse="Rock" id="122923" />
      </race>
    </meeting>
    

    I can easily get the race ids using XPath:

    require 'nokogiri'

    @doc = Nokogiri::XML(File.open('data/20160521RHIL0.xml'))

    race_id = @doc.xpath('//race/@id')
    nom_id = @doc.xpath('//race/nomination/@id')
    
      ...
      215411
      215412
    

    How do I get the id and number of each nomination node under race_id 215411 and store them in a hash like the one below?

    {215411 => [{id:198926, number:8},{id:198965, number:2}]}
    

2 Answers:

Answer 0 (score: 1)

require 'nokogiri'

# xml data
str =<<-EOS
<meeting id="42977">
  <race id="215411">
    <nomination number="8" saddlecloth="8" horse="Chipanda" id="198926" />
    <nomination number="2" saddlecloth="2" horse="Chifries" id="198965" />
    <nomination number="1" saddlecloth="1" horse="Itpanda" id="199260" />
  </race>
  <race id="215412">
    <nomination number="1" saddlecloth="1" horse="Ruby" id="199634" />
    <nomination number="2" saddlecloth="2" horse="Gems" id="208926" />
    <nomination number="3" saddlecloth="3" horse="Rock" id="122923" />
  </race>
</meeting>
EOS

# create doc
doc = Nokogiri::XML(str)

# clean; via http://stackoverflow.com/a/1528247
doc.xpath('//text()[not(normalize-space())]').remove

# parse doc
parsed_doc = doc.xpath('//race').inject({}) {|h,x| h[x.get_attribute('id').to_i] = x.children.map {|y| {id: y.get_attribute('id').to_i, number: y.get_attribute('number').to_i}}; h}
# {215411=>
#  [{:id=>198926, :number=>8},
#   {:id=>198965, :number=>2},
#   {:id=>199260, :number=>1}],
# 215412=>
#  [{:id=>199634, :number=>1},
#   {:id=>208926, :number=>2},
#   {:id=>122923, :number=>3}]}

# select via id
parsed_doc.select {|k,v| k == 215411}
# {215411=>
#  [{:id=>198926, :number=>8},
#   {:id=>198965, :number=>2},
#   {:id=>199260, :number=>1}]}

Here is the one-liner written across multiple lines:

parsed_doc = doc.xpath('//race').inject({}) do |h,x|
  h[x.get_attribute('id').to_i] = x.children.map do |y|
    {
      id: y.get_attribute('id').to_i,
      number: y.get_attribute('number').to_i
    }
  end
  h
end

Answer 1 (score: 1)

I'd do something like this:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<meeting id="42977">
  <race id="215411">
    <nomination number="8" saddlecloth="8" horse="Chipanda" id="198926" />
    <nomination number="2" saddlecloth="2" horse="Chifries" id="198965" />
    <nomination number="1" saddlecloth="1" horse="Itpanda" id="199260" />
  </race>
  <race id="215412">
    <nomination number="1" saddlecloth="1" horse="Ruby" id="199634" />
    <nomination number="2" saddlecloth="2" horse="Gems" id="208926" />
    <nomination number="3" saddlecloth="3" horse="Rock" id="122923" />
  </race>
</meeting>
EOT

race_id = 215411
nominations = doc.at("race[id='#{race_id}']") 
   .search('nomination')
   .map{ |nomination|
     {
      number: nomination['number'].to_i,
      id: nomination['id'].to_i
     }
   }

{race_id => nominations}
# => {215411=>[{:number=>8, :id=>198926}, {:number=>2, :id=>198965}, {:number=>1, :id=>199260}]}

race[id='#{race_id}'] builds a CSS selector that matches only the desired race node. From there, it is easy to find the nomination nodes under it.

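Note that I don't use children or traverse, because they return all nodes, including text nodes, not just element nodes. I would have to add extra logic to ignore the text nodes, which wastes time and space.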

Your question isn't clear on this, but if you want the information for all races, it only takes a small adjustment:

doc.search('race').map{ |race|
  nominations = race.search('nomination')
     .map{ |nomination|
       {
        number: nomination['number'].to_i,
        id: nomination['id'].to_i
       }
     }

  {race['id'].to_i => nominations}
}
# => [{215411=>[{:number=>8, :id=>198926}, {:number=>2, :id=>198965}, {:number=>1, :id=>199260}]}, {215412=>[{:number=>1, :id=>199634}, {:number=>2, :id=>208926}, {:number=>3, :id=>122923}]}]
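
The result above is an array of one-key hashes, while the question asks for a single hash keyed by race id. One possible way to bridge that, sketched in plain Ruby with the result copied in by hand (the names `per_race` and `by_race` are hypothetical), is to merge the hashes:

```ruby
# Result shape from the snippet above, copied in so this runs without the XML.
per_race = [
  { 215411 => [{ number: 8, id: 198926 }, { number: 2, id: 198965 }, { number: 1, id: 199260 }] },
  { 215412 => [{ number: 1, id: 199634 }, { number: 2, id: 208926 }, { number: 3, id: 122923 }] }
]

# Merge the one-key hashes into a single hash keyed by race id.
by_race = per_race.reduce({}, :merge)

p by_race.keys
# => [215411, 215412]
p by_race[215412].map { |n| n[:id] }
# => [199634, 208926, 122923]
```

This assumes race ids are unique; if two entries shared an id, `merge` would keep only the later one.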