Problem parsing an XML file with the lxml module

Time: 2014-12-09 13:22:14

Tags: python xml parsing lxml

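I am trying to iterate over all the "value" tags of a "variant", but the code does not jump to the next "value" key, because the XML has another "value" key nested under the "FIRST VALUE KEY":
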
<variant>
  <name>PROGRAMS</name>
  <value>  <!-- Lets call it FIRST VALUE KEY -->
     <value>PROG1</value>
     <statistics>
        <statistic name="Stats">
           <value>5</value>
        </statistic>
     </statistics>
  </value>
  <value>  <!-- SECOND VALUE KEY -->
     <value>PROG2</value>
     ...
  </value>
</variant>
<variant>
  <name>OTHER</name>
   ...
</variant>

Here is my Python code:

for keys in root.iter('variant'):
    for variant in keys:
        if variant.text == 'PROGRAMS':
            for value_tag in keys.iter('value'):
                ParamValue = value_tag.find('value').text
                if ParamValue == 'PROG2':
                    print("GOT IT!")
                else: continue # <- this jumps to the "<value>PROG1</value>" tag
                               # but it should jump to the "SECOND VALUE KEY"

Where is the problem?

1 answer:

Answer 0 (score: 1)

import lxml.etree as ET
root = ET.parse('data').getroot()

for value in root.xpath(
    '''//variant
           [name  
             [text()="PROGRAMS"]]
         /value
           [value
             [text()="PROG2"]]'''):
    print('GOT IT')

yields

GOT IT

I think it is easier to use XPath to dig down to the elements you want. The XPath means

//                         # look for all elements
variant                    # that are variants
   [name                   # that have a <name> element
     [text()="PROGRAMS"]]  # with text equal to "PROGRAMS" 
 /value                    # select the <value> (child of variant)
   [value                  # that has a child <value> element
     [text()="PROG2"]]     # with text equal to "PROG2"
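For simple leaf elements like these, the nested predicates can also be written in a flatter form such as [name='PROGRAMS'], which compares the text of the child element directly. Here is a minimal sketch of that, not a definitive implementation. It assumes a trimmed copy of the XML wrapped in a hypothetical <root> element, and it gives the OTHER variant a made-up PROG2 entry so the effect of the name predicate is visible. It uses findall(), whose limited path syntax is accepted by both lxml and the standard library's xml.etree.ElementTree; the full xpath() method is lxml only.

```python
try:
    import lxml.etree as ET  # the library used in this thread
except ImportError:
    import xml.etree.ElementTree as ET  # stdlib fallback, same findall() syntax

# Trimmed sample data. The <root> wrapper and the contents of the OTHER
# variant are assumptions made for illustration.
XML = """
<root>
  <variant>
    <name>PROGRAMS</name>
    <value><value>PROG1</value></value>
    <value><value>PROG2</value></value>
  </variant>
  <variant>
    <name>OTHER</name>
    <value><value>PROG2</value></value>
  </variant>
</root>
""".strip()

root = ET.fromstring(XML)

# Outer <value> elements with a child <value>PROG2</value>,
# restricted to the variant whose <name> is PROGRAMS.
matches = root.findall(".//variant[name='PROGRAMS']/value[value='PROG2']")

# The same search without the name predicate also finds the OTHER variant.
unfiltered = root.findall(".//variant/value[value='PROG2']")

for value in matches:
    print('GOT IT', value.find('value').text)
print(len(matches), len(unfiltered))
```

Each predicate narrows the set of elements selected by the step it is attached to, which is why dropping [name='PROGRAMS'] widens the result from one match to two.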

To iterate over the <statistics> children of that <value> element:

for statistics in root.xpath(
    '''//variant
           [name  
             [text()="PROGRAMS"]]
         /value
           [value
             [text()="PROG2"]]
          /statistics'''):
    print(statistics.tag)  # placeholder body (an assumption); handle each <statistics> element here

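In XPath, the brackets [..] loosely translate to "such that". Note that without the brackets, the XPath above would be //variant/value/statistics. It looks a bit like a file path, and like a file path, it shows the lineage of the elements. A single / means "direct child of", while // means "descendant of" (e.g. a child, grandchild, great-grandchild, etc.).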
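The same child versus descendant distinction is likely what trips up the loop in the question: Element.iter('value') walks every descendant, much like //, so it also visits the nested <value>PROG1</value> and <value>5</value> elements, and on those find('value') returns None. Element.findall('value') returns direct children only, like a single /. A minimal sketch of the difference, assuming a trimmed copy of the XML wrapped in a hypothetical <root> element:

```python
try:
    import lxml.etree as ET  # the library used in this thread
except ImportError:
    import xml.etree.ElementTree as ET  # stdlib fallback with the same methods

# Trimmed copy of the XML from the question; the <root> wrapper is an assumption.
XML = """
<root>
  <variant>
    <name>PROGRAMS</name>
    <value>
      <value>PROG1</value>
      <statistics>
        <statistic name="Stats">
          <value>5</value>
        </statistic>
      </statistics>
    </value>
    <value>
      <value>PROG2</value>
    </value>
  </variant>
</root>
""".strip()

root = ET.fromstring(XML)
variant = root.find('variant')

# iter() visits every descendant <value>, at any depth (like // in XPath).
descendants = [(v.text or '').strip() for v in variant.iter('value')]

# findall('value') returns only the direct <value> children (like a single /),
# so each one is an outer "VALUE KEY" and has a nested <value> child.
children = [v.find('value').text for v in variant.findall('value')]

print(descendants)  # ['', 'PROG1', '5', '', 'PROG2']
print(children)     # ['PROG1', 'PROG2']
```

Swapping keys.iter('value') for keys.findall('value') in the original loop should therefore step from the first outer <value> straight to the second one.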