A better way to search for nodes with ElementTree using AND and 'parent' (XPath)

Time: 2017-01-26 12:49:04

Tags: python xml xpath elementtree

I need to find ITEM tags that match two conditions and then, for each match, get the name attribute of the parent NODE tag.

Two problems, plus my current code:

  1. I can't find a way to make XPath do an 'and', e.g.

    item = node.findall('./ITEM[@name="toppas_type" and @value="output file list"]')
    
  2. Getting the parent NODE's information without having to search for the NODE explicitly and save it before finding the ITEM, e.g.

    parent_name = item.parent.attrib['name']
    
  3. This is my current code:

    node_names = []
    for node in tree.findall('NODE[@name="vertices"]/NODE'): 
        for item in node.findall('./ITEM[@name="toppas_type"]'):
            if item.attrib['name'] == 'toppas_type' and item.attrib['value'] == 'output file list':
                node_names.append(node.attrib['name'])
    

    ...which parses a file like this (excerpt only)...

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <PARAMETERS version="1.6.2" xsi:noNamespaceSchemaLocation="http://open-ms.sourceforge.net/schemas/Param_1_6_2.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <NODE name="vertices" description="">   
            <NODE name="23" description="">
              <ITEM name="recycle_output" value="false" type="string" description="" required="false" advanced="false" />
              <ITEM name="toppas_type" value="tool" type="string" description="" required="false" advanced="false" />
              <ITEM name="tool_name" value="FileConverter" type="string" description="" required="false" advanced="false" />
              <ITEM name="tool_type" value="" type="string" description="" required="false" advanced="false" />
              <ITEM name="x_pos" value="-620" type="double" description="" required="false" advanced="false" />
              <ITEM name="y_pos" value="-1380" type="double" description="" required="false" advanced="false" />
            </NODE>
    
            <NODE name="24" description="">
              <ITEM name="recycle_output" value="false" type="string" description="" required="false" advanced="false" />
              <ITEM name="toppas_type" value="output file list" type="string" description="" required="false" advanced="false" />
              <ITEM name="x_pos" value="-440" type="double" description="" required="false" advanced="false" />
              <ITEM name="y_pos" value="-1480" type="double" description="" required="false" advanced="false" />
              <ITEM name="output_folder_name" value="" type="string" description="" required="false" advanced="false" />
            </NODE>
    
            <NODE name="33" description="">
              <ITEM name="recycle_output" value="false" type="string" description="" required="false" advanced="false" />
              <ITEM name="toppas_type" value="merger" type="string" description="" required="false" advanced="false" />
              <ITEM name="x_pos" value="-620" type="double" description="" required="false" advanced="false" />
              <ITEM name="y_pos" value="-1540" type="double" description="" required="false" advanced="false" />
              <ITEM name="round_based" value="false" type="string" description="" required="false" advanced="false" />
            </NODE>
        <!--(snip)-->
        </NODE>
    </PARAMETERS>
    

    Update
    @MathiasMüller

    Great suggestion. Unfortunately, when I try to load the XML file I get an error. I'm not familiar with lxml, so I'm not sure whether I'm using it correctly.

    from lxml import etree
    root = etree.DTD("/Users/mikes/Documents/Eclipseworkspace/Bioproximity/Assay-Workflows-Mikes/protein_lfq/protein_lfq-1.1.2.toppas")
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "src/lxml/dtd.pxi", line 294, in lxml.etree.DTD.__init__ (src/lxml/lxml.etree.c:187024)
    lxml.etree.DTDParseError: Content error in the external subset, line 2, column 1
    

    Unfortunately, ElementTree does not accept that XPath expression in tree.find(xpath) or tree.findall(xpath).

1 answer:

Answer 0 (score: 1)

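You may not need nested loops at all; a single XPath expression is enough. I'm not sure what you want the final result to be, but here is an example using lxml: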

>>> import lxml.etree
>>> s = '''<NODE name="vertices" description="">
...
...     <NODE name="23" description="">
...       <ITEM name="recycle_output" value="false" type="string" description="" required="false" advanced="false" />
...       <ITEM name="toppas_type" value="tool" type="string" description="" required="false" advanced="false" />
...       <ITEM name="tool_name" value="FileConverter" type="string" description="" required="false" advanced="false" />
...       <ITEM name="tool_type" value="" type="string" description="" required="false" advanced="false" />
...       <ITEM name="x_pos" value="-620" type="double" description="" required="false" advanced="false" />
...       <ITEM name="y_pos" value="-1380" type="double" description="" required="false" advanced="false" />
...     </NODE>
...
...     <NODE name="24" description="">
...       <ITEM name="recycle_output" value="false" type="string" description="" required="false" advanced="false" />
...       <ITEM name="toppas_type" value="output file list" type="string" description="" required="false" advanced="false" />
...       <ITEM name="x_pos" value="-440" type="double" description="" required="false" advanced="false" />
...       <ITEM name="y_pos" value="-1480" type="double" description="" required="false" advanced="false" />
...       <ITEM name="output_folder_name" value="" type="string" description="" required="false" advanced="false" />
...     </NODE>
...
...     <NODE name="33" description="">
...       <ITEM name="recycle_output" value="false" type="string" description="" required="false" advanced="false" />
...       <ITEM name="toppas_type" value="merger" type="string" description="" required="false" advanced="false" />
...       <ITEM name="x_pos" value="-620" type="double" description="" required="false" advanced="false" />
...       <ITEM name="y_pos" value="-1540" type="double" description="" required="false" advanced="false" />
...       <ITEM name="round_based" value="false" type="string" description="" required="false" advanced="false" />
...     </NODE>
... <!--(snip)-->
... </NODE>'''
>>> root = lxml.etree.fromstring(s)
>>> root.xpath('/NODE[@name="vertices"]/NODE/ITEM[@name = "toppas_type" and @value = "output file list"]')
[<Element ITEM at 0x102b5f788>]

If you actually need the name of the parent element, you can use .. to move to the parent node:

>>> root.xpath('/NODE[@name="vertices"]/NODE/ITEM[@name = "toppas_type" and @value = "output file list"]/../@name')
['24']
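
As a side note, the standard library's xml.etree.ElementTree may be enough here without lxml. Its limited XPath subset has no `and` operator, but consecutive predicates are combined, and `..` selects the parent as long as the search starts above it. A minimal sketch, assuming Python 3 and a trimmed-down copy of the sample document (the `XML` string and the variable names are only for illustration):

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the .toppas file from the question (illustrative).
XML = """<PARAMETERS version="1.6.2">
  <NODE name="vertices" description="">
    <NODE name="23" description="">
      <ITEM name="toppas_type" value="tool" type="string" />
    </NODE>
    <NODE name="24" description="">
      <ITEM name="toppas_type" value="output file list" type="string" />
      <ITEM name="x_pos" value="-440" type="double" />
    </NODE>
    <NODE name="33" description="">
      <ITEM name="toppas_type" value="merger" type="string" />
    </NODE>
  </NODE>
</PARAMETERS>"""

root = ET.fromstring(XML)

# Two predicates in a row must both hold (a logical AND); the trailing "/.."
# steps from each matching ITEM up to its parent NODE.
path = ('NODE[@name="vertices"]/NODE/'
        'ITEM[@name="toppas_type"][@value="output file list"]/..')
node_names = [node.get("name") for node in root.findall(path)]
print(node_names)
```

The search has to start from an element above the NODE you want back; ElementTree only knows the parents of elements below the one that findall was called on.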

Parsing an XML document from a file

If you want to parse an XML document from a file, etree.DTD is the wrong choice: a DTD is not an XML document. Here is how to do it with lxml:

>>> import lxml.etree
>>> root = lxml.etree.parse("example.xml")
>>> root
<lxml.etree._ElementTree object at 0x106593b00>
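
Note that parse() returns a tree object that wraps the whole document, not the outermost element; getroot() gives you that element. The sketch below shows the distinction with the standard library's xml.etree.ElementTree so that it runs without lxml installed (lxml.etree.parse and getroot() have the same shape). The temporary file and its contents are made up for illustration; in practice you would pass the path to your own file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Illustrative stand-in for the real file on disk.
XML = """<?xml version="1.0" encoding="ISO-8859-1"?>
<PARAMETERS version="1.6.2">
  <NODE name="vertices" description="">
    <NODE name="24" description="">
      <ITEM name="toppas_type" value="output file list" type="string" />
    </NODE>
  </NODE>
</PARAMETERS>"""

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".toppas", delete=False, encoding="ISO-8859-1"
) as handle:
    handle.write(XML)
    path = handle.name

try:
    tree = ET.parse(path)    # an ElementTree wrapping the whole document
    root = tree.getroot()    # the outermost element, here PARAMETERS
    root_tag = root.tag
    child_names = [node.get("name") for node in root.findall("NODE")]
finally:
    os.remove(path)          # clean up the temporary file

print(root_tag, child_names)
```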

Second update

If the outermost element is PARAMETERS, you need to search like this:

>>> root.xpath('/PARAMETERS/NODE[@name="vertices"]/NODE/ITEM[@name = "toppas_type" and @value = "output file list"]')
[<Element ITEM at 0x106593e18>]
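
If you would rather not hard-code the outermost element, one option is to search relative to whatever root you have and look the parent up afterwards. The sketch below uses the standard library's xml.etree.ElementTree, whose elements have no parent pointer, so it builds a child-to-parent map first. The helper name `output_list_node_names` and both sample strings are hypothetical, and unlike the path above it does not require the match to sit under the "vertices" NODE.

```python
import xml.etree.ElementTree as ET


def output_list_node_names(root):
    """Return the name of every element that directly contains a matching ITEM."""
    # ElementTree elements do not know their parent, so map child -> parent once.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # ".//" searches all descendants, so the outermost tag does not matter.
    items = root.findall('.//ITEM[@name="toppas_type"][@value="output file list"]')
    return [parent_of[item].get("name") for item in items]


# The same data with and without the PARAMETERS wrapper (illustrative only).
WRAPPED = """<PARAMETERS><NODE name="vertices"><NODE name="24">
<ITEM name="toppas_type" value="output file list" /></NODE>
<NODE name="33"><ITEM name="toppas_type" value="merger" /></NODE>
</NODE></PARAMETERS>"""
BARE = """<NODE name="vertices"><NODE name="24">
<ITEM name="toppas_type" value="output file list" /></NODE></NODE>"""

print(output_list_node_names(ET.fromstring(WRAPPED)))
print(output_list_node_names(ET.fromstring(BARE)))
```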