How to find identical attribute values in XML with lxml.etree?

Time: 2018-08-09 06:25:28

Tags: python xml lxml

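My XML looks like the input below.

I want to total the Sales values per fuel type, for example for Diesel.

How can I iterate over all <Tank> elements and read the fuelItem attribute to find multiple occurrences of the same fuel type, and then sum their Sales attribute values?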

Input:

<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>

Desired output:

<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" netSalesQty="1000" />
    <Tank  fuelItem="Diesel" netSalesQty="5000" />
  </FuelTankList>
</EnterpriseDocument>

3 answers:

Answer 0 (score: 2)

Since you're using lxml, you can use XSLT with Muenchian grouping to group the Tank elements by their fuelItem attribute.

Example...

XML input (input.xml)

<EnterpriseDocument>
    <FuelTankList>
        <Tank fuelItem="Petrol" Sales="1000" />
        <Tank  fuelItem="Diesel" Sales="2000" />
        <Tank  fuelItem="Diesel" Sales="3000" />
    </FuelTankList>
</EnterpriseDocument>

XSLT 1.0 (test.xsl)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="tanks" match="Tank" use="@fuelItem"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="FuelTankList">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="Tank[count(.|key('tanks',@fuelItem)[1])=1]">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:attribute name="Sales">
            <xsl:value-of select="sum(key('tanks',@fuelItem)/@Sales)"/>
          </xsl:attribute>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Python

from lxml import etree

# Parse the source document and the stylesheet
tree = etree.parse("input.xml")
xslt = etree.parse("test.xsl")

# Apply the stylesheet; the result is a new tree
new_tree = tree.xslt(xslt)

print(etree.tostring(new_tree, pretty_print=True).decode("utf-8"))

Output (stdout)

<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000"/>
    <Tank fuelItem="Diesel" Sales="5000"/>
  </FuelTankList>
</EnterpriseDocument>
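The stylesheet above keeps the attribute name Sales, while the question asks for netSalesQty. A minimal sketch of that variant, assuming the inline copy of the question's input and the renamed attribute are what you want: it copies only @fuelItem onto each grouped Tank, writes the sum into a new netSalesQty attribute, and embeds both documents as strings so it runs without input.xml or test.xsl. It is an adaptation of the answer above, not part of it; etree.XSLT and etree.fromstring are standard lxml calls.

```python
from lxml import etree

# Inline copy of the question's input, so the sketch needs no files.
XML = b"""<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>"""

# Same Muenchian grouping as above, but each grouped Tank gets only
# @fuelItem plus a new netSalesQty attribute (assumed rename).
XSL = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:key name="tanks" match="Tank" use="@fuelItem"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="FuelTankList">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Visit only the first Tank of each fuelItem group -->
      <xsl:for-each select="Tank[count(.|key('tanks',@fuelItem)[1])=1]">
        <xsl:copy>
          <xsl:copy-of select="@fuelItem"/>
          <xsl:attribute name="netSalesQty">
            <xsl:value-of select="sum(key('tanks',@fuelItem)/@Sales)"/>
          </xsl:attribute>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

# Compile the stylesheet once, then apply it to the parsed input.
transform = etree.XSLT(etree.fromstring(XSL))
result = transform(etree.fromstring(XML))

print(str(result))
```

Compiling with etree.XSLT once is convenient when the same transform runs over many files.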

Answer 1 (score: 1)

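Hope this helps. It iterates over every FuelTankList, gets the Tank elements inside it, reads their values, and removes them. Once the values have been collected and summed per fuel type, it appends new Tank elements carrying the summed values to the FuelTankList.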

import lxml.etree as le

xml = """<EnterpriseDocument><FuelTankList><Tank fuelItem="Petrol" Sales="1000" />
        <Tank  fuelItem="Diesel" Sales="2000" />
        <Tank  fuelItem="Diesel" Sales="3000" />
      </FuelTankList>
    </EnterpriseDocument>"""

root = le.fromstring(xml)

#get all the fueltanklists from the file

fueltanklist = root.xpath('//FuelTankList')
for fuellist in fueltanklist:
    tankdict={}
    #get all the tanks in the current fueltanklist

    tanks = fuellist.xpath('child::Tank')
    for tank in tanks:
        fuelitem = tank.attrib['fuelItem']
        sales = tank.attrib['Sales']
        if fuelitem in tankdict:
            tankdict[fuelitem] += int(sales)
        else:
            tankdict[fuelitem] = int(sales)

        #Once we have retrieved the value of the current tank, delete it from its parent

        tank.getparent().remove(tank)
    for key, value in tankdict.items():
        #Create and add tanks with new values to its parent
        newtank = le.Element("Tank", fuelItem=str(key), netSalesQty=str(value))
        fuellist.append(newtank)

#Serialize the entire XML to a new string (encoding="unicode" returns str instead of bytes)

newxml = le.tostring(root, encoding="unicode")
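The per-fuel totals can also be computed by XPath itself instead of a hand-written accumulator. The sketch below is a variation on the answer above, not part of it. It collects the distinct fuelItem values in order of first appearance and passes each one to XPath's sum() through an lxml XPath variable ($fuel, bound from a keyword argument). It assumes a single FuelTankList and writes netSalesQty as in the desired output.

```python
import lxml.etree as le

xml = """<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>"""

root = le.fromstring(xml)
fuel_list = root.find("FuelTankList")  # assumes exactly one FuelTankList

# Distinct fuel types in order of first appearance (dicts keep insertion order).
fuels = list(dict.fromkeys(str(f) for f in fuel_list.xpath("Tank/@fuelItem")))

# XPath sum() returns a float; $fuel is bound from the keyword argument.
totals = {
    fuel: int(fuel_list.xpath("sum(Tank[@fuelItem=$fuel]/@Sales)", fuel=fuel))
    for fuel in fuels
}

# Replace the original Tank elements with one summed Tank per fuel type.
for tank in fuel_list.findall("Tank"):
    fuel_list.remove(tank)
for fuel, total in totals.items():
    le.SubElement(fuel_list, "Tank", fuelItem=fuel, netSalesQty=str(total))

print(totals)
```

Binding the value as an XPath variable avoids building the expression by string concatenation, which would break on fuel names containing quotes.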

Answer 2 (score: 1)

Try this:

from lxml import etree

# Parse the input XML file.
tree = etree.parse("so-input.xml")

# Collect Tank element attributes here.
tanks = {}

# The FuelTankList element whose children we will change.
fuel_tank_list = None

# Loop over all Tank elements, collect their values, remove them.
for tank in tree.xpath("//Tank"):
    # Get attributes.
    fuel_item = tank.get("fuelItem")
    sales = tank.get("Sales")

    # Add to sales sum.
    existing_sales = tanks.get(fuel_item, 0)
    tanks[fuel_item] = existing_sales + int(sales)

    # Remove <Tank>
    fuel_tank_list = tank.getparent()
    fuel_tank_list.remove(tank)

# Create a new Tank element for each fuelItem value.
for fuel_item, sales in tanks.items():
    new_tank = etree.Element("Tank")
    new_tank.attrib["fuelItem"] = fuel_item
    new_tank.attrib["Sales"] = str(sales)
    fuel_tank_list.append(new_tank)

# Write the modified tree to a new file.
with open("so-output.xml", "wb") as f:
    f.write(etree.tostring(tree, pretty_print=True))
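When the input is parsed with the default parser, the whitespace text nodes from the original file stay in the tree. pretty_print=True then cannot re-indent the result, so the appended <Tank> elements do not line up with the rest. A minimal sketch of one workaround, assuming the file can be parsed with an XMLParser created with remove_blank_text=True, is below. It uses an inline string in place of so-input.xml, groups per FuelTankList, and writes netSalesQty as in the desired output. These are adaptations, not part of the answer above.

```python
from lxml import etree

# Inline stand-in for so-input.xml, including its irregular whitespace.
xml = b"""<EnterpriseDocument>
      <FuelTankList>
        <Tank fuelItem="Petrol" Sales="1000" />
        <Tank  fuelItem="Diesel" Sales="2000" />
        <Tank  fuelItem="Diesel" Sales="3000" />
      </FuelTankList>
    </EnterpriseDocument>"""

# Drop ignorable whitespace so pretty_print can lay the tree out afresh.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(xml, parser)

# findall() returns a list, so modifying children inside the loop is safe.
for fuel_tank_list in root.findall(".//FuelTankList"):
    totals = {}
    # Sum Sales per fuelItem within this list, removing the originals.
    for tank in fuel_tank_list.findall("Tank"):
        fuel = tank.get("fuelItem")
        totals[fuel] = totals.get(fuel, 0) + int(tank.get("Sales"))
        fuel_tank_list.remove(tank)
    # Append one summed Tank per fuel type, in order of first appearance.
    for fuel, total in totals.items():
        etree.SubElement(fuel_tank_list, "Tank", fuelItem=fuel, netSalesQty=str(total))

output = etree.tostring(root, pretty_print=True).decode("utf-8")
print(output)
```

For a file, the same parser can be passed as etree.parse("so-input.xml", parser).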