Extracting data from an XML file using Python

Posted: 2018-01-16 21:25:23

Tags: python xml soap

I am trying to extract some data from this file:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
    <d2LogicalModel xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datex2.eu/schema/2/2_0" modelBaseVersion="2">
        <exchange>
            <supplierIdentification>
                <country>nl</country>
                <nationalIdentifier>NDW-CNS</nationalIdentifier>
            </supplierIdentification>
        </exchange>
        <payloadPublication xsi:type="MeasuredDataPublication" lang="nl">
            <publicationTime>2014-12-04T06:59:55.000Z</publicationTime>
            <publicationCreator>
                <country>nl</country>
                <nationalIdentifier>NDW-CNS</nationalIdentifier>
            </publicationCreator>
            <measurementSiteTableReference id="NDW01_MT" version="662" targetClass="MeasurementSiteTable"/>
            <headerInformation>
                <confidentiality>noRestriction</confidentiality>
                <informationStatus>real</informationStatus>
            </headerInformation>
            <siteMeasurements>
                <measurementSiteReference id="GEO03_D4T-RWS_T_0317_ID_324" version="3" targetClass="MeasurementSiteRecord"/>
                <measurementTimeDefault>2014-12-04T06:58:00Z</measurementTimeDefault>
                <measuredValue index="1">
                    <measuredValue>
                        <basicData xsi:type="TravelTimeData">
                            <travelTimeType>best</travelTimeType>
                            <travelTime numberOfInputValuesUsed="100" standardDeviation="7">
                                <duration>34</duration>
                            </travelTime>
                        </basicData>
                    </measuredValue>
                </measuredValue>
            </siteMeasurements>
            <siteMeasurements>
                <measurementSiteReference id="GEO01_Z_RWSTRN054" version="1" targetClass="MeasurementSiteRecord"/>
                <measurementTimeDefault>2014-12-04T06:58:00Z</measurementTimeDefault>
                <measuredValue index="1" xsi:type="_SiteMeasurementsIndexMeasuredValue">
                    <measuredValue xsi:type="MeasuredValue">
                        <basicData xsi:type="TravelTimeData">
                            <travelTimeType>best</travelTimeType>
                            <travelTime numberOfIncompleteInputs="0" numberOfInputValuesUsed="7" standardDeviation="0.71" supplierCalculatedDataQuality="100.0">
                                <duration>56</duration>
                            </travelTime>
                        </basicData>
                    </measuredValue>
                </measuredValue>
            </siteMeasurements>
           .
           .
           .
           .
           .
           <siteMeasurements>
                <measurementSiteReference id="RWS01_MONIBAS_0091hrr0350ra0" version="1" targetClass="MeasurementSiteRecord"/>
                <measurementTimeDefault>2014-12-04T06:58:00Z</measurementTimeDefault>
                <measuredValue index="1" xsi:type="_SiteMeasurementsIndexMeasuredValue">
                    <measuredValue xsi:type="MeasuredValue">
                        <basicData xsi:type="TravelTimeData">
                            <travelTimeType>best</travelTimeType>
                            <travelTime numberOfIncompleteInputs="0">
                                <duration>23</duration>
                            </travelTime>
                        </basicData>
                    </measuredValue>
                </measuredValue>
            </siteMeasurements>
        </payloadPublication>
    </d2LogicalModel>
</soap:Body>

What I want to do is use Python to extract, from each 'siteMeasurements' element such as this one:
             <siteMeasurements>
                <measurementSiteReference id="RWS01_MONIBAS_0091hrr0350ra0" version="1" targetClass="MeasurementSiteRecord"/>
                <measurementTimeDefault>2014-12-04T06:58:00Z</measurementTimeDefault>
                <measuredValue index="1" xsi:type="_SiteMeasurementsIndexMeasuredValue">
                    <measuredValue xsi:type="MeasuredValue">
                        <basicData xsi:type="TravelTimeData">
                            <travelTimeType>best</travelTimeType>
                            <travelTime numberOfIncompleteInputs="0">
                                <duration>23</duration>
                            </travelTime>
                        </basicData>
                    </measuredValue>
                </measuredValue>
            </siteMeasurements>

the value of the 'id' attribute of 'measurementSiteReference' and the text content of 'duration'.

I am using Python. My code so far:

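import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

tree = ET.ElementTree(file='track.xml')
root = tree.getroot()

# Print every element's (namespaced) tag and its attributes
for elem in tree.iter():
    print(elem.tag, elem.attrib)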

But I am having trouble extracting these values. I have no prior experience with Python.

How can I iterate over the 'siteMeasurements' elements and get the value of the 'id' attribute of 'measurementSiteReference' and the text content of 'duration'?

Any advice to get me started would be appreciated.

1 answer:

Answer 0 (score: 1)

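You are probably missing the closing </soap:Envelope> tag at the bottom of the file, or perhaps you just didn't paste it. Either way, after adding that tag and putting the following XML declaration at the top (line 1), I was able to run it.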

<?xml version="1.0" encoding="UTF-8"?>
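To illustrate why the missing closing tag matters, here is a minimal sketch using a hypothetical, heavily truncated envelope (not the question's full file): ElementTree refuses to parse XML that is not well-formed and raises ET.ParseError, and the same text parses once </soap:Envelope> is appended. The helper name try_parse is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical truncated envelope: the closing </soap:Envelope> is missing,
# mimicking the situation described above.
truncated = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body></soap:Body>'
)


def try_parse(text):
    """Return the root element, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        # err.position holds the (line, column) where parsing failed
        print("ParseError:", err, "at", err.position)
        return None


print(try_parse(truncated))                          # error message, then None
print(try_parse(truncated + '</soap:Envelope>').tag)
```

The second call prints the envelope's tag in ElementTree's "{namespace-uri}localname" form.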

First, we need to find out which elements we have to work with.

>>> for i in root.iter():
...     print(i)

which lists the following (truncated):

<Element '{http://schemas.xmlsoap.org/soap/envelope/}Envelope' at 0x29e4170>
<Element '{http://schemas.xmlsoap.org/soap/envelope/}Body' at 0x29e4190>
|
|
<Element '{http://datex2.eu/schema/2/2_0}measurementSiteTableReference' at 0x29e4510>
|
|
<Element '{http://datex2.eu/schema/2/2_0}duration' at 0x29e4750>
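Printing every element gets long for a big file. A small sketch, assuming you only want to see each tag name once: collect the distinct tags in document order. The SAMPLE string is a hypothetical cut-down version of the question's XML with the same nesting and namespaces. Most elements and all xsi:type attributes are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down sample with the same nesting and namespaces as the
# question's file; most elements and all xsi:type attributes are omitted.
SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <d2LogicalModel xmlns="http://datex2.eu/schema/2/2_0">
      <payloadPublication>
        <siteMeasurements>
          <measurementSiteReference id="GEO01_Z_RWSTRN054"/>
          <measuredValue index="1">
            <measuredValue>
              <basicData>
                <travelTime><duration>56</duration></travelTime>
              </basicData>
            </measuredValue>
          </measuredValue>
        </siteMeasurements>
      </payloadPublication>
    </d2LogicalModel>
  </soap:Body>
</soap:Envelope>"""


def distinct_tags(root):
    """Return every tag name once, in the order it first appears."""
    # dict.fromkeys keeps insertion order and drops duplicates
    return list(dict.fromkeys(elem.tag for elem in root.iter()))


for tag in distinct_tags(ET.fromstring(SAMPLE)):
    print(tag)
```

Each printed tag is in "{namespace-uri}localname" form. These are exactly the strings that root.iter() expects when you look for a specific element.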

Once we know the element names, we simply iterate over the elements we need and read their attributes (key/value pairs) and text.

Code

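import xml.etree.ElementTree as ET

data_file = 'soapData2.xml'
tree = ET.parse(data_file)
root = tree.getroot()

# ElementTree names elements as "{namespace-uri}localname"
t1 = "{http://datex2.eu/schema/2/2_0}measurementSiteReference"
t2 = "{http://datex2.eu/schema/2/2_0}duration"

print("measurementSiteReference ", ": duration")
# zip() pairs the n-th measurementSiteReference with the n-th duration,
# so this assumes each siteMeasurements block contains exactly one of each
for e1, e2 in zip(root.iter(t1), root.iter(t2)):
    print(e1.attrib['id'], ":", e2.text)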

Result

>>> 
measurementSiteReference  : duration
GEO03_D4T-RWS_T_0317_ID_324 : 34
GEO01_Z_RWSTRN054 : 56
RWS01_MONIBAS_0091hrr0350ra0 : 23
>>>
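The zip() version pairs the two element streams by position only. If a siteMeasurements block ever lacks a <duration> (or has two), every later pair silently shifts. A sketch that avoids this relies on the same ElementTree module but changes the approach: iterate over each siteMeasurements element and search inside it, passing a namespace mapping to iterfind() and find(). The prefix "d" is an arbitrary alias chosen here. The first two sample blocks reuse values from the question. The third block is hypothetical and was added to show the missing-duration case.

```python
import xml.etree.ElementTree as ET

# Namespace of the DATEX II payload in the question. The prefix "d" is an
# arbitrary local alias: it only has to match the prefixes in the paths below.
NS = {"d": "http://datex2.eu/schema/2/2_0"}

# Cut-down sample: the first two blocks reuse values from the question; the
# third is hypothetical and deliberately has no <duration>.
SAMPLE = """<d2LogicalModel xmlns="http://datex2.eu/schema/2/2_0">
  <payloadPublication>
    <siteMeasurements>
      <measurementSiteReference id="GEO03_D4T-RWS_T_0317_ID_324"/>
      <measuredValue><measuredValue><basicData><travelTime>
        <duration>34</duration>
      </travelTime></basicData></measuredValue></measuredValue>
    </siteMeasurements>
    <siteMeasurements>
      <measurementSiteReference id="GEO01_Z_RWSTRN054"/>
      <measuredValue><measuredValue><basicData><travelTime>
        <duration>56</duration>
      </travelTime></basicData></measuredValue></measuredValue>
    </siteMeasurements>
    <siteMeasurements>
      <measurementSiteReference id="HYPOTHETICAL_NO_DURATION"/>
    </siteMeasurements>
  </payloadPublication>
</d2LogicalModel>"""


def site_durations(root):
    """Yield (site id, duration text or None) for each siteMeasurements."""
    for site in root.iterfind(".//d:siteMeasurements", NS):
        ref = site.find("d:measurementSiteReference", NS)
        # <duration> sits several levels down, so search all descendants
        dur = site.find(".//d:duration", NS)
        site_id = ref.get("id") if ref is not None else None
        yield site_id, (dur.text if dur is not None else None)


for site_id, duration in site_durations(ET.fromstring(SAMPLE)):
    print(site_id, ":", duration)
```

With the real file you would pass ET.parse('soapData2.xml').getroot() instead. Because the search path starts with ".//", it also finds siteMeasurements elements nested under the SOAP envelope and body.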