Remove the first part of an XML file: "cannot be serialized" error

Date: 2019-06-15 15:54:50

Tags: python django xml parsing pkcs#7

I have an XML file like this:

'''some non ascii character'''
<b:FatturaElettronica xmlns:b="#">
  <FatturaElettronicaHeader>
    <DatiTrasmissione>
      <IdTrasmittente>
        <IdPaese>IT</IdPaese>

I need to delete everything up to

<FatturaElettronicaHeader>

My current code is:

import xml.etree.ElementTree as ET
import xml.etree.ElementTree as ETree
from lxml import etree

parser = etree.XMLParser(encoding='utf-8', recover=True, remove_comments=True, resolve_entities=False)
tree = ETree.parse('test.xml', parser)

root = tree.getroot()

print etree.tostring(root)

It gives me:

Traceback (most recent call last):
  File "xml2.py", line 14, in <module>
    print etree.tostring(root)
  File "src/lxml/etree.pyx", line 3350, in lxml.etree.tostring
TypeError: Type 'NoneType' cannot be serialized.

How can I get rid of the first part of the XML file?

TY

1 answer:

Answer 0 (score: 0)

You can use the find() method to locate the first angle bracket ('<') and discard everything before it.

However, the rest of your XML file must still be well-formed:

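import xml.etree.ElementTree as ET

# Read the whole file as text
with open('...XMLFILE.xml', 'r') as file:
    filestring = file.read()

# Index of the first '<', i.e. where the XML markup actually begins
XML_start = filestring.find('<')
print(XML_start)  # gives 31

# Parse only from the first '<' onward; fromstring() returns the root element
tree = ET.fromstring(filestring[XML_start:])

for i in tree.iter():
    print(i.tag)  # gives {#}FatturaElettronica, FatturaElettronicaHeader, ...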
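The same find() idea can be wrapped in a small helper. This is a sketch, not part of the original answer: the function name `parse_after_preamble` is hypothetical, and it assumes the preamble itself contains no '<' byte. It works on bytes rather than text, so a non-ASCII or binary preamble cannot cause a decode error before slicing. If the file is a PKCS#7 (.p7m) envelope, as the tag suggests, there may also be signature bytes after the closing root tag. This sketch does not remove those, and in that case `ET.fromstring` would raise `ET.ParseError`.

```python
import xml.etree.ElementTree as ET


def parse_after_preamble(data: bytes) -> ET.Element:
    """Hypothetical helper: parse XML preceded by arbitrary bytes.

    Everything before the first b'<' is discarded, mirroring the
    find() approach above. Assumes the preamble has no '<' byte.
    """
    start = data.find(b'<')
    if start == -1:
        # bytes.find() returns -1 when there is no '<' at all
        raise ValueError("no XML markup found")
    # ET.fromstring accepts bytes; it raises ET.ParseError if the
    # remaining data is not well-formed XML
    return ET.fromstring(data[start:])


if __name__ == "__main__":
    # Made-up sample: a non-ASCII preamble followed by a trimmed-down invoice
    sample = (
        "\u00e8\u00e0 junk ".encode("utf-8")
        + b'<b:FatturaElettronica xmlns:b="#">'
        + b"<FatturaElettronicaHeader><DatiTrasmissione><IdTrasmittente>"
        + b"<IdPaese>IT</IdPaese>"
        + b"</IdTrasmittente></DatiTrasmissione></FatturaElettronicaHeader>"
        + b"</b:FatturaElettronica>"
    )
    root = parse_after_preamble(sample)
    print(root.tag)                       # {#}FatturaElettronica
    print([child.tag for child in root])  # ['FatturaElettronicaHeader']
```

To use it with a real file, read it in binary mode (`open(path, 'rb').read()`) and pass the bytes in. Reading bytes also lets the XML declaration, if present, decide the encoding, instead of Python's default text encoding.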