Parsing a very large XML file

Time: 2019-03-15 04:21:44

Tags: xml xml-parsing nsxmlparser

I am parsing a 1.3 GB XML file in Python. Here is my code:

import xml.etree.ElementTree as etree

with open('SemCor+OMSTI/semcor+omsti.data.xml') as f:
    xml = f.read()

for event, elem in etree.iterparse(xml, events=('start', 'end', 'start-ns', 'end-ns')):
    print(event, elem)

But it produces the following output:

Traceback (most recent call last):
  File "parse.py", line 8, in <module>
    for event, elem in etree.iterparse(re.sub(r"(<\?xml[^>]+\?>)", r"\1<root>", xml) + "</root>", events=('start', 'end', 'start-ns', 'end-ns')):
  File "/home/himanshu/anaconda3/lib/python3.6/xml/etree/ElementTree.py", line 1242, in iterparse
    source = open(source, "rb")

This is followed by the contents of the file as a string (not parsed). I am following this tutorial for parsing a very large xml file.
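Tutorials on very large XML files generally rely on `iterparse` reading the file itself incrementally, whereas the code above first loads all 1.3 GB into memory with `f.read()`. A minimal sketch of that streaming pattern, assuming the goal is only to walk the elements: `count_tags` and the sample document are hypothetical illustrations, not the SemCor+OMSTI schema. It passes the path straight to `iterparse` and clears the root once each top-level child has been processed, so memory stays roughly bounded by one top-level subtree.

```python
import os
import tempfile
import xml.etree.ElementTree as etree
from collections import Counter


def count_tags(path):
    """Stream an XML file and count its element tags (hypothetical helper).

    `path` is a filename or an open binary file object; iterparse reads it
    in chunks instead of requiring the whole document in memory.
    """
    counts = Counter()
    root = None
    depth = 0
    for event, elem in etree.iterparse(path, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first 'start' event is the document root
            depth += 1
        else:  # 'end': elem and all of its children are complete
            depth -= 1
            counts[elem.tag] += 1
            if depth == 1:
                # A direct child of the root just closed and has been counted.
                # Drop the root's children so memory does not grow with file
                # size (clear() also resets the root's attributes and text,
                # which this sketch does not need).
                root.clear()
    return counts


# Usage on a tiny stand-in document (not the real corpus).
sample = (b"<corpus><text id='d1'><sentence><wf>a</wf><wf>b</wf></sentence></text>"
          b"<text id='d2'><sentence><wf>c</wf></sentence></text></corpus>")
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name
try:
    counts = count_tags(path)  # pass the path, not the result of f.read()
    print(counts)
finally:
    os.remove(path)
```

Counting happens on `end` events, so the result does not depend on how much of the tree is still attached when `root.clear()` runs.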

The output is so large that I cannot see the exact error. But if I press Ctrl+C before it finishes printing the data and the error, it shows an OSError.
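The last traceback frame, `source = open(source, "rb")`, points to a likely cause: `iterparse` expects a filename or a file object, so the string returned by `f.read()` is being used as a path. An `OSError` raised by `open()` carries that "path" in its message, which would also explain why the whole file is echoed. A small reproduction with a stand-in document (the exact `OSError` subclass and errno vary by platform), plus the in-memory alternative for small inputs:

```python
import io
import xml.etree.ElementTree as etree

xml_text = "<?xml version='1.0'?><root><a/><b/></root>"

# Passing the document *contents*: iterparse treats a str as a file name and
# tries to open() it, which fails with an OSError subclass.
caught = None
try:
    for event, elem in etree.iterparse(xml_text):
        pass
except OSError as exc:
    caught = exc
    # exc.filename holds the whole "name" (for the real file, all 1.3 GB),
    # so print only the exception type and the error code.
    print(type(exc).__name__, exc.errno)

# If the document really is already in memory (reasonable only for small
# inputs), wrap the bytes in a file-like object instead.
tags = [elem.tag for _, elem in etree.iterparse(io.BytesIO(xml_text.encode("utf-8")))]
print(tags)  # default events are ('end',) -> ['a', 'b', 'root']
```

For a 1.3 GB file, passing the path itself avoids both the error and holding the whole document in memory.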

0 answers:

No answers