XPath doesn't find the element it should

Time: 2016-11-19 00:04:52

Tags: python python-2.7 xpath lxml

I have this XML excerpt:

<?xml version="1.0" encoding="utf-8"?>
      <parts>
        <part name="phrase" type="text">guidebook</part>
        <part name="phrase" type="part_of_speech">n</part>
        <part name="definition" type="text">something that offers basic information or instructions</part>
        <part name="definition" type="sfx_id">q</part>
        <part id="1" name="example" type="text">Don't forget your travel [guidebook] when you pack the suitcase.</part>
        <part id="1" name="example" type="sfx_id">c</part>
        <part id="2" name="example" type="text">I never use [guidebooks] when visiting a place.</part>
        <part id="2" name="example" type="sfx_id">d</part>
        <part name="phrase" type="sfx_id">a</part>
      </parts>

I'm trying to extract some of the content with the following code:

from lxml.etree import parse

tree = parse('test.xml')

word = tree.xpath('//part[@name="phrase"][@type="text"]')[0].text
cat = tree.xpath('//part[@name="phrase"][@type="part_of_speech"]')[0].text
defi = tree.xpath('//part[@name="definition"][@type="text"]')[0].text
ex1 = tree.xpath('//part[@id="1"][@name="example"][@type="text"]')[0].text
ex2 = tree.xpath('//part[@id="2"][@name="example"][@type="text"]')[0].text

print word, cat, defi, ex1, ex2

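So far this works. But when I try it with the real XML file, it no longer works. As soon as the rest of the real file's markup is added to the XML file, the script crashes: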

Traceback (most recent call last):
  File "testextract.py", line 5, in <module>
    word = tree.xpath('//part[@name="phrase"][@type="text"]')[0].text
IndexError: list index out of range

Here is the complete XML file:

<?xml version="1.0" encoding="utf-8"?>
<item xmlns="http://www.supermemo.net/2006/smux">
  <lesson-title>Communication and information</lesson-title>
  <chapter-title>Say the word</chapter-title>
  <question-title><text autoshow="true"><sentence>Say the word that matches the definition.</sentence><translation lang="pl">Powiedz słowo, które odpowiada definicji.</translation><translation lang="en">Say the word that matches the definition.</translation></text></question-title>
  <question><gfx file="b" scale-base="1024" float="right"/>

<text autoshow="true"><sentence></sentence><translation lang="pl"><big>przewodnik (książka)</big><br /><br /></translation></text>
<em>(n)</em> <strong>something that offers basic information or instructions</strong><br/>

<p class="ex"><text><sentence>(e.g. Don't forget your travel ....... when you pack the suitcase.)</sentence><translation lang="pl">Nie zapomnij przewodnika, gdy będziesz pakować walizkę.</translation></text></p>
</question>
  <answer><text><sentence><strong>guidebook</strong></sentence><translation lang="pl">przewodnik (książka)</translation></text> <em>n</em><br/>

<br/>
<small>Examples:</small>
<p class="ex"><sfx file="c" inline="true"/>&#160; <text><sentence>Don't forget your travel guidebook when you pack the suitcase.</sentence><translation lang="pl">Nie zapomnij przewodnika, gdy będziesz pakować walizkę.</translation></text></p>
<p class="ex"><sfx file="d" inline="true"/>&#160; <text><sentence>I never use guidebooks when visiting a place.</sentence><translation lang="pl">Nigdy nie korzystam z przewodników, gdy zwiedzam jakieś miejsce.</translation></text></p>
</answer>
  <modified>2015-10-15</modified>
  <template-id>12</template-id>
  <question-audio>true</question-audio>
  <answer-audio>true</answer-audio>
  <gfx-1 id="86" group-id="2" />
  <parts>
    <part name="phrase" type="text">guidebook</part>
    <part name="phrase" type="translation_pl">przewodnik (książka)</part>
    <part name="phrase" type="part_of_speech">n</part>
    <part name="definition" type="text">something that offers basic information or instructions</part>
    <part name="gfx" type="gfx_id">b</part>
    <part name="definition" type="sfx_id">q</part>
    <part id="1" name="example" type="text">Don't forget your travel [guidebook] when you pack the suitcase.</part>
    <part id="1" name="example" type="translation_pl">Nie zapomnij [przewodnika], gdy będziesz pakować walizkę.</part>
    <part id="1" name="example" type="sfx_id">c</part>
    <part id="2" name="example" type="text">I never use [guidebooks] when visiting a place.</part>
    <part id="2" name="example" type="translation_pl">Nigdy nie korzystam z [przewodników], gdy zwiedzam jakieś miejsce.</part>
    <part id="2" name="example" type="sfx_id">d</part>
    <part name="phrase" type="sfx_id">a</part>
  </parts>
</item>

Can anyone explain to me what is going on? (And maybe how to fix it? That would be really great :-))
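What is going on can be made visible with a small experiment. The sketch below is not the code from the question: it uses Python's standard-library xml.etree.ElementTree instead of lxml, purely so that it runs without extra packages, and the trimmed XML string is a stand-in for the real file. lxml reports element names in the same {namespace}name form, so the observation should carry over.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the real file: same default namespace, two parts only.
XML = (
    '<item xmlns="http://www.supermemo.net/2006/smux">'
    '<parts>'
    '<part name="phrase" type="text">guidebook</part>'
    '<part name="phrase" type="part_of_speech">n</part>'
    '</parts>'
    '</item>'
)

root = ET.fromstring(XML)

# The xmlns declaration on <item> puts every element into that namespace,
# and the namespace becomes part of each element's name.
root_tag = root.tag

# A bare name matches only elements in no namespace, so nothing is found.
unqualified = root.findall('.//part')

# Binding the URI to a prefix (the name "d" is arbitrary) makes the match work.
ns = {'d': 'http://www.supermemo.net/2006/smux'}
qualified = root.findall('.//d:part', ns)

print(root_tag)
print(len(unqualified))
print(len(qualified))
```

The empty result for the bare name is what turns into the IndexError once [0] is applied to it.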

SOLUTION:

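from lxml.etree import parse

tree = parse('test2.xml')
root = tree.getroot()

# The root element declares a default namespace, so every element belongs to it.
# XPath 1.0 has no notion of a default namespace: bind the URI to a prefix
# (the name "d" is arbitrary) and use that prefix in every step.
ns = {"d" : "http://www.supermemo.net/2006/smux"}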

wort = root.xpath('//d:part[@name="phrase"][@type="text"]', namespaces=ns)[0]
print wort.text
kategorie = root.xpath('//d:part[@name="phrase"][@type="part_of_speech"]', namespaces=ns)[0]
print kategorie.text
definition = root.xpath('//d:part[@name="definition"][@type="text"]', namespaces=ns)[0]
print definition.text
ex1 = root.xpath('//d:part[@id="1"][@name="example"][@type="text"]', namespaces=ns)[0]
print ex1.text
ex2 = root.xpath('//d:part[@id="2"][@name="example"][@type="text"]', namespaces=ns)[0]
print ex2.text
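
The lookups above still raise IndexError whenever a part is missing. A minimal sketch of a more forgiving variant follows, assuming lxml is installed. The helper name first_text and the inline XML string are made up for illustration (the string is a trimmed stand-in for test2.xml), and the last query shows a namespace-agnostic alternative based on the standard XPath function local-name().

```python
from lxml import etree

# Trimmed stand-in for test2.xml (assumption: same default namespace).
XML = b"""<item xmlns="http://www.supermemo.net/2006/smux">
  <parts>
    <part name="phrase" type="text">guidebook</part>
    <part name="phrase" type="part_of_speech">n</part>
    <part id="1" name="example" type="text">Don't forget your travel [guidebook] when you pack the suitcase.</part>
  </parts>
</item>"""

NS = {"d": "http://www.supermemo.net/2006/smux"}


def first_text(node, path):
    """Hypothetical helper: text of the first match, or None if nothing matches."""
    matches = node.xpath(path, namespaces=NS)
    return matches[0].text if matches else None


root = etree.fromstring(XML)

word = first_text(root, '//d:part[@name="phrase"][@type="text"]')
cat = first_text(root, '//d:part[@name="phrase"][@type="part_of_speech"]')
# Example 2 is deliberately absent from the stand-in, so this yields None.
ex2 = first_text(root, '//d:part[@id="2"][@name="example"][@type="text"]')

# Namespace-agnostic variant: compare only the local part of the element name.
any_ns = root.xpath('//*[local-name()="part"][@name="phrase"][@type="text"]')

print(word)
print(cat)
print(ex2)
print(any_ns[0].text)
```

The prefix mapping is usually the clearer choice when the namespace is known; the local-name() form is mainly useful when files with and without the namespace have to be handled by the same query.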

Thanks, everyone :-D
