How to get owner element of a text node?

Time: 2015-07-31 20:28:06

Tags: python xpath lxml

I have this data:

<data>
  <foo>foo text</foo>
  data text
    <bar>
      bar text
      <baz>text</baz>
      <baz>text</baz>
      bar text
    </bar>
   data text
</data>

and I need to get all the text values in document order, modify the text inside the "baz" tags, and print the result. My code is:

text = []
for element in etree.xpath("./*"):
    text.extend(element.xpath("./text()"))
    if element.tag == 'bar':
        text.extend(["baz " + s for s in element.xpath("./baz/text()")])
print('\n'.join([s.strip() for s in text if s.strip()]))

output is:

foo text
bar text
bar text
baz text
baz text

but I need:

foo text
data text
bar text
baz text
baz text
bar text
data text

How can I get the text() nodes in order, without losing the "data text" values?

Edit: I know about etree.xpath(".//text()"), which gives me all the text in order, but I need to modify the text inside the baz tags; that is the point. How can I get the tag of the owning element for every result of the .//text() XPath?

1 answer:

Answer 0 (score: 1)

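Assuming you are using lxml, you can call getparent() on each text result to get the element it is attached to. Note that for tail text (text that follows an element's closing tag), getparent() returns that preceding element rather than the enclosing one; the is_tail attribute of the result tells the two cases apart. For example: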

import lxml.etree
etree = lxml.etree.fromstring('''
<data>
  <foo>foo text</foo>
  data text
    <bar>
      bar text
      <baz>text</baz>
      <baz>text</baz>
      bar text
    </bar>
   data text
</data>
''')

for text in etree.xpath("//text()[normalize-space()]"):
    parenttag = text.getparent().tag
    print(parenttag, text)

The XPath expression //text()[normalize-space()] simply returns every text node in the XML document that contains something other than whitespace.
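To see what the predicate filters out, here is a small sketch that runs //text() with and without [normalize-space()]. The sample document and variable names are made up for illustration and are not part of the question's data:

```python
import lxml.etree

# Hypothetical three-node sample: indentation, element text, tail text.
root = lxml.etree.fromstring("<a>\n  <b>x</b> y\n</a>")

# Every text node, including the whitespace-only indentation before <b>.
all_nodes = [str(t) for t in root.xpath("//text()")]

# Only text nodes that contain something other than whitespace.
non_blank = [str(t) for t in root.xpath("//text()[normalize-space()]")]

print(all_nodes)
print(non_blank)
```

The predicate only filters whole nodes; it does not trim the surviving strings, which is why the results still need strip() before printing.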

Output (Python 2, where print(parenttag, text) prints a tuple):

('foo', 'foo text')
('foo', '\n  data text\n    ')
('bar', '\n      bar text\n      ')
('baz', 'text')
('baz', 'text')
('baz', '\n      bar text\n    ')
('bar', '\n   data text\n')
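In this output the tail texts are attributed to the preceding sibling (the first "data text" to foo, the second "bar text" to baz, the last "data text" to bar), because lxml stores text that follows a closing tag as that element's tail. A minimal sketch of one way to recover the enclosing element and produce the output asked for in the question, assuming lxml's smart string results (getparent(), is_tail); the helper names owner_element and ordered_text are made up for illustration:

```python
import lxml.etree

XML = '''
<data>
  <foo>foo text</foo>
  data text
    <bar>
      bar text
      <baz>text</baz>
      <baz>text</baz>
      bar text
    </bar>
   data text
</data>
'''

def owner_element(text_node):
    """Return the element that actually encloses an lxml text result."""
    parent = text_node.getparent()
    # For tail text, getparent() is the preceding sibling element,
    # so the enclosing element is one level further up.
    return parent.getparent() if text_node.is_tail else parent

def ordered_text(root):
    """Collect non-blank text in document order, prefixing text owned by <baz>."""
    lines = []
    for text_node in root.xpath(".//text()[normalize-space()]"):
        value = text_node.strip()
        if owner_element(text_node).tag == "baz":
            value = "baz " + value
        lines.append(value)
    return lines

root = lxml.etree.fromstring(XML)
print("\n".join(ordered_text(root)))
```

With the sample data this prints the seven lines in the order requested in the question: foo text, data text, bar text, baz text, baz text, bar text, data text.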