How to search an etree with lxml the way BeautifulSoup does

Date: 2016-07-18 08:41:43

Tags: python python-2.7 beautifulsoup lxml bs4

Suppose I have the following XML:

<?xml version="1.0" encoding="utf-8"?>
<FeedType xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="https://foo.com/bar" xsi:schemaLocation="https://foo.com/bar https://foo.com/bar/arr.xsd" value="Type">
    <ElementName value='Type'>
        <DataIWant>
            stuff
        </DataIWant>
        <DataIWant>
            other stuff
        </DataIWant>
    </ElementName>
</FeedType>

I want to get everything inside the ElementName tag.

In BeautifulSoup, I can call

soup.find_all('ElementName')

which returns the tree rooted at ElementName.

How can I do this in lxml?

1 Answer:

Answer 0: (score: -1)

lxml elements have a findall method that can be used for this.

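However, the XML document declares a default namespace, so searching for the plain ElementName tag will not find anything. You need to specify the namespace:

root.findall('foobar:ElementName', namespaces={'foobar': 'https://foo.com/bar'})

If you don't want to specify the namespace, you can use an XPath query that ignores namespaces and, via the local-name() function, matches only elements whose "local name" is ElementName.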
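To make the two approaches concrete, here is a minimal sketch run against a trimmed copy of the XML from the question. The xsd/xsi declarations are omitted for brevity, the `foobar` prefix is an arbitrary name chosen for the lookup, and the variable names are only illustrative. The document is passed as bytes because it carries an encoding declaration.

```python
from lxml import etree

# Trimmed copy of the question's document; only the default namespace is kept.
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<FeedType xmlns="https://foo.com/bar" value="Type">
    <ElementName value="Type">
        <DataIWant>
            stuff
        </DataIWant>
        <DataIWant>
            other stuff
        </DataIWant>
    </ElementName>
</FeedType>"""

root = etree.fromstring(XML)

# Arbitrary prefix mapped to the document's default namespace URI.
NS = {"foobar": "https://foo.com/bar"}

# A plain tag name does not match, because the element lives in a namespace.
plain = root.findall("ElementName")

# Namespace-aware lookup among the direct children of the root.
prefixed = root.findall("foobar:ElementName", namespaces=NS)

# Namespace-agnostic lookup anywhere in the tree, matching on the local name.
by_local_name = root.xpath("//*[local-name() = 'ElementName']")

# Text of each child of the matched element, with whitespace trimmed.
data = [child.text.strip() for child in prefixed[0]]

print(len(plain), len(prefixed), len(by_local_name))
print(data)
```

Note that `findall('foobar:ElementName', ...)` only looks at direct children of `root`, while the `//*[...]` XPath searches the whole tree. For this document both find the same single element.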