Question

I'm having a problem using Python (2.7). The code basically consists of:

str = '<el at="some">ABC</el><el>DEF</el>'
z = BeautifulStoneSoup(str)

for x in z.findAll('el'):
    # if 'at' in x:
    # if hasattr(x, 'at'):
        print x['at']   
    else:
        print 'nothing'

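I expected the first if statement to work correctly (i.e. if at doesn't exist, print "nothing"), but it always prints nothing (i.e. the test is always False). The second if, on the other hand, is always True, which causes the code to raise a KeyError when it tries to access at on the second <el> element, where of course it doesn't exist.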

Answer 1

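The in operator is for sequence and mapping types; what makes you think the object returned by BeautifulSoup implements it the way you expect? According to the BeautifulSoup docs, you should access attributes using the [] syntax.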

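Re hasattr: I think you are confusing HTML/XML attributes with Python object attributes. hasattr is for the latter, and BeautifulSoup, AFAIK, does not expose the HTML/XML attributes it parsed as attributes of its own objects.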

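P.S. Note that the Tag object in BeautifulSoup does implement __contains__, so maybe you are trying it on the wrong object? Can you show a complete but minimal example that demonstrates the problem?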

Running this:

from BeautifulSoup import BeautifulSoup

str = '<el at="some">ABC</el><el>DEF</el>'
z = BeautifulSoup(str)

for x in z.findAll('el'):
    print type(x)
    print x['at']

I get:

<class 'BeautifulSoup.Tag'>
some
<class 'BeautifulSoup.Tag'>
Traceback (most recent call last):
  File "soup4.py", line 8, in <module>
    print x['at']
  File "C:\Python26\lib\site-packages\BeautifulSoup.py", line 601, in __getitem__
    return self._getAttrMap()[key]
KeyError: 'at'

This is what I expected. The first el has an at attribute and the second doesn't, so the second lookup raises a KeyError.

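Update 2: Looking into the details, BeautifulSoup.Tag.__contains__ looks at the contents of the tag, not its attributes, so the in operator does not test for an attribute. To check whether an attribute exists, use has_key (for example, x.has_key('at')), or call x.get('at') and compare the result with None.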
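
The behaviour described in this answer can be reproduced without BeautifulSoup installed. The sketch below uses a hypothetical stand-in class, MiniTag, which is not BeautifulSoup's real Tag; its method bodies are simplified assumptions about how the library behaves (contents and attributes stored separately, and unknown Python attribute names treated as a child-tag lookup that yields None). It is written so that it also runs on Python 3.

```python
class MiniTag(object):
    """Hypothetical, simplified stand-in for a parsed tag (NOT BeautifulSoup's class)."""

    def __init__(self, name, attrs=None, contents=None):
        self.name = name
        self.attrs = dict(attrs or {})        # markup attributes, e.g. at="some"
        self.contents = list(contents or [])  # child nodes / text

    def __contains__(self, item):
        # 'in' inspects the children, not the attribute map
        return item in self.contents

    def __getattr__(self, child_name):
        # Reached only when normal attribute lookup fails. Unknown names are
        # treated as a child-tag lookup that finds nothing and returns None,
        # so hasattr() succeeds for almost any name.
        if child_name.startswith('__'):
            raise AttributeError(child_name)
        return None

    def __getitem__(self, key):
        # tag['at'] raises KeyError when the attribute is missing
        return self.attrs[key]

    def get(self, key, default=None):
        # tag.get('at') returns the default instead of raising
        return self.attrs.get(key, default)


tags = [MiniTag('el', {'at': 'some'}, ['ABC']),
        MiniTag('el', {}, ['DEF'])]

results = []
for tag in tags:
    results.append(('at' in tag, hasattr(tag, 'at'), tag.get('at', 'nothing')))

print(results)
# [(False, True, 'some'), (False, True, 'nothing')]
```

Under these assumptions, the in test is always False and hasattr is always True for both elements, which matches what the question reports; only get (or [] guarded by a presence check) actually consults the attribute map.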

Answer 2

If your code is as simple as what you've shown, you can solve this compactly with:

for x in z.findAll('el'):
    print x.get('at', 'nothing')

Answer 3

To scan for elements by tag name, a pyparsing solution might be more readable (and it doesn't use deprecated APIs like has_key):

from pyparsing import makeXMLTags

# makeXMLTags creates a pyparsing expression that matches tags with
# variations in whitespace, attributes, etc.
el,elEnd = makeXMLTags('el')

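# sample input taken from the question (assumed here so the snippet is self-contained)
xmltext = '<el at="some">ABC</el><el>DEF</el>'

# scan the input text and work with elTags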
for elTag, tagstart, tagend in el.scanString(xmltext):
    if elTag.at:
        print elTag.at

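As an additional refinement, pyparsing lets you define a filtering parse action so that a tag matches only when a specific attribute value (or the attribute with any value) is found: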

# import parse action that will filter by attribute
from pyparsing import withAttribute

# only match el tags having the 'at' attribute, with any value
el.setParseAction(withAttribute(at=withAttribute.ANY_VALUE))

# now loop again, but no need to test for presence of 'at'
# attribute - there will be no match if 'at' is not present
for elTag, tagstart, tagend in el.scanString(xmltext):
    print elTag.at

Answer 4

I usually use the get() method to access attributes:

link = soup.find('a')
href = link.get('href')
name = link.get('name')

if name:
    print 'anchor'
if href:
    print 'link'
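
One caveat with truthiness tests such as if name: get() returns None when the attribute is absent, but an attribute that is present with an empty value is falsy as well. The sketch below uses a plain dict as a stand-in for a tag's attribute map (an assumption for illustration; describe is a hypothetical helper, and with a real tag you would call link.get(...) in the same way) and compares against None instead.

```python
def describe(attrs):
    """Return labels for a link-like element, given its attribute mapping.

    `attrs` is a plain dict standing in for a tag's attributes (illustrative only).
    """
    labels = []
    # 'is not None' distinguishes "attribute missing" from "attribute present but empty"
    if attrs.get('name') is not None:
        labels.append('anchor')
    if attrs.get('href') is not None:
        labels.append('link')
    return labels


print(describe({'href': '/index.html'}))  # ['link']
print(describe({'name': ''}))             # ['anchor'], even though the value is empty
print(describe({}))                       # []
```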

Problem accessing attributes in BeautifulSoup
