Extracting text from BeautifulSoup

Date: 2018-02-11 15:50:15

Tags: python beautifulsoup

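I'm trying to parse some LinkedIn data, and I want to get the text inside this span from within a for loop, so that the markup below yields the string "2 shared connections":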

<span class="search-result__social-proof-count Sans-13px-black-55%-semibold text-align-left ml1">
      2 shared connections
    </span>

This is the XPath:

//*[@id="ember4490"]/span

So far, I can correctly select the span with this code:

mutual_conns_with_text = div.find('span', {'class': 'search-result__social-proof-count Sans-13px-black-55%-semibold text-align-left ml1'})

However, the code above selects the whole span element, not just its text. The following code throws an exception:

mutual_conns_with_text = div.find('span', {'class': 'search-result__social-proof-count Sans-13px-black-55%-semibold text-align-left ml1'}).getText()

Exception:

AttributeError: 'NoneType' object has no attribute 'getText'

1 answer:

Answer 0 (score: 1)

You can simply ask for the text attribute of the span element:

>>> import bs4
>>> HTML = '''\
... <span class="search-result__social-proof-count Sans-13px-black-55%-semibold text-align-left ml1">
...     2 shared connection
... </span>'''
>>> soup = bs4.BeautifulSoup(HTML, 'lxml')
>>> mutual_conns_with_text = soup.find('span', {'class': 'search-result__social-proof-count Sans-13px-black-55%-semibold text-align-left ml1'})
>>> mutual_conns_with_text.text
'\n    2 shared connection\n'
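
The AttributeError in the question suggests that find returned None for at least one div in the loop, meaning that some search results probably have no such span. The following is a minimal sketch of one way to guard against that and to strip the surrounding whitespace at the same time. The sample markup, the card class, and the helper name shared_connections_text are hypothetical and only serve the illustration; html.parser is used here instead of lxml so that no extra parser needs to be installed.

```python
import bs4

# Hypothetical sample markup: two result cards, only the first contains the span.
HTML = '''\
<div class="card">
  <span class="search-result__social-proof-count Sans-13px-black-55%-semibold text-align-left ml1">
      2 shared connections
  </span>
</div>
<div class="card"></div>'''

soup = bs4.BeautifulSoup(HTML, 'html.parser')


def shared_connections_text(div):
    """Return the stripped text of the span, or None if the span is absent."""
    # class is a multi-valued attribute, so matching on one class name is enough.
    span = div.find('span', class_='search-result__social-proof-count')
    if span is None:
        # find() returns None when nothing matches; calling .getText() on it
        # is what raises the AttributeError shown in the question.
        return None
    # strip=True removes the leading/trailing newlines and indentation.
    return span.get_text(strip=True)


results = [shared_connections_text(div)
           for div in soup.find_all('div', class_='card')]
print(results)  # ['2 shared connections', None]
```

Matching on the single class search-result__social-proof-count is a design choice rather than a requirement: it should keep working if the styling classes change, whereas the full class string only matches when it is reproduced exactly.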