Adding elements to BeautifulSoup's find_all list as strings

Asked: 2015-06-04 15:36:34

Tags: python windows python-2.7 web-scraping beautifulsoup

I am testing web-scraping concepts with BeautifulSoup's find_all() function. I am trying to get the contents of the p tags that have class='first' inside the div with class='dinner'.

from bs4 import BeautifulSoup
import urllib2

html_doc="""
<html>
<head>
<title>The practice html document</title>
</head>
<body>
<div class='dinner'>
<p class='first'>I like pizza</p>
<p class='second'>I really like pizza</p>
<p class='first'>pizza is good</p> 
</div>
<div class='breakfast'>
<p class='first'>pancake</p>
</div>
<div class='lunch'>
<p> This is a paragraph</p>
</div>
</body>
</html>
"""
soup=BeautifulSoup(html_doc)
div_stuff=soup.find("div", attrs={'class':'dinner'})
print div_stuff
print '\n'
#This prints the paragraphs only in the div with the class dinner
div_paragraphs=unicode(div_stuff.find_all('p', attrs={'class':'first'}))
print div_paragraphs

The find_all function puts the paragraphs it finds into a list as elements. This is the output of the code:

<div class="dinner">
<p class="first">I like pizza</p>
<p class="second">I really like pizza</p>
<p class="first">pizza is good</p>
</div>

[<p class="first">I like pizza</p>, <p class="first">pizza is good</p>] 

The goal is to get the contents of the paragraphs as strings in a list, like this:

[I like pizza,pizza is good]

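I could write some code that loops over each element and replaces it once all the instances have been found, but I wanted to see whether there is a way to turn each element into a string before find_all stores it in the list.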

1 answer:

Answer 0 (score: 4)

.find_all() returns the matches themselves; you are searching for elements, not for the text they contain (that would be a very different search).

You can easily extract the text in a list comprehension:

[elem.get_text() for elem in soup.select('div.dinner p.first')]

I used a CSS selector here to match the p tags in the context of their parent div.

Demo:

>>> from bs4 import BeautifulSoup
>>> html_doc="""
... <html>
... <head>
... <title>The practice html document</title>
... </head>
... <body>
... <div class='dinner'>
... <p class='first'>I like pizza</p>
... <p class='second'>I really like pizza</p>
... <p class='first'>pizza is good</p> 
... </div>
... <div class='breakfast'>
... <p class='first'>pancake</p>
... </div>
... <div class='lunch'>
... <p> This is a paragraph</p>
... </div>
... </body>
... </html>
... """
>>> soup = BeautifulSoup(html_doc)
>>> [elem.get_text() for elem in soup.select('div.dinner p.first')]
[u'I like pizza', u'pizza is good']
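
For readers on Python 3, here is a minimal sketch of the same idea. It is not part of the original answer. It assumes the built-in "html.parser" backend and a shortened copy of the sample HTML, and it shows the CSS selector next to an equivalent find()/find_all() chain scoped to the dinner div. The names via_select, via_find_all and dinner are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document from the question.
html_doc = """
<div class='dinner'>
<p class='first'>I like pizza</p>
<p class='second'>I really like pizza</p>
<p class='first'>pizza is good</p>
</div>
<div class='breakfast'>
<p class='first'>pancake</p>
</div>
"""

# Assumption: the standard-library parser is good enough for this markup.
# Naming it explicitly also avoids bs4's "no parser specified" warning.
soup = BeautifulSoup(html_doc, "html.parser")

# Option 1: CSS selector, as in the answer.
via_select = [p.get_text() for p in soup.select("div.dinner p.first")]

# Option 2: restrict find_all() to the dinner div, as in the question,
# then pull the text out of each matched element.
dinner = soup.find("div", class_="dinner")
via_find_all = [p.get_text() for p in dinner.find_all("p", class_="first")]

print(via_select)
print(via_find_all)
```

Both lists hold plain strings rather than Tag objects. On Python 3 they print without the u prefix seen in the Python 2 demo above.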