Question

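I'm trying to learn how BeautifulSoup works so that I can build an application.

I can find and print all the elements with .find_all(), but the output also includes the HTML tags. How do I print only the text inside those tags?

This is what I have: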

from bs4 import BeautifulSoup

# Contents of index.html:
"""<html>
<p>1</p>
<p>2</p>
<p>3</p>
"""

soup = BeautifulSoup(open('index.html'), "html.parser")
i = soup.find_all('p')
print(i)

Answer 1

This may help:

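from bs4 import BeautifulSoup

source_code = """<html>
<p>1</p>
<p>2</p>
<p>3</p>
"""
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(source_code, "html.parser")
# .text returns all the text in the document with the tags removed
print(soup.text)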

Output:

1
2
3
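
As a rough sketch of the difference between soup.text and get_text(): soup.text keeps the newlines that sit between the tags, while get_text() with a separator and strip=True trims each piece of text and drops the empty ones. The variable names raw_text and clean_text are just illustrative.

```python
from bs4 import BeautifulSoup

source_code = """<html>
<p>1</p>
<p>2</p>
<p>3</p>
"""

soup = BeautifulSoup(source_code, "html.parser")

# .text keeps the whitespace found between the tags
raw_text = soup.text

# Strip every text fragment, skip empty ones, and join the rest with newlines
clean_text = soup.get_text("\n", strip=True)

print(repr(raw_text))
print(clean_text)
```

If only the text of particular tags is wanted, calling get_text() on those tags rather than on the whole soup is probably the safer choice.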

Answer 2

I think you can do what they did in this Stack Overflow question: use findAll(text=True). So in your code:

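from bs4 import BeautifulSoup

# Contents of index.html:
"""<html>
<p>1</p>
<p>2</p>
<p>3</p>
"""

soup = BeautifulSoup(open('index.html'), "html.parser")
# text=True matches every text node in the document instead of matching tags
i = soup.findAll(text=True)
print(i)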
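
One thing to watch for with this approach: matching every text node also returns the whitespace-only strings between the tags. A minimal sketch of filtering them out follows, assuming a reasonably recent bs4 where find_all(string=True) is the newer spelling of findAll(text=True). The HTML is inlined here instead of read from index.html so the snippet runs on its own.

```python
from bs4 import BeautifulSoup

html = """<html>
<p>1</p>
<p>2</p>
<p>3</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Every text node, including the newlines between the <p> tags
all_strings = soup.find_all(string=True)

# Keep only the strings that still contain something after stripping
non_empty = [s.strip() for s in all_strings if s.strip()]

# stripped_strings is a generator that does the same filtering
also_non_empty = list(soup.stripped_strings)

print(non_empty)
```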

Answer 3

soup = BeautifulSoup(open('index.html'), "html.parser")
i = soup.find_all('p')
for p in i:
    print(p.text)

find_all() returns a list of tags. Iterate over it and use tag.text to get the text inside each tag.

A more compact way:

for p in soup.find_all('p'):
    print(p.text)
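
If the texts are needed as a list rather than printed one by one, a list comprehension over find_all() is one option. This is only a sketch: the HTML is inlined so it runs without index.html, and paragraph_texts is a made-up name.

```python
from bs4 import BeautifulSoup

html = """<html>
<p>1</p>
<p>2</p>
<p>3</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect the stripped text of every <p> tag into a plain list of strings
paragraph_texts = [p.get_text(strip=True) for p in soup.find_all("p")]

for text in paragraph_texts:
    print(text)
```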
