Question

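I have the following question to answer. I am following all of these steps, but the answer I get is either 1568 or 1572. Apparently both are wrong. Can someone help me understand what I am doing wrong here?

Read the HTML content from the link "https://en.wikipedia.org/wiki/Python_(programming_language)". Store the content in the variable html_content.

Create a BeautifulSoup object using html_content and html.parser. Store the result in the variable soup.

Find the number of reference links present in the soup object. Store the result in the variable n_links.

Hint: Make use of the find_all method and the a tag.

Print n_links.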
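
The steps listed can be sketched offline as follows. This is a minimal illustration, not the expected solution: the inline HTML string is a made-up stand-in for the downloaded page (an assumption, so the example runs without network access), and it only shows that the count from find_all depends on which a tags are included.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
html_content = """
<html><body>
  <p>See <a href="/wiki/Guido_van_Rossum">Guido van Rossum</a>
     <a id="top"></a><a href="#cite_note-1">[1]</a>.</p>
  <div class="reflist">
    <ol>
      <li id="cite_note-1"><a href="https://www.python.org/">python.org</a></li>
      <li id="cite_note-2"><a href="https://docs.python.org/3/">docs</a></li>
    </ol>
  </div>
</body></html>
"""

soup = BeautifulSoup(html_content, "html.parser")

# Every <a> tag on the page, including anchors that have no href attribute.
all_anchors = soup.find_all("a")

# Only the <a> tags that actually carry an href, i.e. real links.
anchors_with_href = soup.find_all("a", href=True)

n_links = len(anchors_with_href)
print(len(all_anchors), n_links)
```

On this sample the two counts differ by one because of the empty `<a id="top">` anchor, so it can be worth checking which of the two a given exercise expects.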

Answer 1

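There may be a semantic issue here. I can't be sure, because you haven't said what the expected number is. If the links you need are the ones in the References section, you have to use a parent class to restrict the search to that part of the HTML. In that case I would use a CSS selector applied via select. That gives 391.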

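from bs4 import BeautifulSoup as bs
import requests

# A '#References' fragment is never sent to the server, so it is omitted here.
html_content = requests.get('https://en.wikipedia.org/wiki/Python_(programming_language)').content
soup = bs(html_content, 'html.parser')

# Only anchors inside the .reflist container that actually carry an href.
links = [item['href'] for item in soup.select('.reflist a[href]')]
n_links = len(links)
print(n_links)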
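
To see what scoping the selector does, here is a small offline sketch. The markup is hypothetical; only the class name reflist is taken from the answer above. It compares `.reflist a` with `.reflist a[href]`, the latter being a defensive variant that avoids a KeyError when an anchor has no href.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class name "reflist" comes from the answer.
sample = """
<p><a href="/wiki/CPython">CPython</a></p>
<div class="reflist">
  <a href="https://www.python.org/">Python</a>
  <a name="anchor-without-href">note</a>
  <a href="#cite_ref-1">^</a>
</div>
"""

soup = BeautifulSoup(sample, "html.parser")

# All anchors on the page versus anchors nested under the .reflist container.
page_anchors = soup.find_all("a")
scoped = soup.select(".reflist a")

# Restricting to a[href] makes the ['href'] lookup safe for every match.
scoped_with_href = soup.select(".reflist a[href]")
hrefs = [a["href"] for a in scoped_with_href]

print(len(page_anchors), len(scoped), len(hrefs))
```

Whether in-page links such as `#cite_ref-1` should count as reference links is a judgment call that depends on what the exercise means by "reference links".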

Answer 2

import re
from urllib import request

from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Python_(programming_language)"
html_content = request.urlopen(url).read()
soup = BeautifulSoup(html_content, 'html.parser')

# Collect absolute links; 'https?' matches both http:// and https:// URLs.
links = []
for link in soup.find_all('a', attrs={'href': re.compile(r"^https?://")}):
    links.append(link.get('href'))

# The task asks for the count, so store the number rather than the list.
n_links = len(links)
print(n_links)
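
The pattern used to filter href values has a large effect on the count. As a rough sketch with made-up href values (not taken from the actual page), the following compares a pattern that matches only `http://` with one that also accepts `https://`:

```python
import re

# Hypothetical href values of the kinds commonly found on a wiki page.
hrefs = [
    "http://example.org/a",
    "https://www.python.org/",
    "/wiki/CPython",
    "#cite_note-1",
    "//upload.wikimedia.org/x.png",
]

http_only = re.compile(r"^http://")
http_or_https = re.compile(r"^https?://")

# Count the hrefs each pattern accepts at the start of the string.
n_http_only = sum(1 for h in hrefs if http_only.search(h))
n_http_or_https = sum(1 for h in hrefs if http_or_https.search(h))

print(n_http_only, n_http_or_https)
```

Relative, fragment-only, and protocol-relative links are skipped by both patterns, which may or may not be what the exercise intends.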

Find the number of reference links present on a web page
