Question

I am doing some scraping exercises with BeautifulSoup, but I have produced something that appears to be stuck in a loop.

Here is my code:

    from bs4 import BeautifulSoup
    import requests

    # Print all links in the page

    linkpage = "https://automatetheboringstuff.com/chapter12/"
    page = requests.get(linkpage)
    page.encoding = "utf-8"
    data = page.text
    html = BeautifulSoup(data, "html5lib")

    for link in html.find_all("a"):
        print(link)

When I run this script, the CPU maxes out, nothing is printed, and execution never finishes. Why?

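Two important observations:

This happens only under Linux (both Python 2 and Python 3). I do not get the same behavior under Windows: it works fine there and all the links are printed correctly! :-|
It happens only with the URL given in the variable linkpage. When I change it to another one (e.g. https://stackoverflow.com/), it works fine.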

Edit:

Changing the parser to lxml makes it work.
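
A minimal sketch of making the parser a parameter, so the same link extraction can be tried with different parsers. The names extract_links and sample are hypothetical and used only for illustration; the default "lxml" assumes the lxml package is installed, and the demo call uses Python's built-in "html.parser" on a small inline document so that it runs without network access or extra packages.

```python
from bs4 import BeautifulSoup


def extract_links(markup, parser="lxml"):
    """Return the href of every <a> tag in markup, parsed with the given parser."""
    soup = BeautifulSoup(markup, parser)
    return [a.get("href") for a in soup.find_all("a")]


# Small inline document (an assumption for illustration, not the page from the question).
sample = '<p><a href="/one">one</a> <a href="/two">two</a></p>'

# "html.parser" ships with Python, so this call needs only bs4.
links = extract_links(sample, parser="html.parser")
print(links)
```

For the page in the question, the same function could be called with the text returned by requests.get(linkpage) and parser="lxml".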

Why does this problem occur with html5lib (so far only on this particular page)?

Answer 1

Try this:


    from bs4 import BeautifulSoup
    import requests

    # Print all links in the page

    linkpage = "https://automatetheboringstuff.com/chapter12/"
    page = requests.get(linkpage)
    page.encoding = "utf-8"
    data = page.text
    # Name the parser explicitly; lxml is the one the question reports as working
    html = BeautifulSoup(data, "lxml")

    # Print only the href attribute of each anchor
    all_link = html.find_all('a')
    for link in all_link:
        print(link.get('href'))

BeautifulSoup's find_all method gets stuck in a loop
