Question

I built a website with Django 1.11.12 and Python 3.4. Yesterday I stumbled across something odd.

My function looks like this:

from bs4 import BeautifulSoup as Soup

def foo():
    t = "bla bla blubb"
    s = Soup(t, 'lxml')
    # do stuff

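The first time Django calls the function, everything works and I get a result. But the second time the same function runs, the site freezes, and after a while I get a gateway timeout.

Now change only the parser: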

from bs4 import BeautifulSoup as Soup

def foo():
    t = "bla bla blubb"
    s = Soup(t, 'html.parser') # changed from lxml to html.parser
    # do stuff

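Everything works, no matter how many times it runs.

Is this just my system (maybe I messed something up)? What could cause this behavior?

I'd be grateful for any suggestions.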

Answer 1

I have a Django site that uses:

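import urllib.request
from bs4 import BeautifulSoup
# `request` is the HttpRequest object passed to the Django view
contents = urllib.request.urlopen(request.POST['url']).read()
parsed = BeautifulSoup(contents, "html5lib")
title = parsed.find('title').text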


title ---> StackOverflow - Where developers...

As for lxml, I haven't used it that way, so I can't say, but maybe html5lib would work better?
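
If the goal is only to pull the page title without depending on lxml, something like the following might do. This is a minimal sketch, not a fix for the lxml hang itself: the helper name extract_title and its parsers parameter are made up for illustration, and it assumes that either html5lib or the built-in html.parser is acceptable for your pages.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def extract_title(html, parsers=("html5lib", "html.parser")):
    """Return the stripped <title> text, or None if the page has no <title>.

    `parsers` is a hypothetical preference list: the first parser that is
    actually installed is used, and lxml is deliberately left out.
    """
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # The requested parser library is not installed; try the next one.
            continue
        tag = soup.find("title")
        # find() returns None when there is no <title>, so guard before reading text.
        return tag.get_text(strip=True) if tag is not None else None
    raise RuntimeError("none of the requested parsers is installed")
```

In a view you would call it as extract_title(contents). The None check also avoids the AttributeError that parsed.find('title').text raises on a page without a title.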
