Extracting random data with BeautifulSoup

Time: 2015-02-06 18:59:00

Tags: python beautifulsoup

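I'm currently building a Python IRC bot. I found some code that pulls data from bash.org and posts a random quote to the channel. I tried to adapt it to pull from a different source by changing the classes it looks for, but I end up with a "maximum recursion depth exceeded while calling a Python object" error. I'm pulling my hair out at this point. Any suggestions would be helpful.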

from plugins.util import command, get_url
from bs4 import BeautifulSoup


@command("q", "qdb")
def quote(m):
    """Return a quote from sysadmin quotes."""


    #- Post a quote from quote.org. If given a quote number, it will try to post it. Otherwise it
    #- will post a random quote. If the quote is too long, it will direct the user to the URL.
    #- Please note that there is no filtering here, so some of the quotes may be inappropriate.

    if len(m.line) > 1 and m.line[1].isnumeric():
        quote_specific(m, m.line[1])
    else:
        quote_rand(m)


def quote_rand(m):
    """Get a random quote from sysadmin quotes"""
    resp = get_url(m, "http://quotes.sysadmin.technology/?random1")
    soup = BeautifulSoup(resp)
    raw = soup.find(class_="quote_quote")
    if raw:
        meta = soup.find(class_="quote_option-bar")
        while True:
            if not raw:
                quote_rand(m)
                return
            lines = raw.get_text().splitlines()
            if len(lines) <= 5:
                break
            raw = raw.find_next(class_="quote_quote")
            meta = soup.find_next(class_="quote_option-bar")
        format_quote(m, lines, meta)
    else:
        m.bot.private_message(m.location, "Could not find quote.")


def quote_specific(m, number):
    """Get a specific quote from sysadmin quotes."""
    resp = get_url(m, "http://quotes.sysadmin.technology/?" + number)
    soup = BeautifulSoup(resp)
    raw = soup.find(class_="quote_qoute")
    if raw:
        meta = soup.find(class_="quote_option-bar")
        lines = raw.get_text().splitlines()
        if len(lines) > 5 and not m.is_pm:
            m.bot.private_message(m.location, "This quote is too long to post publicly, "
                                              "but you can view it at http://quotes.sysadmin.technology/?"
                                              "{}.".format(number))
        else:
            format_quote(m, lines, meta, number)
    else:
        m.bot.private_message(m.location, "Could not find quote.")


def format_quote(m, raw, meta, number=None):
    """Format the quote with some metadata."""
    try:
        score = meta.font.string
        score_str = "\x0304{}\x03".format(score) if "-" in score else "\x0303{}\x03".format(score)
        url = "http://quotes.sysadmin.technology/?" + (number if number else meta.b.string.strip("#"))
        meta_str = "--- {} ({}) {} ".format(meta.b.string, score_str, url)
    except AttributeError as e:
        if number:
            m.bot.private_message(m.location, "Could not find quote.")
        else:
            quote_rand(m)
    else:
        m.bot.private_message(m.location, meta_str.ljust(83, '-'))
        for line in raw:
            m.bot.private_message(m.location, line)
        m.bot.private_message(m.location, "-" * 80)

1 answer:

Answer 0 (score: 1)

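Python has a limited recursion depth, so for problems that may hit this limit you are better off coming up with an iterative version of the algorithm. If you are feeling lazy, and you are sure your recursion will only ever go slightly deeper than Python's default limit, you might get away with just raising that limit. How? See "What is the maximum recursion depth in Python, and how to increase it?"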
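To make the "iterative version" suggestion concrete, here is a minimal sketch of a bounded retry loop that could stand in for the places where quote_rand calls itself. The names pick_short_quote, fetch_quote_lines, max_attempts and max_lines are hypothetical and not part of the bot's code; fetch_quote_lines is assumed to be any callable that downloads and parses one page and returns the quote as a list of lines, or None when nothing usable was found.

```python
def pick_short_quote(fetch_quote_lines, max_attempts=10, max_lines=5):
    """Return a quote of at most max_lines lines, or None after max_attempts tries.

    fetch_quote_lines is a hypothetical callable (an assumption for this
    sketch): it returns a list of lines, or None if the page had no quote.
    """
    for _ in range(max_attempts):
        lines = fetch_quote_lines()
        # Accept only a non-empty quote that is short enough to post.
        if lines and len(lines) <= max_lines:
            return lines
    # Give up and let the caller report an error instead of retrying forever.
    return None


# Demo with a fake fetcher: a failed parse, a quote that is too long,
# then a quote that is short enough to accept.
fake_pages = iter([None, ["line"] * 8, ["hi", "there"]])
print(pick_short_quote(lambda: next(fake_pages)))
```

The point of the loop is that the number of attempts is capped, so a page whose markup never matches the expected classes ends in a "Could not find quote." message rather than a recursion error. Raising the limit with sys.setrecursionlimit only helps when the recursion is bounded to begin with; if every retry fails the same way, a higher limit just delays the same error.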