Python robotparser timeout equivalent

Date: 2013-03-05 22:28:32

Tags: python python-3.x robots.txt

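Is there a way in Python 3.3.0 to set a timeout for the robotparser.read() function, the way you can with urlopen() in urllib.request?

The default timeout of 60 seconds is a bit steep.

(I'm self-taught in Python.)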

Python 3.3.0 - robotparser

Python 3.3.0 - urllib.request

1 answer:

Answer 0 (score: 2)

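No. RobotFileParser.read() calls urllib.request.urlopen(self.url) without a timeout argument, so it falls back to the global socket default. You either have to set that global default with socket.setdefaulttimeout(), or subclass RobotFileParser to add a custom timeout: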

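from urllib.robotparser import RobotFileParser
import urllib.error
import urllib.request


class TimeoutRobotFileParser(RobotFileParser):
    """RobotFileParser whose read() passes a timeout (in seconds) to urlopen()."""

    def __init__(self, url='', timeout=60):
        super().__init__(url)
        self.timeout = timeout  # seconds; forwarded to urlopen() in read()

    def read(self):
        """Reads the robots.txt URL and feeds it to the parser."""
        # A timeout is not caught here: it reaches the caller as an OSError
        # (socket.timeout, or a URLError wrapping it).
        try:
            f = urllib.request.urlopen(self.url, timeout=self.timeout)
        except urllib.error.HTTPError as err:
            # Mirrors Python 3.3's RobotFileParser.read(): 401/403 means
            # "disallow everything", any other 4xx/5xx means "allow everything".
            if err.code in (401, 403):
                self.disallow_all = True
            elif err.code >= 400:
                self.allow_all = True
        else:
            with f:  # close the response once the body has been read
                raw = f.read()
            self.parse(raw.decode("utf-8").splitlines())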
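The other option, the global default, might look something like the sketch below. It assumes Python 3, and `default_socket_timeout` and `read_robots` are hypothetical helper names, not part of the standard library. Because socket.setdefaulttimeout() applies to every socket created afterwards in the process (other threads included), the sketch restores the previous value as soon as read() returns. Note that the value limits each blocking socket operation (connect, each read), not the total download time.

```python
import contextlib
import socket
import time
import urllib.robotparser


@contextlib.contextmanager
def default_socket_timeout(seconds):
    """Temporarily change the process-wide default timeout for new sockets.

    socket.setdefaulttimeout() affects every socket created afterwards in
    this process (all threads), so the previous value is restored on exit.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)


def read_robots(url, timeout):
    """Fetch and parse robots.txt at `url` (hypothetical helper).

    Each blocking socket operation may take at most `timeout` seconds.
    read() does not handle timeouts; one reaches the caller as an OSError
    (socket.timeout, or a URLError wrapping it).
    """
    parser = urllib.robotparser.RobotFileParser(url)
    with default_socket_timeout(timeout):
        parser.read()
    return parser


if __name__ == "__main__":
    # Stand-in for an unresponsive server: a local socket that completes the
    # TCP handshake (via the listen backlog) but never sends a response.
    with socket.socket() as silent:
        silent.bind(("127.0.0.1", 0))
        silent.listen(1)
        url = "http://127.0.0.1:%d/robots.txt" % silent.getsockname()[1]

        start = time.monotonic()
        try:
            read_robots(url, timeout=0.5)
        except OSError as exc:
            print("gave up after %.1f s: %r" % (time.monotonic() - start, exc))
```

Scoping the change with try/finally keeps it from leaking into unrelated code, but other threads can still see the shorter default while the fetch is running; the subclass approach avoids that by passing the timeout directly to urlopen().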