Redirect handler in Python 3.4.3

Time: 2017-05-29 23:58:31

Tags: python python-3.x urllib

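I am using the urllib.request package to open and read web pages. I want to make sure my code handles redirects properly. Right now, when I hit a redirect, the request just fails (with an HTTPError). Can someone show me how to handle it? My code currently looks like this: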

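import socket
import urllib.error
import urllib.request

try:
    # read() returns bytes; decode it instead of wrapping it in str() (UTF-8 is assumed here)
    text = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
except ValueError as error:  # malformed URL
    print(error)
except urllib.error.HTTPError as error:  # HTTP error status; must come before URLError, its parent class
    print(error)
except urllib.error.URLError as error:  # DNS failure, refused connection, etc.
    print(error)
except socket.timeout as error:  # timed out while reading the response
    print(error)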

Please help; I'm new to this. Thanks!

2 answers:

Answer 0: (score: 0)

I use a custom URLopener to catch redirects:

import urllib

class RedirectException(Exception):
    def __init__(self, errcode, newurl):
        Exception.__init__(self)
        self.errcode = errcode
        self.newurl = newurl

class MyURLopener(urllib.URLopener):
    # Error 301 -- relocated (permanently)
    def http_error_301(self, url, fp, errcode, errmsg, headers, data=None):
        if headers.has_key('location'):
            newurl = headers['location']
        elif headers.has_key('uri'):
            newurl = headers['uri']
        else:
            newurl = "Nowhere"
        raise RedirectException(errcode, newurl)

    # Error 302 -- relocated (temporarily)
    http_error_302 = http_error_301
    # Error 303 -- relocated (see other)
    http_error_303 = http_error_301
    # Error 307 -- relocated (temporarily)
    http_error_307 = http_error_301

urllib._urlopener = MyURLopener()

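Now I just need to catch RedirectException, and voilà: I know there was a redirect and I know the target URL. Caveat: I use this code with Python 2.7 and don't know whether it works in Python 3.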
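
For Python 3, a rough equivalent can be sketched with urllib.request.HTTPRedirectHandler instead of URLopener. By default urllib.request follows the common redirect codes on its own; the subclass below overrides redirect_request so that a redirect raises an exception instead of being followed. The names RaisingRedirectHandler and fetch are hypothetical, and the sketch assumes you want to stop at the first redirect rather than follow it.

```python
import urllib.error
import urllib.request


class RedirectException(Exception):
    """Raised instead of following a redirect."""

    def __init__(self, errcode, newurl):
        super().__init__("HTTP {} redirect to {}".format(errcode, newurl))
        self.errcode = errcode
        self.newurl = newurl


class RaisingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: report redirects rather than follow them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # The base class has already resolved newurl from the response
        # headers before calling this method.
        raise RedirectException(code, newurl)


# Passing a subclass of a default handler replaces that default handler.
opener = urllib.request.build_opener(RaisingRedirectHandler)


def fetch(url, timeout=10):
    """Return the response body as bytes, or raise RedirectException."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Calling fetch(url) inside a try block and catching RedirectException then gives you error.errcode and error.newurl, much like the Python 2 version above.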

Answer 1: (score: 0)

I found a better solution using the requests package. The only exceptions you need to handle are:

import requests

try:
    r = requests.get(url, timeout=5)
except requests.exceptions.Timeout:
    # Maybe set up for a retry, or continue in a retry loop
    pass
except requests.exceptions.TooManyRedirects as error:
    # Tell the user their URL was bad and try a different one
    pass
except requests.exceptions.ConnectionError:
    # Connection could not be completed
    pass
except requests.exceptions.RequestException as e:
    # Catastrophic error. Bail.
    raise

To get the text of the page, all you need is: r.text
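
Since the question is about redirects specifically: requests follows them by default and keeps the intermediate responses in r.history, so you can check afterwards whether a redirect happened. A minimal sketch, assuming the requests package is installed; redirect_chain and fetch_text are hypothetical helper names, not part of requests.

```python
import requests


def redirect_chain(response):
    """Return (status_code, url) for each hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def fetch_text(url, timeout=5):
    """Fetch url, print any redirects that were followed, and return the body text."""
    r = requests.get(url, timeout=timeout)  # redirects are followed by default
    r.raise_for_status()  # raise requests.exceptions.HTTPError on 4xx/5xx
    for status, hop_url in redirect_chain(r)[:-1]:
        print("redirected:", status, hop_url)
    return r.text
```

If you would rather not follow redirects at all, requests.get accepts allow_redirects=False; you then get the 3xx response itself and can read its Location header.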