I am using the urllib.request package to open and read web pages. I want to make sure my code handles redirects gracefully. Right now, when I hit a redirect, I simply fail (it is an HTTPError). Can someone show me how to handle it? My code currently looks like:
import urllib.request
import urllib.error
from socket import timeout

try:
    # read() returns bytes; decode instead of wrapping in str(),
    # which would produce a "b'...'" literal
    text = urllib.request.urlopen(url, timeout=10).read().decode("utf-8")
except ValueError as error:
    print(error)  # e.g. a malformed URL
except urllib.error.HTTPError as error:
    print(error)  # server returned an error status code
except urllib.error.URLError as error:
    print(error)  # network-level failure (DNS, connection refused, ...)
except timeout as error:
    print(error)  # socket timed out
Please help me, I am a beginner. Thanks!
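For what it's worth, Python 3's urllib.request already follows 3xx redirects by default, and the response's geturl() reveals the final URL after any redirects. Here is a self-contained sketch that demonstrates this against a throwaway local server (the server, paths, and response body are purely illustrative):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectDemo(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old":
            # Redirect /old to /new
            self.send_response(302)
            self.send_header("Location", "/new")
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", "5")
            self.end_headers()
            self.wfile.write(b"hello")

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(("127.0.0.1", 0), RedirectDemo)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/old" % server.server_port
with urllib.request.urlopen(url, timeout=10) as resp:
    body = resp.read()       # b"hello" -- the redirect was followed
    final = resp.geturl()    # ends with "/new", not "/old"

server.shutdown()
```

So if you see an HTTPError on a redirect, the default opener was likely bypassed or the server returned an actual error status.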
Answer 0 (score: 0)
I use a special URLopener to catch redirects:
import urllib

class RedirectException(Exception):
    def __init__(self, errcode, newurl):
        Exception.__init__(self)
        self.errcode = errcode
        self.newurl = newurl

class MyURLopener(urllib.URLopener):
    # Error 301 -- relocated (permanently)
    def http_error_301(self, url, fp, errcode, errmsg, headers, data=None):
        if headers.has_key('location'):
            newurl = headers['location']
        elif headers.has_key('uri'):
            newurl = headers['uri']
        else:
            newurl = "Nowhere"
        raise RedirectException(errcode, newurl)

    # Error 302 -- relocated (temporarily)
    http_error_302 = http_error_301
    # Error 303 -- relocated (see other)
    http_error_303 = http_error_301
    # Error 307 -- relocated (temporarily)
    http_error_307 = http_error_301

urllib._urlopener = MyURLopener()
Now I just need to catch RedirectException and voilà: I know a redirect happened, and I know the target URL. Caveat: I use this code with Python 2.7 and don't know whether it works on Python 3.
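On Python 3, urllib.URLopener no longer lives at that path (it became the deprecated urllib.request.URLopener), but the same "raise on redirect" idea can be expressed by subclassing urllib.request.HTTPRedirectHandler. The sketch below mirrors the exception from the answer above; the handler class name and usage are my own, not from the original answer:

```python
import urllib.request

class RedirectException(Exception):
    def __init__(self, errcode, newurl):
        Exception.__init__(self)
        self.errcode = errcode
        self.newurl = newurl

class RaiseOnRedirect(urllib.request.HTTPRedirectHandler):
    # redirect_request() is called for 301/302/303/307 responses;
    # raising here stops the opener from silently following them.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        raise RedirectException(code, newurl)

opener = urllib.request.build_opener(RaiseOnRedirect)

# Usage (network access assumed):
#     try:
#         opener.open(url, timeout=10)
#     except RedirectException as e:
#         print(e.errcode, e.newurl)
```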
Answer 1 (score: 0)
Using the requests package I found a better solution. The only exceptions you need to handle are:
import requests

try:
    r = requests.get(url, timeout=5)
except requests.exceptions.Timeout:
    pass  # maybe set up for a retry, or continue in a retry loop
except requests.exceptions.TooManyRedirects:
    pass  # tell the user their URL was bad and try a different one
except requests.exceptions.ConnectionError:
    pass  # connection could not be completed
except requests.exceptions.RequestException as e:
    raise SystemExit(e)  # catastrophic error: bail
To get the text of that page, all you need is:
r.text
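If you want to detect a redirect rather than just survive it, requests can also report what happened: after a followed redirect, r.history holds the intermediate responses, and with allow_redirects=False the 3xx response is returned as-is. A small helper sketch (the function name is mine, not part of the requests API):

```python
import requests

def redirect_target(r):
    """Given a response fetched with allow_redirects=False, return the
    Location header if the response is a redirect, else None."""
    if r.is_redirect or r.is_permanent_redirect:
        return r.headers.get("location")
    return None

# Usage (network access assumed):
#     r = requests.get(url, timeout=5, allow_redirects=False)
#     target = redirect_target(r)
```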