Question

I am trying to download a file using requests, running on Python 3.6.5. Here is my code:

import requests

file_url = "http://codex.cs.yale.edu/avi/db-book/db4/slide-dir/ch1-2.pdf"

r = requests.get(file_url, stream=True)

with open("python.pdf", "wb") as pdf:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            pdf.write(chunk)

I get the following error:

ConnectionError: HTTPConnectionPool(host='codex.cs.yale.edu', port=80): Max retries exceeded with url: /avi/db-book/db4/slide-dir/ch1-2.pdf (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000001421CF5080>: Failed to establish a new connection: [Errno 11002] getaddrinfo failed',))

I have tried many of the fixes suggested for this same error, such as increasing the timeout, but none of them helped. The link itself works fine.

Any idea what is wrong here?

Answer 1

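I suggest using a fake user agent (for example https://pypi.org/project/fake-useragent/) and rotating proxies when accessing the endpoint you are trying to reach. A good example of how to do both is https://www.scrapehero.com/how-to-rotate-proxies-and-ip-addresses-using-python-3/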
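
For reference, here is a rough sketch of what this suggestion could look like when the request is made with requests: a custom User-Agent header plus a proxy picked in rotation. The user-agent string and proxy addresses are placeholders rather than working values, and next_request_options is a hypothetical helper name, not part of any library.

```python
import itertools

# Placeholder values (assumptions): substitute a real user-agent string
# and proxies you are actually allowed to use.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"
PROXIES = ["http://10.10.1.10:3128", "http://10.10.1.11:3128"]

# cycle() yields the proxies in order and starts over after the last one.
_proxy_cycle = itertools.cycle(PROXIES)


def next_request_options():
    """Return keyword arguments for requests.get() using the next proxy in turn."""
    proxy = next(_proxy_cycle)
    return {
        "headers": {"User-Agent": USER_AGENT},
        "proxies": {"http": proxy, "https": proxy},
        "timeout": 10,  # seconds
    }
```

Each download attempt would then be made as requests.get(file_url, stream=True, **next_request_options()), so successive attempts go through different proxies.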

Answer 2

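The problem was with the remote terminal. For some reason it could not establish the connection and raised the error. The same code runs fine on my personal computer.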
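
The "getaddrinfo failed" part of the traceback indicates that the hostname could not be resolved on that machine, which fits with the same script working elsewhere. A quick way to check this on the affected machine, as a sketch using only the standard library (can_resolve is a hypothetical helper name):

```python
import socket


def can_resolve(hostname, port=80):
    """Return True if this machine can resolve hostname to an address."""
    try:
        # The same lookup that fails inside the traceback from requests.
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Raised for address-related errors, including failed name resolution.
        return False
```

If can_resolve("codex.cs.yale.edu") returns False on one machine and True on another, the difference is most likely in that machine's DNS or network configuration rather than in the download code.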

Thanks for the suggestions.

Problem downloading a file in Python
