Question

I am trying to download all the zip files from this webpage: https://www.google.com/googlebooks/uspto-patents-grants-text.html

Full disclosure: I'm not a professional coder, so please forgive me if I've made some silly mistakes.

Here is my code:

from bs4 import BeautifulSoup            
import requests

url = "https://www.google.com/googlebooks/uspto-patents-grants-text.html"
html = requests.get(url)
soup = BeautifulSoup(html.text, "html.parser")

for link in soup.find_all('a', href=True):
    href = link['href']

    if any(href.endswith(x) for x in ['.zip']):
    #if any(href.endswith('.zip')):
        print("Downloading '{}'".format(href))
        remote_file = requests.get(url + href)

        with open(href, 'wb') as f:
            for chunk in remote_file.iter_content(chunk_size=1024): 
                if chunk: 
                    f.write(chunk)
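A side note on the commented-out `#if any(href.endswith('.zip')):` line: `any()` expects an iterable, and `href.endswith('.zip')` returns a single bool, so that version would raise `TypeError`. `str.endswith` also accepts a tuple of suffixes, which makes the generator expression unnecessary. A small illustration; the second sample `href` is a made-up value, not taken from the page:

```python
hrefs = [
    "http://storage.googleapis.com/patents/grant_full_text/2015/ipg150106.zip",
    "index.html",  # hypothetical non-zip link, for contrast
]

for href in hrefs:
    # str.endswith takes one suffix, or a tuple to test several at once.
    print(href, href.endswith(".zip"), href.endswith((".zip", ".tar")))

# any() needs an iterable; a single bool is not one, so this raises TypeError.
try:
    any(hrefs[0].endswith(".zip"))
except TypeError as exc:
    print("TypeError:", exc)
```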

The error I get when running the code is:

File "C:/Users/#USER#/#FILEPATH#/Python/patentzipscraper2.py", line 16, in <module>
    with open(href, 'wb') as f:
OSError: [Errno 22] Invalid argument: 'http://storage.googleapis.com/patents/grant_full_text/2015/ipg150106.zip'
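The traceback points at `open()`, not at the download: `href` is a full URL, and `open(href, 'wb')` asks the OS to create a local file literally named after it. On Windows, characters such as `:` are not allowed there, which matches the Errno 22. A minimal sketch of deriving a local file name from the URL instead, assuming each link's last path segment is a usable file name (`local_filename` is a hypothetical helper, not part of any library):

```python
import posixpath
from urllib.parse import urlsplit


def local_filename(file_url: str) -> str:
    """Return the last path segment of a URL to use as a local file name.

    open() treats its argument as a local path, so passing the whole URL
    (with ':' and '/' in it) fails; only the final segment is kept here.
    """
    path = urlsplit(file_url).path      # drops scheme, host and query string
    name = posixpath.basename(path)     # URL paths always use '/' separators
    if not name:
        raise ValueError(f"no file name in URL: {file_url!r}")
    return name


print(local_filename(
    "http://storage.googleapis.com/patents/grant_full_text/2015/ipg150106.zip"
))
```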

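However, when I type that address into my browser, the zip file downloads fine. My guess is that it has something to do with the zip format, and that I can't download/open these files directly, but I'm not sure exactly why. The code I based this on downloads files that can be downloaded directly (such as .txt files).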
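On that guess: the zip format is unlikely to be the cause. The error message shows that `href` on this page is already an absolute URL, so `requests.get(url + href)` glues the page URL and the file URL into one invalid address. `urllib.parse.urljoin` handles both cases; a short sketch, where the relative link `files/example.zip` is a made-up example:

```python
from urllib.parse import urljoin

PAGE_URL = "https://www.google.com/googlebooks/uspto-patents-grants-text.html"

# An absolute href, like the one in the error message, is returned unchanged.
absolute = "http://storage.googleapis.com/patents/grant_full_text/2015/ipg150106.zip"
print(urljoin(PAGE_URL, absolute))

# A relative href (hypothetical here) is resolved against the page's directory.
print(urljoin(PAGE_URL, "files/example.zip"))

# Plain string concatenation, as in url + href, just sticks the two together.
print(PAGE_URL + absolute)
```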

Any help on how to download these zip files would be greatly appreciated.

Answer 1

Implement something like this in your code:

import urllib.request

# Save the remote archive as file.zip in the current directory
urllib.request.urlretrieve("http://yoursite.com/file.zip", "file.zip")
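Combining this with the page scraping, a sketch that uses only the standard library's `urllib` and `html.parser` in place of `requests` and BeautifulSoup. It assumes the zip links are plain `<a href>` tags in the page's static HTML (the error in the question suggests they are) and that each file should be saved under its last URL path segment; `ZipLinkParser`, `zip_links` and `download_all` are hypothetical names, and this has not been run against that page:

```python
import os
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen, urlretrieve


class ZipLinkParser(HTMLParser):
    """Collect absolute URLs of all <a href="..."> links that end in .zip."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Check only the URL path, so a query string does not hide ".zip".
        if href and urlsplit(href).path.lower().endswith(".zip"):
            # urljoin resolves relative hrefs and leaves absolute ones unchanged.
            self.links.append(urljoin(self.base_url, href))


def zip_links(html_text, base_url):
    """Return the .zip link targets found in html_text, in page order."""
    parser = ZipLinkParser(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links


def download_all(page_url, dest_dir="."):
    """Fetch page_url and save every linked .zip file into dest_dir."""
    with urlopen(page_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html_text = resp.read().decode(charset, errors="replace")

    os.makedirs(dest_dir, exist_ok=True)
    for file_url in zip_links(html_text, page_url):
        # Name the local file after the last URL path segment, not the full URL.
        name = posixpath.basename(urlsplit(file_url).path)
        target = os.path.join(dest_dir, name)
        print(f"Downloading {file_url} -> {target}")
        urlretrieve(file_url, target)  # writes to disk block by block
```

Calling `download_all("https://www.google.com/googlebooks/uspto-patents-grants-text.html", "patents")` would fetch the page and save each archive into a `patents` folder. If the archives are large, `urlretrieve` suits this because it copies the response to disk in blocks instead of holding each whole file in memory.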

How to download all zip files from a website using Python
