Check file size before downloading when the Content-Length header is not provided

Asked: 2015-02-22 21:17:36

Tags: python download python-requests filesize

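I am using Python 2.7.8 and the requests library to download files from an HTTP server. Before downloading, I want to check the file size and do something different (e.g. abort) if the size exceeds a given limit. I know this is easy to check when the server provides the content-length attribute in the header. However, the server I am using does not.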

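According to this great article on exception handling with requests, the file size can be checked before saving to disk by downloading only the headers and then iterating over the content without actually saving the file. This is the approach used in my code.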

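However, I get the impression that I can iterate over the content only once (to check the file size), after which the connection is closed. There seems to be nothing like seek(0) that would reset the parser to the beginning so that I could iterate again, this time saving the file to disk. When I try this (as shown in the code below), I end up with a 0 KB file on my hard disk.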

import requests
from contextlib import closing

# Create a custom exception.
class ResponseTooBigException(requests.RequestException):
    """The response is too big."""

# Maximum file size and download chunk size.
TOO_BIG = 1024 * 1024 * 200 # 200MB
CHUNK_SIZE = 1024 * 128

# Connect to a test server. stream=True ensures that only the header is downloaded here.
response = requests.get('http://leil.de/di/files/more/testdaten/25mb.test', stream=True)

try:

    # Iterate over the response's content without actually saving it on harddisk.
    with closing(response) as r:
        content_length = 0
        for chunk in r.iter_content(chunk_size=CHUNK_SIZE):
            content_length = content_length + CHUNK_SIZE

            # Do not download the file if it is too big.
            if content_length > TOO_BIG:
                raise ResponseTooBigException(response=response)

            else:    
                # If the file is not too big, this code should download the response file to harddisk. However, the result is a 0kb file.
                print('File size ok. Downloading...')
                with open('downloadedFile.test', 'wb') as f:
                    for chunk in response.iter_content(chunk_size=CHUNK_SIZE): 
                        if chunk:
                            f.write(chunk)
                            f.flush()

except ResponseTooBigException as e:
    print('The HTTP response was too big (> 200MB).')

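I have also tried making a copy of the response with

import copy
response_copy = copy.copy(response)

and then using response_copy in the line

with closing(response_copy) as r:

while keeping response in the line

for chunk in response.iter_content(chunk_size=CHUNK_SIZE): 

to allow two independent iterations over the response. However, this results in: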

AttributeError                            Traceback (most recent call last)
<ipython-input-2-3f918ff844c3> in <module>()
     35                         if chunk:
     36                             f.write(chunk)
---> 37                             f.flush()
     38 
     39 except ResponseTooBigException as e:

C:\Python34\lib\contextlib.py in __exit__(self, *exc_info)
    150         return self.thing
    151     def __exit__(self, *exc_info):
--> 152         self.thing.close()
    153 
    154 class redirect_stdout:

C:\Python34\lib\site-packages\requests\models.py in close(self)
    837         *Note: Should not normally need to be called explicitly.*
    838         """
--> 839         return self.raw.release_conn()

AttributeError: 'NoneType' object has no attribute 'release_conn'
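
One possible way around the single-pass limitation described above is to avoid the second iteration altogether: write each chunk to disk as it arrives, count the bytes actually received, and delete the partial file as soon as the limit is exceeded. The following is only a minimal sketch under that assumption, not a confirmed fix for the server in question. The helper names save_with_limit and download are hypothetical, and ResponseTooBigException derives from the built-in Exception here (rather than requests.RequestException as in the question) so that the helper itself needs nothing beyond the standard library.

```python
import os
from contextlib import closing

# Same limits as in the question.
TOO_BIG = 1024 * 1024 * 200  # 200MB
CHUNK_SIZE = 1024 * 128


class ResponseTooBigException(Exception):
    """The response body exceeded the allowed size."""


def save_with_limit(chunks, dest_path, max_bytes):
    """Write an iterable of byte chunks to dest_path in a single pass.

    Hypothetical helper: raises ResponseTooBigException and removes the
    partial file once more than max_bytes have been received.
    Returns the number of bytes written.
    """
    received = 0
    try:
        with open(dest_path, 'wb') as f:
            for chunk in chunks:
                # Count the real chunk length; the last chunk is usually
                # shorter than the requested chunk size.
                received += len(chunk)
                if received > max_bytes:
                    raise ResponseTooBigException('response exceeds size limit')
                f.write(chunk)
    except ResponseTooBigException:
        # The file is already closed here, so it can be deleted safely.
        os.remove(dest_path)
        raise
    return received


def download(url, dest_path, max_bytes=TOO_BIG, chunk_size=CHUNK_SIZE):
    """Hypothetical wrapper that streams url to dest_path with a size cap."""
    # Imported lazily so save_with_limit stays usable without requests.
    import requests

    # stream=True defers the body download until iter_content is consumed.
    with closing(requests.get(url, stream=True)) as r:
        r.raise_for_status()
        return save_with_limit(r.iter_content(chunk_size=chunk_size),
                               dest_path, max_bytes)
```

With this layout the body is read exactly once, so nothing like seek(0) is needed. The trade-off is that an oversized file is partly written before it is rejected, which costs at most max_bytes plus one chunk of disk space and bandwidth.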

0 answers