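Zipping a file into multiple parts in Python

Date: 2019-02-21 14:20:40

Tags: python python-2.7 zip compression

In Python (preferably 2.7), is it possible to compress a file into several .zip files of equal size?

The result would look something like this (assuming 200MB is chosen and the compressed file is 1100MB):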

compressed_file.zip.001 (200MB)
compressed_file.zip.002 (200MB)
compressed_file.zip.003 (200MB)
compressed_file.zip.004 (200MB)
compressed_file.zip.005 (200MB)
compressed_file.zip.006 (100MB)

2 answers:

Answer 0 (score: 1)

I think you can do this with a shell command. Something like

gzip -c /path/to/your/large/file | split -b 150000000 - compressed.gz

You can run the shell command from Python.
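One way to run that pipeline from Python is sketched below. It is only an illustration: the function name `gzip_and_split` and its parameters are made up here, and it assumes the `gzip` and `split` executables are on the PATH. Note that this produces a split gzip stream rather than a ZIP archive, and that `split` names the pieces with its default suffixes (`compressed.gzaa`, `compressed.gzab`, ...).

```python
import subprocess


def gzip_and_split(source_path, output_prefix, part_bytes):
    """Equivalent of: gzip -c source_path | split -b part_bytes - output_prefix"""
    # Start gzip writing to a pipe, and split reading from that pipe.
    gzip_proc = subprocess.Popen(["gzip", "-c", source_path],
                                 stdout=subprocess.PIPE)
    split_proc = subprocess.Popen(
        ["split", "-b", str(part_bytes), "-", output_prefix],
        stdin=gzip_proc.stdout,
    )
    # Drop our copy of the pipe so gzip sees a broken pipe if split exits early.
    gzip_proc.stdout.close()
    split_status = split_proc.wait()
    gzip_status = gzip_proc.wait()
    if gzip_status != 0 or split_status != 0:
        raise RuntimeError("gzip exited with %d, split with %d"
                           % (gzip_status, split_status))
```

For example, `gzip_and_split("/path/to/your/large/file", "compressed.gz", 150000000)` mirrors the command above. Passing the arguments as lists avoids shell quoting problems with unusual file names.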

Regards,

Ganesh J

Answer 1 (score: 1)

NB: This is based on the assumption that the result is simply a ZIP file chopped into pieces, without any additional headers or anything.
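Under that assumption, getting the original archive back is plain concatenation of the parts in order. A minimal sketch (the helper name `join_parts` is hypothetical, not part of any library):

```python
import shutil


def join_parts(part_paths, output_path):
    """Concatenate the part files, in the given order, into output_path."""
    with open(output_path, "wb") as output:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                # Stream the part across without loading it fully into memory.
                shutil.copyfileobj(part, output)
```

Because the numeric suffixes are zero-padded (`.001`, `.002`, ...), sorting the part names alphabetically gives the right order for up to 999 parts.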

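If you check the documentation, ZipFile can be given a file-like object to use for I/O. So we should be able to supply our own object that implements the necessary subset of the protocol and splits the output across multiple files.

It turns out we only need to implement 3 functions:

  • tell() - simply returns the number of bytes written so far
  • write(str) - writes to the current file up to its maximum capacity; once it is full, opens a new file, and repeats until all the data has been written
  • flush() - flushes the currently open file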
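To see that this subset is enough before splitting anything, here is a small in-memory sketch written for Python 3. The names `CountingSink` and `zip_to_sink` are invented for this illustration. It relies on my understanding that Python 3's zipfile treats an object without seek() as non-seekable and writes the archive purely sequentially.

```python
import io
import zipfile


class CountingSink(object):
    """Hypothetical write-only sink exposing only tell(), write() and flush()."""

    def __init__(self):
        self.chunks = []
        self.position = 0  # number of bytes written so far
        self.calls = {"tell": 0, "write": 0, "flush": 0}

    def tell(self):
        self.calls["tell"] += 1
        return self.position

    def write(self, data):
        self.calls["write"] += 1
        self.chunks.append(bytes(data))
        self.position += len(data)
        return len(data)

    def flush(self):
        self.calls["flush"] += 1

    def getvalue(self):
        return b"".join(self.chunks)


def zip_to_sink(members):
    """Zip a {name: bytes} mapping into a CountingSink and return the sink."""
    sink = CountingSink()
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return sink


def read_back(sink):
    """Reopen the bytes collected by the sink as a normal, seekable archive."""
    return zipfile.ZipFile(io.BytesIO(sink.getvalue()))
```

After `sink = zip_to_sink({"a.txt": b"hello"})`, the `sink.calls` dictionary shows how often each of the three methods was used, and `read_back(sink).namelist()` confirms the archive is readable.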

Prototype script

import random
import zipfile


def get_random_data(length):
    return "".join([chr(random.randrange(256)) for i in range(length)])


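class MultiFile(object):
    """Write-only file-like object that splits its output into numbered parts."""

    def __init__(self, file_name, max_file_size):
        self.current_position = 0  # total bytes written across all parts
        self.file_name = file_name
        self.max_file_size = max_file_size
        self.current_file = None
        self.open_next_file()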

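    @property
    def current_file_no(self):
        # Zero-based index of the part that contains the current position.
        return self.current_position // self.max_file_size

    @property
    def current_file_size(self):
        # Bytes already written to the current part.
        return self.current_position % self.max_file_size

    @property
    def current_file_capacity(self):
        # Bytes that still fit into the current part.
        return self.max_file_size - self.current_file_size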

    def open_next_file(self):
        file_name = "%s.%03d" % (self.file_name, self.current_file_no + 1)
        print "* Opening file '%s'..." % file_name
        if self.current_file is not None:
            self.current_file.close()
        self.current_file = open(file_name, 'wb')

    def tell(self):
        print "MultiFile::Tell -> %d" % self.current_position
        return self.current_position

    def write(self, data):
        start, end = 0, len(data)
        print "MultiFile::Write (%d bytes)" % len(data)
        while start < end:
            # Write only as much as still fits into the current part.
            current_block_size = min(end - start, self.current_file_capacity)
            self.current_file.write(data[start:start+current_block_size])
            print "* Wrote %d bytes." % current_block_size
            start += current_block_size
            self.current_position += current_block_size
            # Capacity is back at the maximum exactly when the part was just filled.
            if self.current_file_capacity == self.max_file_size:
                self.open_next_file()
            print "* Capacity = %d" % self.current_file_capacity

    def flush(self):
        print "MultiFile::Flush"
        self.current_file.flush()


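# Split the archive into parts of 2**18 bytes (256 KiB) each.
mfo = MultiFile('splitzip.zip', 2**18)

zf = zipfile.ZipFile(mfo,  mode='w', compression=zipfile.ZIP_DEFLATED)


for i in range(4):
    filename = 'test%04d.txt' % i
    print "Adding file '%s'..." % filename
    zf.writestr(filename, get_random_data(2**17))

# Write the central directory explicitly rather than relying on garbage
# collection, then close the last part.
zf.close()
mfo.current_file.close()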

Trace output

* Opening file 'splitzip.zip.001'...
Adding file 'test0000.txt'...
MultiFile::Tell -> 0
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 262102
MultiFile::Write (131112 bytes)
* Wrote 131112 bytes.
* Capacity = 130990
MultiFile::Flush
Adding file 'test0001.txt'...
MultiFile::Tell -> 131154
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 130948
MultiFile::Write (131112 bytes)
* Wrote 130948 bytes.
* Opening file 'splitzip.zip.002'...
* Capacity = 262144
* Wrote 164 bytes.
* Capacity = 261980
MultiFile::Flush
Adding file 'test0002.txt'...
MultiFile::Tell -> 262308
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 261938
MultiFile::Write (131112 bytes)
* Wrote 131112 bytes.
* Capacity = 130826
MultiFile::Flush
Adding file 'test0003.txt'...
MultiFile::Tell -> 393462
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 130784
MultiFile::Write (131112 bytes)
* Wrote 130784 bytes.
* Opening file 'splitzip.zip.003'...
* Capacity = 262144
* Wrote 328 bytes.
* Capacity = 261816
MultiFile::Flush
MultiFile::Tell -> 524616
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261770
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261758
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261712
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261700
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261654
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261642
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261596
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261584
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Tell -> 524848
MultiFile::Write (22 bytes)
* Wrote 22 bytes.
* Capacity = 261562
MultiFile::Write (0 bytes)
MultiFile::Flush

Directory listing

-rw-r--r-- 1   2228 Feb 21 23:44 splitzip.py
-rw-r--r-- 1 262144 Feb 22 00:07 splitzip.zip.001
-rw-r--r-- 1 262144 Feb 22 00:07 splitzip.zip.002
-rw-r--r-- 1    582 Feb 22 00:07 splitzip.zip.003
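The part sizes in this listing follow directly from the total archive size (524870 bytes) and the chosen part size (2**18 bytes). A small sketch of that arithmetic, using a hypothetical helper `expected_part_sizes`:

```python
def expected_part_sizes(total_bytes, part_bytes):
    """Sizes of the pieces when total_bytes is cut into part_bytes chunks."""
    full_parts, remainder = divmod(total_bytes, part_bytes)
    sizes = [part_bytes] * full_parts
    if remainder:
        sizes.append(remainder)  # the last part holds whatever is left over
    return sizes


print(expected_part_sizes(524870, 2**18))  # [262144, 262144, 582]
```

One caveat: the prototype opens the next part as soon as the current one is full, so an archive whose size is an exact multiple of the part size would also leave one empty trailing part, which this helper does not model.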

Verification

>7z l splitzip.zip.001

7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18

Listing archive: splitzip.zip.001

--
Path = splitzip.zip.001
Type = Split
Volumes = 3
----
Path = splitzip.zip
Size = 524870
--
Path = splitzip.zip
Type = zip
Physical Size = 524870

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2019-02-22 00:07:34 .....       131072       131112  test0000.txt
2019-02-22 00:07:34 .....       131072       131112  test0001.txt
2019-02-22 00:07:36 .....       131072       131112  test0002.txt
2019-02-22 00:07:36 .....       131072       131112  test0003.txt
------------------- ----- ------------ ------------  ------------------------
                                524288       524448  4 files, 0 folders
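The prototype above targets Python 2.7. For Python 3, a rough adaptation could look like the following sketch. The names `SplitWriter`, `write_split_zip` and `open_joined` are hypothetical, the tracing prints are dropped, and parts are opened lazily so that no empty trailing part is created. It also checks the result with the zipfile module itself instead of 7-Zip, by joining the parts in memory.

```python
import io
import zipfile


class SplitWriter(object):
    """Write-only sink that starts a new numbered part every part_size bytes."""

    def __init__(self, base_name, part_size):
        self.base_name = base_name
        self.part_size = part_size
        self.position = 0      # total bytes written across all parts
        self.part_names = []   # paths of the parts created so far
        self._part = None

    def tell(self):
        return self.position

    def write(self, data):
        start = 0
        while start < len(data):
            used = self.position % self.part_size
            if used == 0:
                # No part is open yet, or the current one is exactly full.
                self._open_next_part()
            chunk = data[start:start + self.part_size - used]
            self._part.write(chunk)
            start += len(chunk)
            self.position += len(chunk)
        return len(data)

    def flush(self):
        if self._part is not None:
            self._part.flush()

    def close(self):
        if self._part is not None:
            self._part.close()
            self._part = None

    def _open_next_part(self):
        self.close()
        name = "%s.%03d" % (self.base_name, len(self.part_names) + 1)
        self.part_names.append(name)
        self._part = open(name, "wb")


def write_split_zip(base_name, part_size, members):
    """Zip a {name: bytes} mapping into numbered parts; return their paths."""
    sink = SplitWriter(base_name, part_size)
    try:
        with zipfile.ZipFile(sink, mode="w",
                             compression=zipfile.ZIP_DEFLATED) as zf:
            for name, payload in members.items():
                zf.writestr(name, payload)
    finally:
        sink.close()
    return sink.part_names


def open_joined(part_names):
    """Concatenate the parts in memory and open the result as a ZipFile."""
    buffer = io.BytesIO()
    for path in part_names:
        with open(path, "rb") as part:
            buffer.write(part.read())
    buffer.seek(0)
    return zipfile.ZipFile(buffer)
```

For instance, `parts = write_split_zip("splitzip.zip", 2**18, members)` followed by `open_joined(parts).testzip()` returns None when every member passes its CRC check. Joining in memory is only reasonable for small archives; for large ones the parts would be concatenated on disk first.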