The trouble with compressing large data in Python

Date: 2014-05-27 05:30:21

Tags: python zlib

I have a script in Python that compresses a large string:

import zlib

def processFiles():
  ...
  s = """Large string more than 2Gb"""
  data = zlib.compress(s)
  ...

When I run this script, I get the error:

ERROR: Traceback (most recent call last):
  File "./../commands/sce.py", line 438, in processFiles
    data = zlib.compress(s)
OverflowError: size does not fit in an int

Some information:

zlib.__version__ = '1.0'

zlib.ZLIB_VERSION = '1.2.7'

# python -V
Python 2.7.3

# uname -a
Linux app2 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux

# free
             total       used       free     shared    buffers     cached
Mem:      65997404    8096588   57900816          0     184260    7212252
-/+ buffers/cache:     700076   65297328
Swap:     35562236          0   35562236

# ldconfig -p | grep python
libpython2.7.so.1.0 (libc6,x86-64) => /usr/lib/libpython2.7.so.1.0
libpython2.7.so (libc6,x86-64) => /usr/lib/libpython2.7.so

How can I compress large data (more than 2 GB) in Python?

3 Answers:

Answer 0 (score: 3)

My function for compressing large data:

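A minimal sketch of such a function, streaming a file through `zlib.compressobj` in fixed-size blocks so no single buffer approaches the 2 GiB limit (the function name, paths, and block size are illustrative, not from the original answer):

```python
import zlib

def compress_file(src_path, dst_path, block_size=1 << 20):
    """Compress src_path into dst_path, reading 1 MiB blocks."""
    comp = zlib.compressobj()
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(comp.compress(block))
        # flush() emits any data still buffered inside zlib;
        # without it the output stream is truncated.
        dst.write(comp.flush())
```

The result is a single zlib stream, so a plain `zlib.decompress` of the output file yields the original data.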

Answer 1 (score: 2)

This is not a RAM problem. Simply put, the Python zlib binding passes the buffer size as a C int, so neither `zlib.compress` nor `zlib.decompress` can accept a single buffer larger than 2 GiB — which is exactly what the OverflowError is saying.

Split the data into chunks smaller than 2 GiB and process each chunk separately.
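The splitting described above can be sketched like this (`CHUNK` and `compress_big` are illustrative names; because all slices go through one `compressobj`, the output is still a single zlib stream):

```python
import zlib

CHUNK = 1 << 30  # 1 GiB per slice, safely below the C int limit

def compress_big(data):
    # Feed the large bytes object to one compressobj in slices,
    # so no single call passes a buffer larger than CHUNK.
    comp = zlib.compressobj()
    out = []
    for i in range(0, len(data), CHUNK):
        out.append(comp.compress(data[i:i + CHUNK]))
    out.append(comp.flush())  # emit zlib's remaining buffered bytes
    return b''.join(out)
```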

Answer 2 (score: 0)

Try streaming...

import zlib

compressor = zlib.compressobj()
chunks = []
with open('/var/log/syslog', 'rb') as inputfile:
    # read in 1 MiB blocks so no single buffer hits the int limit
    for block in iter(lambda: inputfile.read(1024 * 1024), b''):
        chunks.append(compressor.compress(block))
chunks.append(compressor.flush())  # don't forget the final flush
data = b''.join(chunks)

print len(data)
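Going the other way, the same streaming idea works for decompression with `zlib.decompressobj`, which likewise avoids handing one oversized buffer to the binding (`decompress_stream` is an illustrative name, not part of the answer above):

```python
import zlib

def decompress_stream(blob, block_size=1 << 20):
    # Mirror of the streaming compression: feed the compressed
    # bytes to a decompressobj in fixed-size slices.
    decomp = zlib.decompressobj()
    out = []
    for i in range(0, len(blob), block_size):
        out.append(decomp.decompress(blob[i:i + block_size]))
    out.append(decomp.flush())  # recover any bytes still buffered
    return b''.join(out)
```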