Reading and writing compressed data byte by byte

Time: 2016-01-22 04:04:30

Tags: python binary hex

I am trying to implement the Lempel-Ziv-Welch algorithm in Python, but I am having trouble writing the output file as binary.

import os
import sys

action = sys.argv[3]
if action == "compress":
    # initialize dictionary
    dictionary = {}
    for i in range(0,256):
        # for single characters, the value is the same as the key
        # in the compressed file, these would appear as is
        dictionary[chr(i)] = i
    input_file = open(sys.argv[1], 'rb+')
    output_file = open(sys.argv[2], 'wb')

    data = input_file.read()
    # current_data is one byte
    current_data = input_file.read(1)
    i = 0
    j = 1
    current_data = data[i:j]
    # look for the shortest string not in the dictionary
    while i < len(data) - 2:
        while current_data in dictionary.keys():
            if j < len(data) + 1:
                j = j + 1
                current_data = data[i:j]
            else:
                break
        # once the shortest string is found, add it to the dictionary
        if current_data not in dictionary.keys():
            dictionary[current_data] = len(dictionary)
            thing_to_write = dictionary[current_data[:-1]]
            i = j - 1
            current_data = data[i:j]
        else:
            thing_to_write = dictionary[current_data]
            i = i + 1
            j = i + 1
        # then write to the output file the found string - one character from the end (the longest string that is in the dictionary)
        mylist = []
        thing_to_write = format(thing_to_write,'x')
        thing_to_write = thing_to_write
        for char in thing_to_write:
            mylist.append(char.encode('hex'))
            for elem in mylist:
                output_file.write(elem)
    input_file.close()
    output_file.close()
    print >> sys.stderr, "The size of " + sys.argv[1] + " is " + str(os.path.getsize(sys.argv[1])) + " bytes." + "\n" + "The size of " + sys.argv[2] + " is " + str(os.path.getsize(sys.argv[2])) + " bytes."

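I have tried writing the output in many different formats, such as hex and binary, but I think I am just writing the digits out as 8-bit characters. How can I write raw binary?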

1 Answer:

Answer 0: (score: 0)

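It is not clear what you are trying to write. The values you produce can end up larger than 255, so they no longer fit in a single byte. I assume you want to write 2-byte unsigned integers to the output file?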

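If that is the case, I suggest you look into Python's struct.pack function, which is designed to convert Python values into their binary representation. If your data were byte-sized, you could instead write each value with output_file.write(chr(x)).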
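To make the difference concrete, here is a minimal sketch in Python 3 syntax (an assumption, since the code in this thread is Python 2). It contrasts writing a code as hex text with packing it into two raw bytes. The value 258 and the explicit little-endian format '<H' are illustrative choices, not something the question specifies.

```python
import struct

code = 258  # hypothetical LZW code, too large for a single byte

# Hex text: every hex digit becomes its own ASCII byte.
as_text = format(code, 'x').encode('ascii')

# Raw binary: one unsigned 16-bit integer, little-endian.
as_binary = struct.pack('<H', code)

print(as_text, len(as_text))      # b'102' 3
print(as_binary, len(as_binary))  # b'\x02\x01' 2

# A value that fits in one byte can be written as a single raw byte.
single = bytes([65])
print(single)  # b'A'
```

The text form grows with the number of digits, while the packed form is always exactly two bytes, which is what a reader of the file needs in order to split it back into codes.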

The following version of your code uses Python's struct.pack():

import os
import struct
import sys

# work relative to the directory that contains this script
os.chdir(os.path.dirname(os.path.abspath(__file__)))

action = sys.argv[3]

if action == "compress":
    # initialize dictionary
    dictionary = {}
    for i in range(0, 256):
        # for single characters, the value is the same as the key
        # in the compressed file, these would appear as is
        dictionary[chr(i)] = i

    input_file = open(sys.argv[1], 'rb')
    output_file = open(sys.argv[2], 'wb')

    data = input_file.read()

    # current_data starts out as the first byte
    i = 0
    j = 1
    current_data = data[i:j]

    # look for the shortest string not in the dictionary
    while i < len(data) - 2:
        while current_data in dictionary:
            if j < len(data) + 1:
                j = j + 1
                current_data = data[i:j]
            else:
                break

        # once the shortest string is found, add it to the dictionary
        if current_data not in dictionary:
            dictionary[current_data] = len(dictionary)
            thing_to_write = dictionary[current_data[:-1]]
            i = j - 1
            current_data = data[i:j]
        else:
            thing_to_write = dictionary[current_data]
            i = i + 1
            j = i + 1

        # then write the code of the found string minus its last character
        # (the longest string that is already in the dictionary)
        # 'H' packs an unsigned 2-byte integer in native byte order; use '<H'
        # or '>H' for a fixed byte order. Values above 65535 raise struct.error.
        output_file.write(struct.pack('H', thing_to_write))

    input_file.close()
    output_file.close()

    print >> sys.stderr, "The size of " + sys.argv[1] + " is " + str(os.path.getsize(sys.argv[1])) + " bytes." + "\n" + "The size of " + sys.argv[2] + " is " + str(os.path.getsize(sys.argv[2])) + " bytes."
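
For comparison, the sketch below shows one way the whole trip from input bytes to packed codes and back could look in Python 3. It is a textbook LZW encoder rather than the loop from the question, and the names lzw_compress, pack_codes, unpack_codes and the 65536-entry dictionary cap are assumptions made for this illustration. The cap keeps every code within the range of the 'H' format.

```python
import struct

MAX_CODES = 1 << 16  # largest dictionary size whose codes still fit in 'H'


def lzw_compress(data):
    """Return the list of LZW codes for a bytes object."""
    dictionary = {bytes([i]): i for i in range(256)}
    codes = []
    current = b''
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            # keep extending the match while it is still known
            current = candidate
            continue
        # emit the longest known prefix, then learn the new string
        codes.append(dictionary[current])
        if len(dictionary) < MAX_CODES:
            dictionary[candidate] = len(dictionary)
        current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


def pack_codes(codes):
    """Pack every code as a little-endian unsigned 16-bit integer."""
    return struct.pack('<%dH' % len(codes), *codes)


def unpack_codes(raw):
    """Inverse of pack_codes: split raw bytes back into integer codes."""
    return [code for (code,) in struct.iter_unpack('<H', raw)]


codes = lzw_compress(b'ABABABA')
raw = pack_codes(codes)
print(codes)     # [65, 66, 256, 258]
print(len(raw))  # 8
```

Writing raw to a file opened with 'wb' gives the binary output asked about. Note that fixed 2-byte codes make short inputs larger than the original, as in this example (7 bytes in, 8 bytes out). Real LZW implementations usually use variable-width codes to avoid that.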