Question

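I'm confused about how to read binary data from the file described below. The document describing how this data is created says the following: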

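There is a "log file record start", formatted as a plain-text message terminated by ctrl-Z '0x1a' (DOS/Windows end of file), ctrl-D '0x04' (Unix end of file), and '0x00' (in that order, per the design document).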

Then comes the value 0x12345678 (4 bytes long, which lets any decoder determine the byte storage order).

After this comes the core of the data.

My code to read this file:

with open(filename, 'rb') as f:
    s = f.read()                              # whole file as one byte string

for i in range(len(s) - 2):
    if s[i].encode('hex') == '1a':            #  ctrl-Z
        if s[i+1].encode('hex') == '04':      #  ctrl-D
            if s[i+2].encode('hex') == '00':  #  null
                print s[i:i+8].encode('hex')
                break

Prints >> 1a04007856341200

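As you can see, 0x12345678 is hidden in there. From my research I gather this means the data is stored "little-endian". My current tools (I feel) make things harder than they need to be. For example, the following code extracts the year the file was created (YYYY):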

import struct

i = year_location_in_file  # just a pointer

created_year = struct.unpack('<cc', s[i:i+2])
print 'created_year as hex:', created_year

created_year = int(''.join([e for e in created_year][::-1]).encode('hex'), 16)
print 'created year as int:', created_year

Prints:

>> created_year as hex: ('\xdd', '\x07')

>> created year as int: 2013

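I have spent a lot of time trying to understand all the suggested questions and reading everything I could find on Google. I hope an answer will help me and anyone else struggling to understand byte ordering in binary files. Thank you, community.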

Edit: using print(repr(open(filename, 'rb').read(600))) gives

....sometext\xd4\xb4\x97\x1a\x04\x00xV4\x12\x00U\x01\.....

- B

Answer 1

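I think your problem comes from using c as the format code for struct.unpack rather than a larger type. c is a single character, only one byte long (which means byte order is irrelevant). Instead, use h for a two-byte short integer, or l for a four-byte int when the format starts with a byte-order prefix such as < (use the uppercase letters H or L if you need unsigned values).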

import struct

year_data = b"\x77\x07"                        # bytes sliced from the binary file
year = struct.unpack("<h", year_data)[0]       # unpack returns a tuple; [0] is the int
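The same idea can be extended to the whole header described in the question. What follows is only a minimal sketch, not the file format's official decoder: the helper names `detect_byte_order` and `parse_header` are made up for illustration, and the sample bytes assume (hypothetically) that a two-byte year field sits directly after the 0x12345678 marker, which the question does not state.

```python
import struct

MARKER = 0x12345678
TERMINATOR = b"\x1a\x04\x00"  # ctrl-Z, ctrl-D, NUL, in the order given in the question


def detect_byte_order(marker_bytes):
    """Return the struct prefix ('<' or '>') under which the 4 bytes decode to MARKER."""
    for prefix in ("<", ">"):
        if struct.unpack(prefix + "L", marker_bytes)[0] == MARKER:
            return prefix
    raise ValueError("byte-order marker not recognised")


def parse_header(data):
    """Return (byte-order prefix, offset of the first byte after the marker)."""
    end = data.find(TERMINATOR)
    if end == -1:
        raise ValueError("header terminator not found")
    marker_start = end + len(TERMINATOR)
    prefix = detect_byte_order(data[marker_start:marker_start + 4])
    return prefix, marker_start + 4


# Hypothetical sample: the bytes shown in the question's edit, followed by an
# assumed year field (0xdd 0x07) placed right after the marker.
sample = b"sometext\xd4\xb4\x97\x1a\x04\x00\x78\x56\x34\x12\xdd\x07"
prefix, offset = parse_header(sample)
year = struct.unpack_from(prefix + "H", sample, offset)[0]  # unsigned 16-bit
print(prefix, year)
```

On this sample the script prints `< 2013` under Python 3. Carrying the detected prefix into every later format string means the rest of the decoder does not need to hard-code little-endian.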

Reading binary data (byte order)
