Question

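I'm a beginner trying to write a program that reads an .exe, .class, or .pyc file and computes the percentage of alphanumeric characters (a-z, A-Z, 0-9). Here is what I have so far (for now I'm only trying to see whether I can recognize anything at all, not count the contents yet):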

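chars_total = 0
chars_alphnum = 0
iterate = 1

with open("pythonfile.pyc", "rb") as f:
    byte = f.read(iterate)
    while byte != b"":

        chars_total += 1
        print (byte)

        iterate +=1
        byte = f.read(iterate)

This code prints out various bytes, such as b'\xe1WQ\x00' b'\x00\x00c\x00\x00', but I'm having trouble interpreting the bytes myself.

I tried a conversion function after importing binascii, but it turned everything into alphanumeric characters, which doesn't seem to be what I want. So am I doing something badly wrong, or am I at least on the right track?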

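Full disclaimer: this is largely homework-related, but we are allowed to use this site because neither the class material nor the readings cover any coding. And yes, I tried to work this out on my own before coming here.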

Answer 1

Assuming you are reading an arbitrary binary file that may not decode as ASCII / UTF-8, you could try something like the following:

import string
# create a set of the ASCII code points for alphanumerics
alphanumeric_codes = {ord(c) for c in string.ascii_letters + string.digits}
file_bytes = b'...'
# Python 3: iterating over bytes yields integers, so count the matching ones
# (a generator has no len(), hence the explicit sum)
alphanumeric_count = sum(1 for b in file_bytes if b in alphanumeric_codes)
percent_alphanumerics = 100.0 * alphanumeric_count / len(file_bytes)
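
To tie this back to the file-reading loop in the question, here is a minimal sketch that applies the same idea to a file opened in binary mode. It assumes Python 3; the function name alphanumeric_percentage and the chunk_size parameter are made up for illustration and are not from the question or any library.

```python
import string

# Byte values (integers) of a-z, A-Z and 0-9.
ALPHANUMERIC_CODES = frozenset(ord(c) for c in string.ascii_letters + string.digits)


def alphanumeric_percentage(path, chunk_size=65536):
    """Return the percentage of bytes in the file that are ASCII alphanumerics.

    Hypothetical helper: the name and the chunk_size default are illustrative.
    """
    total = 0
    alphanumeric = 0
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            # In Python 3, iterating over bytes yields integers.
            alphanumeric += sum(1 for b in chunk if b in ALPHANUMERIC_CODES)
    # An empty file has no bytes to classify; report 0.0 instead of dividing by zero.
    return 100.0 * alphanumeric / total if total else 0.0
```

Something like alphanumeric_percentage("pythonfile.pyc") would then give the figure directly. Counting len(chunk) rather than the number of reads is the important part: a read can return more than one byte, so counting reads would undercount the total.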

Answer 2

On Windows, you can use a simple PowerShell script to get a hexdump (see http://windowsitpro.com/powershell/get-hex-dumps-files-powershell), then use Python to decode it into whatever standard you want (ASCII, Unicode) (see https://docs.python.org/2/library/functions.html#chr), keeping only the alphanumeric characters.

On Linux, run $ man hexdump in a terminal.
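
The decoding step might look roughly like the sketch below. It assumes Python 3 and that the hex text has already been reduced to plain byte pairs separated by spaces (no offset column or ASCII gutter, which hexdump tools usually print as well). The helper name alphanumerics_from_hex is hypothetical.

```python
import string

# Characters to keep: a-z, A-Z and 0-9 only.
ALPHANUMERICS = set(string.ascii_letters + string.digits)


def alphanumerics_from_hex(hex_text):
    """Decode space-separated hex byte pairs and keep only ASCII alphanumerics.

    Hypothetical helper for illustration; assumes the input holds byte pairs only.
    """
    raw = bytes.fromhex(hex_text)  # spaces between byte pairs are skipped
    # chr() maps each byte value to its character; filter against the ASCII set
    # rather than str.isalnum(), which would also accept characters like 'á' (0xe1).
    return "".join(chr(b) for b in raw if chr(b) in ALPHANUMERICS)


print(alphanumerics_from_hex("e1 57 51 00 00 00 63 00 00"))  # prints WQc
```

The sample input is the byte sequence shown in the question, written out as hex.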

How to check for alphanumeric characters in a binary file

2 answers: