Question

I'm getting this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

I found this solution:

>>> b"abcde".decode("utf-8")

From here: Convert bytes to a Python string

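But how do you use that if a) you don't know where the 0xff is and/or b) you need to decode a file object? What is the correct syntax/format?

I'm parsing a directory, so I tried going through the files one at a time. (Note: this won't work once the project gets bigger!!!)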

>>> i = "b'0xff'"
>>> with open('firstfile') as f:
...     g=f.readlines()
... 
>>> i in g
False
>>> 0xff in g
False
>>> '0xff' in g
False
>>> b'0xff' in g
False

>>> with open('secondfile') as f:
<snip - same process>

>>> with open('thirdfile') as f:
...     g = f.readlines()
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python3.4/codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

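So, if this is the right file, how do I decode (or encode) it when I can't even open it with Python? (I opened it in Sublime Text and didn't find anything.) Thanks.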

Answer 1

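You have several problems:

i = "b'0xff'" creates a 7-character string, not a single 0xFF byte. i = b'\xff' or i = bytes([0xff]) is the correct way to write it.
By default, open decodes the file using the encoding returned by locale.getpreferredencoding(False). Open it in binary mode to get the raw, undecoded bytes: open('firstfile','rb').
g=f.readlines() returns a list of lines. i in g checks whether i exactly equals one of the lines in that list, not whether it occurs inside a line.
Use meaningful variable names!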
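A quick interpreter check of the first and third points. This is a sketch; the list of lines is made up to stand in for what readlines() might return from a binary file:

```python
# "b'0xff'" is an ordinary str of 7 characters, not a byte
i = "b'0xff'"
print(len(i))                              # 7

# Two equivalent ways to spell a single 0xFF byte
byte = b'\xff'
print(byte == bytes([0xff]), len(byte))    # True 1

# readlines() returns a list, so `in` compares whole lines for equality
lines = [b'\xff\xfeabc\n', b'second line\n']   # hypothetical file contents
print(byte in lines)       # False: no line is exactly b'\xff'
print(byte in lines[0])    # True: substring test on a single bytes object
```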

Instead:

byte = b'\xff'
with open('firstfile','rb') as f:
    file_content = f.read()
if byte in file_content:
    ...
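If you also need to know where the byte is (part a of the question), the bytes read in binary mode can be searched directly: bytes.find returns the offset of the first match or -1, and iterating over a bytes object yields ints. A minimal sketch, with io.BytesIO standing in for open('firstfile', 'rb') and invented contents:

```python
import io

# Stand-in for open('firstfile', 'rb'); these bytes are invented for the example
f = io.BytesIO(b'\xff\xfeh\x00i\x00\xff')
file_content = f.read()

byte = b'\xff'
position = file_content.find(byte)   # offset of the first 0xFF, or -1 if absent
print(position)                      # 0

# Every offset holding 0xFF (iterating over bytes gives ints in 0..255)
all_positions = [i for i, value in enumerate(file_content) if value == 0xff]
print(all_positions)                 # [0, 6]
```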

To decode the file, you need to know its actual encoding and pass it when you open the file:

with open('firstfile',encoding='utf8') as f:
    file_content = f.read()
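For part b of the question (decoding through a file object), here is a self-contained sketch. It writes a throwaway UTF-16 file with tempfile (the file and its contents exist only for the demo), shows that reading it as UTF-8 fails, and shows that naming the real encoding in open() works:

```python
import os
import tempfile

# Create a throwaway file encoded as UTF-16; it starts with a byte order mark
# (0xFF 0xFE on little-endian machines), which UTF-8 rejects
with tempfile.NamedTemporaryFile('w', encoding='utf-16', suffix='.txt',
                                 delete=False) as tmp:
    tmp.write('hello\n')
    path = tmp.name

# Reading it as UTF-8 fails on the very first byte...
utf8_ok = True
try:
    with open(path, encoding='utf8') as f:
        f.read()
except UnicodeDecodeError as err:
    utf8_ok = False
    print('utf8 failed:', err.reason)

# ...but passing the file's actual encoding decodes it correctly
with open(path, encoding='utf-16') as f:
    file_content = f.read()
print(repr(file_content))   # 'hello\n'

os.remove(path)
```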

If you don't know the encoding, the third-party chardet module can help you guess it.

Answer 2

The simplest way is to catch the UnicodeDecodeError with try/except; then you know you have the wrong file.
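Applied to the question's directory walk, that idea could look like the sketch below. find_undecodable is a hypothetical helper written for this example, not part of any library. It reads each file in binary mode and decodes the whole byte string at once, so err.start is an offset into the file itself; holding each file fully in memory is fine for a sketch but not for very large files.

```python
import os
import tempfile


def find_undecodable(directory, encoding='utf-8'):
    """Hypothetical helper: map each file that fails to decode to the offset
    of the first byte the codec rejected."""
    bad = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, 'rb') as f:
            data = f.read()
        try:
            data.decode(encoding)
        except UnicodeDecodeError as err:
            # err.start indexes into `data`, i.e. it is the file offset
            bad[name] = err.start
    return bad


# Demo with two throwaway files: one UTF-8, one UTF-16
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, 'good.txt'), 'w', encoding='utf-8') as f:
        f.write('hello\n')
    with open(os.path.join(d, 'bad.txt'), 'w', encoding='utf-16') as f:
        f.write('hello\n')
    result = find_undecodable(d)

print(result)   # {'bad.txt': 0}
```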

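Most likely the file is not encoded in UTF-8. In that case, you may want to read it as a binary file, i.e. open it with mode 'rb' so you get the raw bytes instead of decoded text.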

How to find out a file's encoding is a separate question.

Answer 3

How to decode byte 0xff in Python

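The byte 0xff can never appear in valid UTF-8 (it is not a valid start byte), so the utf-8 codec cannot decode a byte string that contains it.

Here we use the 'utf-16' codec (codec names are case-insensitive, so 'UTF-16' works too) to decode a byte string that starts with 0xff into a str. This only works if the data really is UTF-16: a leading b'\xff\xfe' is the UTF-16 little-endian byte order mark (BOM), which is exactly what makes UTF-8 fail at position 0.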
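To see why the utf-16 codec copes with a leading 0xff, here is a small check using codecs.BOM_UTF16_LE. The byte string is a made-up example ("hi" in UTF-16 little-endian with a BOM):

```python
import codecs

data = b'\xff\xfeh\x00i\x00'   # "hi" encoded as UTF-16 little-endian, with BOM

# 0xFF 0xFE is the UTF-16 little-endian byte order mark
print(data.startswith(codecs.BOM_UTF16_LE))   # True

# The 'utf-16' codec uses the BOM to pick the byte order and drops it
print(data.decode('utf-16'))                   # hi

# UTF-8 rejects the very first byte
utf8_error = None
try:
    data.decode('utf-8')
except UnicodeDecodeError as err:
    utf8_error = (err.start, err.reason)
print(utf8_error)                              # (0, 'invalid start byte')
```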

Here is an example:

st = "this world is very beautiful"
print(st.encode('utf-16'))
# Output on a little-endian machine (e.g. x86):
# b'\xff\xfet\x00h\x00i\x00s\x00 \x00w\x00o\x00r\x00l\x00d\x00 \x00i\x00s\x00 \x00v\x00e\x00r\x00y\x00 \x00b\x00e\x00a\x00u\x00t\x00i\x00f\x00u\x00l\x00'

Now we want to turn these bytes back into a plain string.
There are two ways to decode them:

st = b'\xff\xfet\x00h\x00i\x00s\x00 \x00w\x00o\x00r\x00l\x00d\x00 \x00i\x00s\x00 \x00v\x00e\x00r\x00y\x00 \x00b\x00e\x00a\x00u\x00t\x00i\x00f\x00u\x00l\x00'

The first is:

print(str(st, "utf-16"))

The second is:

print(st.decode('UTF-16'))

Either way, we get the original string back:

this world is very beautiful
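A caveat worth checking before relying on this: 'utf-16' works here because the bytes really are UTF-16 (they begin with the 0xff 0xfe byte order mark). This sketch confirms that both methods agree, and that the same codec does not rescue data that isn't UTF-16:

```python
st = b'\xff\xfet\x00h\x00i\x00s\x00'   # first word of the bytes above, same BOM

# Both methods from the answer produce the same str
first = str(st, 'utf-16')
second = st.decode('UTF-16')
print(first == second == 'this')       # True

# A lone 0xFF byte is not complete UTF-16, so decoding still fails
try:
    b'\xff'.decode('utf-16')
    lone_ff_ok = True
except UnicodeDecodeError:
    lone_ff_ok = False
print(lone_ff_ok)                      # False

# Plain ASCII/UTF-8 bytes "decode" without error, but into the wrong characters
print(b'ab'.decode('utf-16') == 'ab')  # False
```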
