Question

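This command returns b'?', because "α" is not in ISO-8859-1:

$ LANG=en_US.UTF-8 python -c "print('α'.encode('ISO-8859-1', 'replace'))"
b'?'

This command returns b'\xce\xb1', and I don't understand why:

$ LANG=en_US.ISO-8859-1 python -c "print('α'.encode('ISO-8859-1', 'replace'))"
b'\xce\xb1'

What causes this? What I want is to remove characters that are not in the target encoding (here ISO-8859-1), replacing them with "?", which is what I thought this code should do.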

Answer 1

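LANG is not changing the output of str.encode; it is changing the locale encoding Python uses to decode its input, which shows up as the encoding of sys.stdin.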

$ LANG=en_US.UTF-8 python -c "print(__import__('sys').stdin.encoding)"
UTF-8
$ LANG=en_US.ISO-8859-1 python -c "print(__import__('sys').stdin.encoding)"
ISO-8859-1

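So under the ISO-8859-1 locale, Python decodes the UTF-8 bytes b'\xce\xb1' sent by the terminal as two separate characters rather than as a single "α":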

$ LANG=en_US.ISO-8859-1 python3 -c "print(len('α'))"
2
$ LANG=en_US.UTF-8 python3 -c "print(len('α'))"
1
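
The two results can also be reproduced inside a single process, without touching LANG, by decoding the same two bytes both ways. This is only a sketch of the mechanism described above: the variable names are made up for illustration, and the last step (re-encoding as ISO-8859-1 and decoding as UTF-8) is a common way to undo this kind of mis-decoding, not something the original answer proposes.

```python
# "\u03b1" is the Greek letter alpha, written as an escape so the script
# does not depend on the encoding of the source file or terminal.
alpha = "\u03b1"

# The bytes a UTF-8 terminal actually hands to Python for that character.
raw = alpha.encode("utf-8")

# UTF-8 locale: the two bytes decode to one character, which ISO-8859-1
# cannot represent, so the 'replace' handler substitutes '?'.
as_utf8 = raw.decode("utf-8")
utf8_result = as_utf8.encode("ISO-8859-1", "replace")

# ISO-8859-1 locale: each byte decodes to its own character (U+00CE and
# U+00B1). Both exist in ISO-8859-1, so nothing gets replaced.
as_latin1 = raw.decode("ISO-8859-1")
latin1_result = as_latin1.encode("ISO-8859-1", "replace")

# Undo the mis-decoding: go back to the raw bytes, then decode as UTF-8.
repaired = as_latin1.encode("ISO-8859-1").decode("utf-8")

print(raw, len(as_utf8), utf8_result)    # b'\xce\xb1' 1 b'?'
print(len(as_latin1), latin1_result)     # 2 b'\xce\xb1'
print(repaired == alpha)                 # True
```

In other words, str.encode behaves the same in both runs; what differs is the string it receives. To get the "?" replacement reliably, the text has to be decoded with the encoding the terminal really uses before calling encode.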
