How do I get rid of non-printable characters?

Time: 2017-08-07 18:41:47

Tags: python string python-3.x encoding utf-8

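For fun, I'm trying to build a batch-rename application in Python 3.6.0. It should pick up the files, split each file name based on a regular expression, and name the file correctly. For testing purposes, I'm writing the results to an output file until it works properly.

Here is my code: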

def batch_rename(self):
    if self._root is None:
        raise NotADirectoryError("self._root is empty")

    with open('output.txt', 'w') as self._open_file:
        for root, dirs, files in os.walk(self._root):
            for name in files:
                new_file = self._rename_file(root, name)
                self._add_size(root, name)
                self._open_file.write("\"{0}\" renamed to \"{1}\"\n".format(name, new_file))
                self._count += 1
            self._open_file.write("\n")

        self._open_file.write("Total files: {0}\n".format(self._count))
        self._open_file.write("Total size: {0}\n".format(self._get_total_size()))

def _rename_file(self, root_path, file_name):
    file_name = bytes(file_name, 'utf-8').decode('utf-8', 'ignore')
    # file_name = ''.join(x for x in file_name if x in string.printable)
    split_names = re.split(pattern=self._re, string=file_name)

    if len(split_names) > 1:
        new_file = self._prefix + ' ' + ''.join(split_names)
    else:
        new_file = self._prefix + ' ' + '' + split_names[0]

    new_file = new_file.replace('  ', ' ')

    return new_file

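I'm running into encoding problems because of characters that can't be written, such as:

  • Russian letters (strange, I know)
  • Symbols such as hearts, clubs, spades, etc.

The error message I get is: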

Traceback (most recent call last):
  File "C:/Users/thisUser/OneDrive/Projects/Examples.Python/BatchFileRenamer/BatchFileRename2.py", line 90, in <module>
    br.batch_rename()
  File "C:/Users/thisUser/OneDrive/Projects/Examples.Python/BatchFileRenamer/BatchFileRename2.py", line 34, in batch_rename
    self._open_file.write("\"{0}\" renamed to \"{1}\"\n".format(name, new_file))
  File "C:\Users\thisUser\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2665' in position 10: character maps to <undefined>

I tried looking at 3 SO questions/answers:

I couldn't find a useful answer in them.

Can anyone help? I'd really appreciate it :)

1 answer:

Answer 0 (score: 0)

Instead of using:

with open('output.txt', 'w') as self._open_file:

Try using:

import codecs

with codecs.open('output.txt', 'w', 'utf-8') as self._open_file:

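This opens the new file with UTF-8 encoding, which can represent the Cyrillic letters and card-suit symbols in your file names. A plain open('output.txt', 'w') uses the platform's default encoding (cp1252 in your traceback), and cp1252 has no mapping for characters such as '\u2665'.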
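
In Python 3 the built-in open() also accepts an encoding argument, so the same fix may work without importing codecs. Below is a minimal sketch of that variant, not your actual class: write_rename_log, the sample file names, and the temporary log path are all hypothetical and only there to illustrate the idea.

```python
import os
import tempfile


def write_rename_log(path, renames, encoding="utf-8"):
    """Write one log line per (old_name, new_name) pair.

    Hypothetical helper: it stands in for the writes done inside
    batch_rename(), but passes an explicit encoding to open().
    """
    with open(path, "w", encoding=encoding) as log_file:
        for old_name, new_name in renames:
            log_file.write('"{0}" renamed to "{1}"\n'.format(old_name, new_name))


# Hypothetical sample data: a heart symbol and some Cyrillic letters.
renames = [
    ("I \u2665 Python.txt", "prefix I \u2665 Python.txt"),
    ("\u043f\u0440\u0438\u0432\u0435\u0442.txt", "prefix \u043f\u0440\u0438\u0432\u0435\u0442.txt"),
]

log_path = os.path.join(tempfile.mkdtemp(), "output.txt")
write_rename_log(log_path, renames)

# Read the log back with the same encoding to confirm nothing was lost.
with open(log_path, encoding="utf-8") as log_file:
    log_text = log_file.read()

# For comparison: cp1252 (the codec named in the traceback) cannot
# encode the heart symbol at all.
try:
    "\u2665".encode("cp1252")
    cp1252_can_encode_heart = True
except UnicodeEncodeError:
    cp1252_can_encode_heart = False
```

If the log file really has to stay in cp1252, passing errors="backslashreplace" to open() is another option: characters that cannot be encoded are then written as escape sequences such as \u2665 instead of raising an error.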