Question

I'm having trouble writing the words from an image of text to a .txt file.

import pytesseract
from PIL import Image, ImageEnhance, ImageFilter

text = pytesseract.image_to_string(Image.open("book_image.jpg"))

file = open("text_file","w")
file.write(text)
print(text)

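The code that reads the image file and prints the text found in the image works fine. The problem is that when I try to take that text and write it to a file, I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 366: ordinal not in range(128)

Can someone explain how I can convert the variable text to a string?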

Answer 1

Try this:

file = open("text_file", "w", encoding='utf8', errors="ignore")
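One caveat: the `encoding` and `errors` keywords on the built-in `open` exist only in Python 3, and the `u'\u2019'` in the traceback suggests the question is running Python 2, where `io.open` accepts the same arguments. Below is a minimal sketch that should behave the same on both versions. The helper name `save_text`, the sample string, and the temporary output path are assumptions made for illustration; the sample stands in for the `pytesseract` output.

```python
import io
import os
import tempfile


def save_text(text, path):
    """Write unicode text to path as UTF-8 (hypothetical helper)."""
    # io.open takes encoding= on Python 2 and Python 3 alike
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Stand-in for the OCR result; u"\u2019" is the character from the error message
sample = u"It\u2019s a book"
path = os.path.join(tempfile.gettempdir(), "text_file.txt")
save_text(sample, path)

# Read the file back to confirm the character survived
with io.open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

The sketch leaves out `errors="ignore"` on purpose: UTF-8 can represent every Unicode character, so there is normally nothing to ignore, and dropping characters silently can hide real problems.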

Answer 2

Also try:

file.write(text.encode('utf-8').strip())
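To see why encoding by hand helps, here is a small sketch, assuming a sample string in place of the OCR output and a temporary file path chosen for illustration. It shows that the ASCII codec rejects `u'\u2019'` while UTF-8 encodes it as three bytes. Since `encode` returns bytes, the file is opened in binary mode (`"wb"`), which Python 3 requires and Python 2 accepts.

```python
import os
import tempfile

# Stand-in for the OCR result; contains the character from the error message
text = u"It\u2019s a book"

# The ASCII codec cannot represent U+2019, which is what the traceback reports
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent it; the result is a byte string
encoded = text.encode("utf-8")

# Bytes must go to a file opened in binary mode
path = os.path.join(tempfile.gettempdir(), "text_file_bytes.txt")
with open(path, "wb") as f:
    f.write(encoded)
```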

Writing to a text file - 'ascii' codec can't encode character
