pytesseract on Windows 10: Error opening data file

Time: 2016-12-29 18:11:39

Tags: python-3.x python-tesseract

I am using pytesseract on Windows 10 x64 with Python 3.5.2 x64 and Tesseract 4.0. My code is as follows:

# -*- coding: utf-8 -*-

try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract


print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))

The error:

Traceback (most recent call last):
  File "D:/test.py", line 10, in <module>
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 165, in image_to_string
    raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata')

The tessdata directory is C:\Program Files (x86)\Tesseract-OCR\tessdata, and it looks like this:

[screenshot of the tessdata directory]

Why does this happen?

2 answers:

Answer 0: (score: 0)

Set the TESSDATA_PREFIX environment variable to C:\Program Files (x86)\Tesseract-OCR\
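
If changing the system-wide setting is inconvenient, the variable can also be set from inside the script before the first OCR call, since pytesseract runs the tesseract executable as a child process that inherits the script's environment. The following is a minimal sketch that assumes the install path from the question; the helper name set_tessdata_prefix is made up for illustration. Whether the value should be the Tesseract-OCR folder or the tessdata folder itself has differed between Tesseract releases, so check the hint printed in your version's error message.

```python
import os

# Assumed install location, taken from the question; adjust for your machine.
TESSERACT_DIR = "C:\\Program Files (x86)\\Tesseract-OCR\\"


def set_tessdata_prefix(prefix=TESSERACT_DIR):
    """Set TESSDATA_PREFIX for this process and for child processes it starts."""
    os.environ["TESSDATA_PREFIX"] = prefix
    return os.environ["TESSDATA_PREFIX"]


set_tessdata_prefix()
```

Call it once near the top of the script, before pytesseract.image_to_string(...).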

Answer 1: (score: 0)

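If you get a tessdata error such as "Error opening data file...", pass the tessdata directory to Tesseract explicitly through the config argument: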

tessdata_dir_config = '--tessdata-dir "<replace_with_your_tessdata_dir_path>"'
# Example config: '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
# It's important to add double quotes around the dir path.

pytesseract.image_to_string(image, lang='chi_sim', config=tessdata_dir_config)
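
To catch a wrong path before Tesseract is even started, the config string could be built by a small helper that first checks that the language file is present. This is only a sketch; tessdata_config is a hypothetical name and not part of pytesseract.

```python
import os


def tessdata_config(tessdata_dir, lang):
    """Build a --tessdata-dir config string after checking the language file exists."""
    traineddata = os.path.join(tessdata_dir, lang + ".traineddata")
    if not os.path.isfile(traineddata):
        # Fail early with the exact path that was looked up.
        raise FileNotFoundError(traineddata)
    # Double quotes keep a path containing spaces together as one argument.
    return '--tessdata-dir "{}"'.format(tessdata_dir)


# Example use (path assumed from the question):
# config = tessdata_config("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata", "chi_sim")
# pytesseract.image_to_string(image, lang='chi_sim', config=config)
```

If the helper raises FileNotFoundError, the path in the exception shows where chi_sim.traineddata was expected.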