Please make sure the TESSDATA_PREFIX environment variable is set to your tessdata directory

After installing Tesseract and PyTesseract, the following exception was raised during verification:

Traceback (most recent call last):
File "/home/zhangjc/Downloads/pyocr.py", line 7, in <module>
print(pytesseract.image_to_string(Image.open("example.png")))
~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhangjc/.pyenv/versions/crawler/lib/python3.13/site-packages/pytesseract/pytesseract.py", line 486, in image_to_string
return {
~
...<2 lines>...
Output.STRING: lambda: run_and_get_output(*args),
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}[output_type]()
~~~~~~~~~~~~~~^^
File "/home/zhangjc/.pyenv/versions/crawler/lib/python3.13/site-packages/pytesseract/pytesseract.py", line 489, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
~~~~~~~~~~~~~~~~~~^^^^^^^
File "/home/zhangjc/.pyenv/versions/crawler/lib/python3.13/site-packages/pytesseract/pytesseract.py", line 352, in run_and_get_output
run_tesseract(**kwargs)
~~~~~~~~~~~~~^^^^^^^^^^
File "/home/zhangjc/.pyenv/versions/crawler/lib/python3.13/site-packages/pytesseract/pytesseract.py", line 284, in run_tesseract
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file /usr/share/tesseract-ocr/5/tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

As the message says, the TESSDATA_PREFIX environment variable must point to the tessdata directory, which holds Tesseract's language data files (such as eng.traineddata and fra.traineddata). Typical locations are:

  • Linux: /usr/share/tesseract-ocr/4.00/tessdata/ (the version segment, e.g. 4.00 or 5, depends on the installed release)
  • macOS: /usr/local/Cellar/tesseract//share/tessdata/
  • Windows: C:\Program Files\Tesseract-OCR\tessdata\
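To check which candidate directory actually holds language data on a given machine, a small helper can probe each one for `eng.traineddata`. This is only a minimal sketch: the function name `find_tessdata_dir` and the `CANDIDATE_DIRS` list are illustrative assumptions, and the paths should be adjusted to the local installation.

```python
from pathlib import Path
from typing import Iterable, Optional

# Candidate locations taken from the list and the error message above.
# Illustrative assumption: your installation may use a different path.
CANDIDATE_DIRS = [
    "/usr/share/tesseract-ocr/4.00/tessdata",
    "/usr/share/tesseract-ocr/5/tessdata",
    r"C:\Program Files\Tesseract-OCR\tessdata",
]


def find_tessdata_dir(candidates: Iterable[str], lang: str = "eng") -> Optional[Path]:
    """Return the first candidate directory containing <lang>.traineddata, or None."""
    for candidate in candidates:
        directory = Path(candidate)
        if (directory / f"{lang}.traineddata").is_file():
            return directory
    return None
```

Calling `find_tessdata_dir(CANDIDATE_DIRS)` returns the first directory that contains the English data file, or `None` if none of them do, in which case the language data is probably installed elsewhere or not installed at all.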

On Linux, set the environment variable with the following command:

export TESSDATA_PREFIX="/usr/share/tesseract-ocr/4.00/tessdata"

Running the verification again now succeeds.
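The same setting can also be applied from inside a Python script, which may help when the shell environment cannot be changed. The sketch below assumes the tessdata path from the export command above and relies on child processes started from Python inheriting `os.environ` by default. The helper name `child_sees_prefix` is hypothetical and exists only to demonstrate that inheritance.

```python
import os
import subprocess
import sys

# Assumed path, copied from the export command above; adjust to your system.
TESSDATA_DIR = "/usr/share/tesseract-ocr/4.00/tessdata"

# Set the variable for this process. Do this before the OCR call, because
# the tesseract executable reads the variable when it starts.
os.environ["TESSDATA_PREFIX"] = TESSDATA_DIR


def child_sees_prefix() -> str:
    """Start a child Python process and return the TESSDATA_PREFIX it inherits."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('TESSDATA_PREFIX', ''))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Since the traceback shows PyTesseract running tesseract as an external process, a later `pytesseract.image_to_string(Image.open("example.png"))` call in the same script should see the same value.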

On Linux, add the export line to /etc/profile or ~/.bashrc so the variable is set automatically instead of by hand in every new session.
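Persisting the setting amounts to adding the export line to a shell startup file once. One possible way to do that without creating duplicates is sketched below; the function name `ensure_export_line` is hypothetical, and the file path and tessdata directory passed to it are assumptions to adjust.

```python
from pathlib import Path


def ensure_export_line(rc_file: Path, tessdata_dir: str) -> bool:
    """Append an export line for TESSDATA_PREFIX to rc_file unless it is already there.

    Returns True if the file was modified, False if the line already existed.
    """
    line = f'export TESSDATA_PREFIX="{tessdata_dir}"'
    existing = rc_file.read_text(encoding="utf-8") if rc_file.exists() else ""
    if line in existing.splitlines():
        return False
    # Start on a fresh line if the file does not already end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with rc_file.open("a", encoding="utf-8") as handle:
        handle.write(f"{prefix}{line}\n")
    return True
```

For example, `ensure_export_line(Path.home() / ".bashrc", "/usr/share/tesseract-ocr/4.00/tessdata")` adds the line on the first call and does nothing on later calls. The change takes effect in new shells, or after running `source ~/.bashrc` in the current one.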

On other systems, set the environment variable using the platform's own mechanism.