Pytesseract: Error opening data file \\Program Files (x86)\\Tesseract-OCR\\en.traineddata
Solution 1
From your post, I observed two possible issues.
All trained language data should be saved in the folder named by TESSDATA_PREFIX, a Windows environment variable, which in your case is C:\Program Files (x86)\Tesseract-OCR\tessdata.
The Tesseract trained English data is named eng.traineddata (i.e. 'eng', not 'en' as in your code) unless you renamed it. Refer to the Tesseract Data Files page for more information.
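To check both points before calling pytesseract, a minimal sketch like the following builds the path Tesseract would look for (`<lang>.traineddata`) and reports whether the file exists. `find_traineddata` and `DEFAULT_TESSDATA` are hypothetical names, not part of pytesseract. The sketch assumes TESSDATA_PREFIX points directly at the tessdata folder, as described above; older Tesseract 3.x releases expected it to point at the parent of tessdata instead, so adjust for your version.

```python
import os

# Assumption: the install location from the question; adjust for your machine.
DEFAULT_TESSDATA = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"


def find_traineddata(lang, tessdata_dir=None):
    """Return (expected_path, exists) for <lang>.traineddata.

    Hypothetical helper: when tessdata_dir is not given, it falls back to the
    TESSDATA_PREFIX environment variable, then to DEFAULT_TESSDATA.
    """
    if tessdata_dir is None:
        tessdata_dir = os.environ.get("TESSDATA_PREFIX", DEFAULT_TESSDATA)
    path = os.path.join(tessdata_dir, lang + ".traineddata")
    return path, os.path.isfile(path)


if __name__ == "__main__":
    # Compare the code used in the question ('en') with the standard name ('eng').
    for code in ("en", "eng"):
        path, exists = find_traineddata(code)
        print(f"{code!r}: {path} -> {'found' if exists else 'missing'}")
```

If only 'eng' is reported as found, pass lang='eng' to pytesseract.image_to_string instead of lang='en'.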
In addition, for pytesseract to read the image file with Image.open(), you may need to include the full file path (e.g. 'z:\\path\\to\\image') if the image file cannot be located.
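Since the working directory in the question is on a different drive (Z:), one way to follow this advice is to build an absolute path and fail early with a clear message if the file is missing. This is a sketch using only the standard library; `resolve_image_path` is a hypothetical helper, not a pytesseract or Pillow function.

```python
from pathlib import Path


def resolve_image_path(filename, work_dir):
    """Return the absolute path of filename inside work_dir.

    Hypothetical helper: raises FileNotFoundError with the full path so a
    wrong working directory shows up before Image.open() or Tesseract runs.
    """
    path = Path(work_dir).joinpath(filename).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {path}")
    return path
```

The result can then be handed to Image.open(), for example Image.open(resolve_image_path('Multi_page24bpp.tif', 'z:\\path\\to\\image')), where the directory is a placeholder for your own.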
Hope this helps.
Solution 2
I faced the same problem. I tried all the solutions I found on Google, without success. Finally, I solved the problem by replacing
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
with
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
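Python accepts forward slashes in Windows paths, and they avoid a common escape-sequence trap: with single backslashes, the "\t" in "\tesseract.exe" is read as a TAB character. The standard-library snippet below compares the spellings. Note that the doubled-backslash form already names the same file, so forward slashes mainly guard against the single-backslash case.

```python
from pathlib import PureWindowsPath

# Three spellings of the tesseract.exe path from this answer.
forward = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
escaped = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
raw = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

# Pitfall: what a single-backslash literal turns into, since "\t" is a TAB escape.
# (Built by concatenation to avoid invalid-escape warnings for "\P" and "\T".)
broken = 'C:\\Program Files (x86)\\Tesseract-OCR' + '\tesseract.exe'

if __name__ == "__main__":
    # Doubled-backslash and raw spellings produce identical strings.
    print(escaped == raw)
    # Windows path semantics treat "/" and "\" as the same separator.
    print(PureWindowsPath(forward) == PureWindowsPath(raw))
    # repr() makes the embedded TAB visible.
    print(repr(broken))
```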
Solution 3
If you don't want to set the environment variable, you can pass the tessdata directory as an argument instead.
For example:
First, do your imports:
import pytesseract
from PIL import Image
Now configure pytesseract and run it on an image:
pytesseract.pytesseract.tesseract_cmd = "C:/path_to_your_tesseract.exe"
tessdata_dir_config = '--tessdata-dir "C:/path_to_your_tessdata_folder"'
image = Image.open("C:/path_to_your_image.tif")
pytesseract.image_to_string(image, config=tessdata_dir_config)
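Because the default install folder contains spaces ("Program Files (x86)"), the quotes around the directory in tessdata_dir_config matter: they keep the path together as one argument. A small sketch that builds this string consistently, with `tessdata_config` as a hypothetical helper (not part of pytesseract) and `extra` for any further Tesseract options you want to append:

```python
def tessdata_config(tessdata_dir, extra=""):
    """Build a pytesseract config string pointing Tesseract at tessdata_dir.

    Hypothetical helper: backslashes become forward slashes and the directory
    is wrapped in double quotes so that spaces such as in "Program Files" do
    not split it into separate arguments.
    """
    normalized = tessdata_dir.replace("\\", "/")
    config = '--tessdata-dir "{}"'.format(normalized)
    if extra:
        # Append any additional options after the tessdata directory.
        config += " " + extra
    return config
```

It would then be passed as config=tessdata_config(r"C:\Program Files (x86)\Tesseract-OCR\tessdata") together with lang='eng' in pytesseract.image_to_string.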
Henry
Updated on June 05, 2022
I am trying to use pytesseract on Jupyter Notebook.
- Windows 10 x64
- Running Jupyter Notebook (Anaconda3, Python 3.6.1) with administrative privilege
- The working directory containing the TIFF file is on a different drive (Z:)
When I run the following code:
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
print(pytesseract.image_to_string(Image.open('Multi_page24bpp.tif'), lang='en', config = tessdata_dir_config))
I get the following error:
TesseractError                            Traceback (most recent call last)
<ipython-input-37-c1dcbc33cde4> in <module>()
     11 # tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
     12 
---> 13 print(pytesseract.image_to_string(Image.open('Multi_page24bpp.tif'), lang='en'))
     14 # print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))

C:\Users\cpcho\AppData\Local\Continuum\Anaconda3\lib\site-packages\pytesseract\pytesseract.py in image_to_string(image, lang, boxes, config)
    123     if status:
    124         errors = get_errors(error_string)
--> 125         raise TesseractError(status, errors)
    126     f = open(output_file_name, 'rb')
    127     try:

TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\en.traineddata')
I found these two references helpful, but I am still missing something:
https://github.com/madmaze/pytesseract/issues/50
https://github.com/madmaze/pytesseract/issues/64
Thank you for your time on this!