Pytesseract: Error opening data file \\Program Files (x86)\\Tesseract-OCR\\en.traineddata


Solution 1

From your post, I observed two possible issues.

  1. All trained language data should be stored in the tessdata folder that TESSDATA_PREFIX (a Windows environment variable) points to, which in your case is C:\Program Files (x86)\Tesseract-OCR\tessdata.

  2. The Tesseract trained English data file is named eng.traineddata, so the language code is 'eng', not 'en' (unless you renamed the file). Refer to the Tesseract Data Files documentation for more information.
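To check both points before calling pytesseract, a small stdlib-only helper can look for `<lang>.traineddata` in the tessdata folder and list which languages are actually installed. `find_traineddata` is a hypothetical name for this sketch, not part of pytesseract; it assumes the one-file-per-language naming described above.

```python
from pathlib import Path


def find_traineddata(tessdata_dir, lang):
    """Return the path of <lang>.traineddata inside tessdata_dir.

    Hypothetical helper: raises FileNotFoundError listing the language
    codes that are installed, so an 'en' vs 'eng' mix-up is easy to spot.
    """
    folder = Path(tessdata_dir)
    candidate = folder / f"{lang}.traineddata"
    if candidate.is_file():
        return candidate
    # Each installed language is one file named <code>.traineddata.
    available = sorted(p.stem for p in folder.glob("*.traineddata"))
    raise FileNotFoundError(
        f"{candidate} not found; installed languages: {available}"
    )
```

For example, `find_traineddata(r"C:\Program Files (x86)\Tesseract-OCR\tessdata", "en")` would fail and report `['eng']` if only the English file is installed.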

In addition, if Image.open() cannot locate the image file, pass it the full file path (e.g. 'z:\\path\\to\\image').
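A minimal sketch of resolving the image path up front, so a missing file fails with a clear message before the OCR call. `resolve_image` and its `base_dir` parameter are hypothetical, for illustration only; `base_dir` would be the folder on the Z: drive that holds the TIFF.

```python
from pathlib import Path


def resolve_image(path, base_dir=None):
    """Return an absolute path to the image, failing early if it is missing.

    Hypothetical helper: relative paths are joined to base_dir when given,
    otherwise resolved against the current working directory.
    """
    p = Path(path)
    if base_dir is not None and not p.is_absolute():
        p = Path(base_dir) / p
    p = p.resolve()
    if not p.is_file():
        raise FileNotFoundError(f"image not found: {p}")
    return p
```

The result can be passed straight to `Image.open()`, which accepts path objects.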

Hope this helps.

Solution 2

I faced the same problem and tried every solution I found on Google, without success. Finally, I solved it by replacing:

    pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'

with:

    pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
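For reference, both spellings name the same file: in an ordinary Python string `\\` is a single backslash, and Windows generally accepts `/` as a separator as well. The sketch below converts a backslash path to the forward-slash form using the standard `pathlib.PureWindowsPath`; `to_forward_slashes` is a hypothetical helper name. It only shows the two forms are equivalent paths; it does not explain why the change fixed the error in every setup.

```python
from pathlib import PureWindowsPath

# In Python source, '\\' is one backslash, so this literal and the raw
# string below spell exactly the same path.
escaped = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
raw = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
assert escaped == raw


def to_forward_slashes(windows_path):
    """Rewrite a Windows path with '/' separators (hypothetical helper)."""
    return PureWindowsPath(windows_path).as_posix()


cmd = to_forward_slashes(escaped)
print(cmd)  # C:/Program Files (x86)/Tesseract-OCR/tesseract.exe
```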

Solution 3

If you don't want to set the environment variable, you can pass the tessdata directory as an argument instead.

For example:

First, do your imports

    import pytesseract
    from PIL import Image

And now configure pytesseract

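    # Point pytesseract at the Tesseract executable
    pytesseract.pytesseract.tesseract_cmd = "C:/path_to_your_tesseract.exe"
    # Tell Tesseract where the *.traineddata files live
    tessdata_dir_config = '--tessdata-dir "C:/path_to_your_tessdata_folder"'

    image = Image.open("C:/path_to_your_image.tif")
    print(pytesseract.image_to_string(image, lang='eng', config=tessdata_dir_config))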
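Why forward slashes and quotes help in the config string: a sketch under the assumption that the option string is tokenized POSIX-style, the way Python's `shlex.split` does it (how your pytesseract version splits `config` may differ). `tessdata_config` is a hypothetical helper, not a pytesseract function.

```python
import shlex
from pathlib import PureWindowsPath


def tessdata_config(tessdata_dir):
    """Build a --tessdata-dir option with forward slashes and quotes.

    Quoting keeps 'Program Files (x86)' as one argument; forward slashes
    avoid backslash-escape handling if the string is split POSIX-style.
    """
    posix_style = PureWindowsPath(tessdata_dir).as_posix()
    return f'--tessdata-dir "{posix_style}"'


config = tessdata_config(r'C:\Program Files (x86)\Tesseract-OCR\tessdata')
print(config)  # --tessdata-dir "C:/Program Files (x86)/Tesseract-OCR/tessdata"

# Assumption: the option string is split like shlex.split (POSIX mode).
print(shlex.split(config))
# ['--tessdata-dir', 'C:/Program Files (x86)/Tesseract-OCR/tessdata']

# Unquoted backslashes act as escapes in POSIX mode and are dropped:
print(shlex.split(r'--tessdata-dir C:\tessdata'))
# ['--tessdata-dir', 'C:tessdata']
```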
Author: Henry

Updated on June 05, 2022

Comments

  • Henry

    I am trying to use pytesseract in a Jupyter Notebook.

    • Windows 10 x64
    • Running Jupyter Notebook (Anaconda3, Python 3.6.1) with administrative privilege
    • The working directory containing the TIFF file is on a different drive (Z:)

    When I run the following code:

    try:
        import Image
    except ImportError:
        from PIL import Image
    import pytesseract
    
    pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
    
    tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
    
    print(pytesseract.image_to_string(Image.open('Multi_page24bpp.tif'), lang='en', config = tessdata_dir_config))
    

    I get the following error:

    TesseractError                            Traceback (most recent call last)
    <ipython-input-37-c1dcbc33cde4> in <module>()
         11 # tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
         12 
    ---> 13 print(pytesseract.image_to_string(Image.open('Multi_page24bpp.tif'), lang='en'))
         14 # print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
    
    C:\Users\cpcho\AppData\Local\Continuum\Anaconda3\lib\site-packages\pytesseract\pytesseract.py in image_to_string(image, lang, boxes, config)
        123         if status:
        124             errors = get_errors(error_string)
    --> 125             raise TesseractError(status, errors)
        126         f = open(output_file_name, 'rb')
        127         try:
    
    TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\en.traineddata')
    

    I found these two references helpful, but I am missing something:

    https://github.com/madmaze/pytesseract/issues/50
    https://github.com/madmaze/pytesseract/issues/64

    Thank you for your time on this!