Error while installing tesseract-ocr


I had the exact same issue. I'm using Visual Studio 2017 on a Windows 10 machine with Python 3.6 installed. What worked for me was to:

  1. Download and install the tesseract-ocr executable from https://github.com/UB-Mannheim/tesseract/wiki. (The script assumes you are on Windows and saved the Tesseract installation to the default suggested location, i.e. C:\Program Files (x86)\Tesseract-OCR.) See https://github.com/tesseract-ocr/tesseract/wiki for more information on installing the pre-built binary package on different operating systems, including Windows.
  2. Ensure you have the Python Imaging Library ('PIL') or the 'pillow' package installed for opening images. (Installing PIL didn't work in my setup, but pillow did, i.e. pip install pillow.) You need it because pytesseract requires it; pytesseract itself installs with pip install pytesseract. See https://pypi.org/project/pytesseract/0.2.5/ for more info.
  3. Then, to use it in your code, set the tesseract_cmd path as follows:

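    from PIL import Image
    import pytesseract
    
    # Point pytesseract at the executable installed in step 1
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
    
    try:
        img = Image.open(r'path\to\image.png')  # replace with your image's path
        text = pytesseract.image_to_string(img)
        print(text)
    except pytesseract.TesseractNotFoundError:
        print('tesseract not found; check the tesseract_cmd path above')

Hope it helps.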
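Before running that snippet, it can help to confirm that steps 1 and 2 actually took effect. The following is a minimal stdlib-only sketch, not part of the original answer: `find_tesseract` and `missing_modules` are hypothetical helper names, and the second candidate path (C:\Program Files\Tesseract-OCR, which some 64-bit installers may use) is an assumption.

```python
import importlib.util
import os
import shutil
from typing import Iterable, List, Optional

# Assumed install locations: the first is the default from step 1; the second
# is a guess for 64-bit installers. Adjust if you chose another folder.
DEFAULT_CANDIDATES = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return a path to the tesseract executable, or None if none is found.

    Checks the PATH first, then each candidate location in order.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def missing_modules(names: Iterable[str] = ("PIL", "pytesseract")) -> List[str]:
    """Return the names of top-level Python modules that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing Python packages:", missing_modules() or "none")
    print("tesseract executable:", find_tesseract() or "not found")
```

If `find_tesseract()` returns a path, you could assign it to `pytesseract.pytesseract.tesseract_cmd` instead of hard-coding the location; if `missing_modules()` lists anything, install it with pip before retrying.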

Author: Harsh Vardhan (NLP & Deep Learning Enthusiast)

Updated on June 17, 2022

Comments

  • Harsh Vardhan, almost 2 years

    I want to use pytesseract for OCR, so I installed it. But before that, I needed to install tesseract-ocr. I am using Windows 8.1. I opened the command line and ran pip install tesseract-ocr. These are the results of that command:

    I can't understand what's happening here. How can I make sense of this and successfully install Tesseract on my PC?

    C:\Users\HarshLaptop>pip install tesseract-ocr
    Collecting tesseract-ocr
      Using cached https://files.pythonhosted.org/packages/e2/0d/dcee3dd0fc4c7bcd18125a98f8ba6d9db7aecaa40770595203e312649587/tesseract-ocr-0.0.1.tar.gz
    Requirement already satisfied: cython in c:\users\harshlaptop\anaconda3\lib\site-packages (from tesseract-ocr) (0.25.2)
    Building wheels for collected packages: tesseract-ocr
      Running setup.py bdist_wheel for tesseract-ocr ... error
      Complete output from command c:\users\harshlaptop\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Temp\\pip-install-x8nz3uhm\\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d C:\Users\HARSHL~1\AppData\Local\Temp\pip-wheel-sj29zfyo --python-tag cp36:
      running bdist_wheel
      running build
      running build_py
      file tesseract_ocr.py (for module tesseract_ocr) not found
      file tesseract_ocr.py (for module tesseract_ocr) not found
      running build_ext
      building 'tesseract_ocr' extension
      creating build
      creating build\temp.win-amd64-3.6
      creating build\temp.win-amd64-3.6\Release
      C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\harshlaptop\anaconda3\include -Ic:\users\harshlaptop\anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /EHsc /Tptesseract_ocr.cpp /Fobuild\temp.win-amd64-3.6\Release\tesseract_ocr.obj
      tesseract_ocr.cpp
      tesseract_ocr.cpp(463): fatal error C1083: Cannot open include file: 'leptonica/allheaders.h': No such file or directory
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit status 2
    
      ----------------------------------------
      Failed building wheel for tesseract-ocr
      Running setup.py clean for tesseract-ocr
    Failed to build tesseract-ocr
    Installing collected packages: tesseract-ocr
      Running setup.py install for tesseract-ocr ... error
        Complete output from command c:\users\harshlaptop\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Temp\\pip-install-x8nz3uhm\\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\HARSHL~1\AppData\Local\Temp\pip-record-vnlr99lk\install-record.txt --single-version-externally-managed --compile:
        running install
        running build
        running build_py
        file tesseract_ocr.py (for module tesseract_ocr) not found
        file tesseract_ocr.py (for module tesseract_ocr) not found
        running build_ext
        building 'tesseract_ocr' extension
        creating build
        creating build\temp.win-amd64-3.6
        creating build\temp.win-amd64-3.6\Release
        C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\harshlaptop\anaconda3\include -Ic:\users\harshlaptop\anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /EHsc /Tptesseract_ocr.cpp /Fobuild\temp.win-amd64-3.6\Release\tesseract_ocr.obj
        tesseract_ocr.cpp
        tesseract_ocr.cpp(463): fatal error C1083: Cannot open include file: 'leptonica/allheaders.h': No such file or directory
        error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit status 2
    
        ----------------------------------------
    Command "c:\users\harshlaptop\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Temp\\pip-install-x8nz3uhm\\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\HARSHL~1\AppData\Local\Temp\pip-record-vnlr99lk\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\HARSHL~1\AppData\Local\Temp\pip-install-x8nz3uhm\tesseract-ocr\
    
  • Harsh Vardhan, almost 6 years
    What is Leptonica?
  • A.s.e, almost 6 years
    Leptonica is a library that Tesseract depends on: github.com/DanBloomberg/leptonica
  • Harsh Vardhan, almost 6 years
    I already have Visual Studio installed. Do I still need Leptonica?
  • A.s.e, almost 6 years
    Yes, my friend. It is an image-processing library that Tesseract uses and depends on.
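The C1083 error in the log means cl.exe searched its include directories (the -I flags in the failing command, plus any directories listed in the INCLUDE environment variable, which MSVC also reads) and found no leptonica/allheaders.h. Visual Studio supplies the compiler, not Leptonica's headers, so having it installed is not enough on its own. The sketch below, with hypothetical helper names `find_header` and `include_dirs_from_env`, checks whether a given set of directories would satisfy that include; it is a diagnostic aid, not part of any library.

```python
import os
from typing import Iterable, List, Optional

HEADER = "leptonica/allheaders.h"  # the include named in the C1083 error


def find_header(header: str, include_dirs: Iterable[str]) -> Optional[str]:
    """Return the first include directory that contains `header`, or None."""
    parts = header.split("/")  # include paths use '/', rebuild with os.path.join
    for directory in include_dirs:
        if os.path.isfile(os.path.join(directory, *parts)):
            return directory
    return None


def include_dirs_from_env(var: str = "INCLUDE") -> List[str]:
    """Split an include-path environment variable into its directories."""
    value = os.environ.get(var, "")
    return [d for d in value.split(os.pathsep) if d]


if __name__ == "__main__":
    found = find_header(HEADER, include_dirs_from_env())
    print(found or "not found: Leptonica's headers are not on the include path")
```

You could also pass the -I directories copied from the failing cl.exe command to `find_header` to confirm that none of them contains Leptonica's headers.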