How to tesseract multiple files in the same folder from command prompt?

Solution 1

What is the Windows equivalent of the Unix for i command?

Without knowing exactly what the tesseract command does on Unix compared to Windows, it is difficult to give a comprehensive answer.

On Windows you can use the for command to perform a command on several files.

From a command line:

for %i in (*.tif) do tesseract %i outtext

In a batch file:

for %%i in (*.tif) do tesseract %%i outtext

Solution 2

In the above example:

for %%i in (*.tif) do tesseract %%i outtext

Tesseract will write over the same output file outtext.txt for each iteration. You will end up with a single file (outtext.txt) containing only the text from the last image. You need to uniquely name each output file. You could replace the string outtext with %%i as shown below.

for %%i in (*.tif) do tesseract %%i %%i
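
One side effect of passing %%i as both arguments is that the output name keeps the image extension (for example page1.tif.txt), because Tesseract appends .txt to whatever base name it is given. A minimal sketch of a variation, assuming tesseract is on the PATH and using the standard for-variable modifier %%~ni (the file name without its extension); the quotes are an addition to cope with file names that contain spaces:

```batch
@echo off
rem For page1.tif the output base is page1, so Tesseract writes page1.txt.
for %%i in (*.tif) do tesseract "%%i" "%%~ni"
```

From an interactive prompt the same line would use single percent signs (%i and %~ni).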

However, if you want a different output file name, you can assign an additional variable using the set command. Then increment this variable for each iteration.

set /a j=1
for %%i in (*.tif) do (
tesseract %%i output_file%j%
set /a j+=1
)

However, %j% will expand to '1' on every iteration, so you end up with a single file named output_file1.txt. The command processor expands %j% once, when it parses the whole parenthesized block, and reuses that value for each iteration. Issuing the setlocal enabledelayedexpansion command and replacing %j% with !j! forces Windows to expand !j! each time the line runs. To restore the previous environment settings, issue a matching endlocal command afterwards.

setlocal enabledelayedexpansion
set /a j=1
for %%i in (*.tif) do (
tesseract %%i output_file!j!
set /a j+=1
)
endlocal

I tested this successfully on Microsoft Windows 7 Home Premium edition. I hope it helps you.
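
The difference between the two expansion styles can be seen without running Tesseract at all. The following is only an illustrative sketch: the loop values A, B and C are placeholders, and echo stands in for the real command.

```batch
@echo off
setlocal enabledelayedexpansion
set /a j=1
rem The percent form is replaced once, when the whole block is parsed.
rem The exclamation form is replaced each time the echo line runs.
for %%i in (A B C) do (
    echo parse-time: %j%   run-time: !j!
    set /a j+=1
)
endlocal
```

Saved as a batch file and run, this prints "parse-time: 1" on all three lines, while the run-time value counts 1, 2, 3.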

Solution 3

for %i in (*.tif) do (tesseract %i stdout 1>> out.txt)

This loops over all the .tif files and appends the stdout of each Tesseract run to out.txt. Note that >> is a redirect that appends to the file, not a pipe.
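
Because everything lands in one file, it can be hard to tell where the text of one image ends and the next begins. A possible batch-file variation, sketched under the assumption that tesseract is on the PATH; the header format and the deletion of an old out.txt are illustrative choices, not part of the original answer:

```batch
@echo off
rem Remove any previous out.txt so repeated runs do not keep appending.
if exist out.txt del out.txt
for %%i in (*.tif) do (
    rem Write a header line naming the source image, then its recognized text.
    >> out.txt echo ==== %%i ====
    tesseract "%%i" stdout >> out.txt
)
```

Placing the redirect before echo avoids any ambiguity about trailing characters being read as part of the redirection.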

Solution 4

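First write the full paths of all the .tif files to a list file, then pass that list to Tesseract as its input. Tesseract accepts a text file with one image path per line in place of a single image:

dir "folder_path\*.tif" /s /b > "folder_path\input.txt"
"tesseract_path\tesseract" "folder_path\input.txt" "folder_path\output"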
Author: Thomas Padilla

I am Digital Humanities Librarian at Michigan State University Libraries. Prior to my move to Michigan I was at the University of Illinois at Urbana Champaign working at the Scholarly Commons and the Preservation Unit of the University Library. Prior to that I was at the Library of Congress doing digital preservation outreach and education.

Updated on July 19, 2022

Comments

  • Thomas Padilla, almost 2 years ago

    I know how to Tesseract multiple files in the same directory using Terminal on OS X.

    for i in *.tif ; do tesseract $i outtext;  done;
    

    Does anyone have suggestions for how to do this on the Command Prompt on a computer running Windows?

  • LearnToGrow, over 6 years ago
    I get the error "syntax error near unexpected token `('" when I execute the first and second examples. What does it mean?