How to tesseract multiple files in the same folder from command prompt?
Solution 1
What is the Windows equivalent of the Unix for i command?
Without knowing exactly what the tesseract command does on Unix compared to Windows, it is difficult to give a comprehensive answer.
On Windows you can use the for command to perform a command on several files.
From a command line:
for %i in (*.tif) do tesseract %i outtext
In a batch file:
for %%i in (*.tif) do tesseract %%i outtext
Further Reading
- An A-Z Index of the Windows CMD command line - An excellent reference for all things Windows cmd line related.
- for - Conditionally perform a command on several files.
Solution 2
In the above example:
for %%i in (*.tif) do tesseract %%i outtext
Tesseract will overwrite the same output file, outtext.txt, on each iteration. You will end up with a single file (outtext.txt) containing only the text from the last image. You need to give each output file a unique name. You could replace the string outtext with %%i, as shown below.
for %%i in (*.tif) do tesseract %%i %%i
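Note that with %%i as the output name, an image such as page1.tif produces page1.tif.txt, because Tesseract appends .txt to whatever base name it is given. A minimal variation, assuming a batch file and a hypothetical file called page1.tif, uses the %%~ni modifier of the for command, which expands to the file name without its extension:

```bat
@echo off
rem Sketch: OCR every .tif in the current folder, one output file per image.
rem The ~n modifier strips the extension, so page1.tif produces page1.txt.
rem The quotes keep file names that contain spaces in one piece.
for %%i in (*.tif) do tesseract "%%i" "%%~ni"
```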
However, if you want a different output file name, you can assign an additional variable using the set command and then increment this variable on each iteration.
set /a j=1
for %%i in (*.tif) do (
tesseract %%i output_file%j%
set /a j+=1
)
However, %j% will expand to '1' on every iteration, so you will end up with one file named output_file1.txt. The %j% is expanded once, when the loop is parsed, and that same value is used for each iteration. Using the setlocal enabledelayedexpansion command and replacing %j% with !j! forces Windows to expand !j! on each iteration. To restore the previous environment settings, a matching endlocal command should be issued.
setlocal enabledelayedexpansion
set /a j=1
for %%i in (*.tif) do (
tesseract %%i output_file!j!
set /a j+=1
)
endlocal
I tested this successfully on Microsoft Windows 7 Home Premium edition. I hope it helps you.
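To see the difference between the two forms of expansion without running Tesseract at all, a small sketch like the following can be saved as a batch file. The loop values a, b and c are placeholders for illustration, not real files:

```bat
@echo off
setlocal enabledelayedexpansion
set /a j=1
rem Inside the parenthesised block the percent form is substituted once, at parse time.
rem The exclamation-mark form is looked up again each time the echo line runs.
for %%i in (a b c) do (
    echo item %%i: percent form=%j%, delayed form=!j!
    set /a j+=1
)
endlocal
```

Run as a batch file, this should show 1 for the percent form on all three lines, while the delayed form counts 1, 2, 3.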
Solution 3
for %i in (*.tif) do (tesseract %i stdout 1>> out.txt)
This finds all the .tif files and appends the text Tesseract prints to stdout for each one to out.txt.
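In a batch file the same idea needs doubled percent signs. The sketch below also deletes any existing out.txt first, since >> appends and a second run would otherwise add the same text again. That cleanup step is an addition for illustration, not part of the answer above:

```bat
@echo off
rem Remove output left over from an earlier run, because the redirection below appends.
if exist out.txt del out.txt
rem Passing stdout as the output base makes tesseract print the text instead of writing a file.
for %%i in (*.tif) do tesseract "%%i" stdout >> out.txt
```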
Solution 4
dir "folder_path\*.tif" /s /b > "folder_path\input.txt"
"tesseract_path\tesseract" "folder_path\input.txt" "folder_path\output"
Thomas Padilla
Updated on July 19, 2022
Comments
-
Thomas Padilla almost 2 years
I know how to Tesseract multiple files in the same directory using Terminal on OS X.
for i in *.tif ; do tesseract $i outtext; done;
Does anyone have suggestions for how to do this on the Command Prompt on a computer running Windows?
-
LearnToGrow over 6 years
I get an error "syntax error near unexpected token `('" when I execute the first and second examples. What is it?