"Adding" new fonts to Tesseract eng.traineddata
You should give the new file a different name, e.g., eng1.traineddata. That way you can use the new data together with the original data by specifying the language option -l eng+eng1.
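As a rough illustration of the suggestion above, here is a minimal Python sketch. It assumes the freshly trained file can simply be copied into the tessdata folder under a new name, as the answer suggests, and that the tesseract executable is invoked as tesseract <image> <output base> -l <languages>. The helper names install_traineddata and build_tesseract_command and the file names sample.tif and result are hypothetical, chosen only for this example.

```python
import shutil
from pathlib import Path


def install_traineddata(new_file, tessdata_dir, lang_name="eng1"):
    """Copy a trained file into tessdata under a new language name.

    Refuses to overwrite an existing file, so the stock eng.traineddata
    (or an earlier eng1.traineddata) is never replaced by accident.
    """
    target = Path(tessdata_dir) / f"{lang_name}.traineddata"
    if target.exists():
        raise FileExistsError(f"{target} already exists; pick another name")
    shutil.copyfile(new_file, target)
    return target


def build_tesseract_command(image, output_base, languages=("eng", "eng1")):
    """Build the argument list for: tesseract <image> <output_base> -l eng+eng1."""
    return ["tesseract", str(image), str(output_base), "-l", "+".join(languages)]


if __name__ == "__main__":
    # Hypothetical file names; adjust to your own image and output base.
    cmd = build_tesseract_command("sample.tif", "result")
    print(" ".join(cmd))
```

The language option goes on the tesseract command line itself, after the image and output base names. Joining the names with + tells Tesseract to load both data files, so the default fonts stay available next to the newly trained ones.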
md1hunox
Updated on July 07, 2022
Comments
-
md1hunox almost 2 years
As far as I know, Tesseract 3.x comes with 6 English fonts (correct me if I'm wrong). I need to train Tesseract for 5 more fonts. I need only capital letters and digits (no special characters or symbols).
I followed various tutorials, for example Adding New Fonts to Tesseract 3 OCR Engine, and also used tools that automate the process, such as Serak Tesseract Trainer for Tesseract 3.02.
For generating box files I used QT Box Editor.
After using the above tools I get an eng.traineddata file. All tutorials tell me to add this eng.traineddata file to the Tesseract-OCR\tessdata folder, but doing so will replace the original eng.traineddata file. After doing this, will I lose the default fonts that come with Tesseract 3.x? How can I add new fonts? It's still not clear to me. I hope someone can help me here. Thanks.
-
marcAntoine about 10 years
Where can I specify the language option -l eng+eng1?
-
tipycalFlow almost 10 years
This might sound too lazy, but is there a way to provide a font file as input (to a website, say) and get a trained tessdata file as output?
-
Alexander Taubenkorb over 9 years
@tipycalFlow jTessBoxEditor has a TIFF/Box Generator. You can provide a font file and get a box file with the correct values. With Serak Tesseract Trainer you can do the rest.