"Adding" new fonts to Tesseract eng.traineddata


You should give the new file a different name, e.g., eng1.traineddata, instead of overwriting the original eng.traineddata. That way you can use the new data together with the original one by specifying the language option -l eng+eng1.
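Here is a minimal Python sketch of that idea, assuming a Tesseract 3.02 or later install that accepts several languages joined with "+". The helper names (install_traineddata, build_command, run_ocr), the file names sample.tif and out, and the tessdata path you pass in are all placeholders of mine, not part of Tesseract itself.

```python
import shutil
import subprocess
from pathlib import Path


def install_traineddata(new_data: Path, tessdata_dir: Path, lang: str = "eng1") -> Path:
    """Copy a freshly trained file into tessdata under a new language name.

    Refuses to overwrite an existing file, so the stock eng.traineddata
    (and any earlier custom file) stays untouched.
    """
    target = tessdata_dir / f"{lang}.traineddata"
    if target.exists():
        raise FileExistsError(f"{target} already exists; pick another name")
    shutil.copyfile(new_data, target)
    return target


def build_command(image: str, output_base: str, langs=("eng", "eng1")) -> list:
    """Build a tesseract command line that loads several languages at once."""
    return ["tesseract", image, output_base, "-l", "+".join(langs)]


def run_ocr(image: str, output_base: str, langs=("eng", "eng1")) -> None:
    """Run tesseract; requires the tesseract executable on PATH."""
    subprocess.run(build_command(image, output_base, langs), check=True)


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the command shape.
    print(" ".join(build_command("sample.tif", "out")))
```

The same command can of course be typed directly in a terminal; the -l option goes after the image name and the output base name.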


Author by

md1hunox

Updated on July 07, 2022

Comments

  • md1hunox
    md1hunox almost 2 years

    As far as I know, Tesseract 3.x comes with six English fonts (correct me if I'm wrong). I need to train Tesseract on five more fonts. I need only capital letters and digits (no special characters or symbols).

    I followed various guides, for example: Adding New Fonts to Tesseract 3 OCR Engine

    I also used tools that automate the process, such as Serak Tesseract Trainer for Tesseract 3.02.

    To generate the box files I used QT Box Editor.

    After using the above tools I get an eng.traineddata file. All the tutorials tell me to copy this eng.traineddata file into the Tesseract-OCR\tessdata folder, but doing so replaces the original eng.traineddata file. If I do this, will I lose the default fonts that come with Tesseract 3.x?

    How can I add new fonts? It's still not clear to me. I hope someone can help me here. Thanks.

  • marcAntoine
    marcAntoine about 10 years
    Where can I specify the language option -l eng+eng1?
  • tipycalFlow
    tipycalFlow almost 10 years
    This might sound too lazy, but is there a way to provide a font file as input (to a website, say) and get trained tessdata back as output?
  • Alexander Taubenkorb
    Alexander Taubenkorb over 9 years
    @tipycalFlow jTessBoxEditor has a TIFF/Box Generator. You can provide a font file and get a box with the correct values. With Serak Tesseract Trainer you can do the rest.