Recognize numbers in images
Why not look at using an open source OCR engine such as Tesseract?
C# Wrapper for Tesseract
Java Wrapper for Tesseract
You might not count using a third-party library as implementing it yourself, but integrating such a tool is still a substantial amount of work. Keep in mind, too, that something that seems simple (telling the number 5 from the number 6) is often very complex: thousands and thousands of lines of code. At the very least, look at the Tesseract source code; it will give you a good reason to want to leverage a third-party library.
Here's another SO question that'll give you some ideas about the algorithms involved: https://stackoverflow.com/questions/850717/what-are-some-popular-ocr-algorithms
svens, updated on July 09, 2022
I've been searching the web for resources on number recognition in images and found many links on the topic. Unfortunately, they are more confusing than helpful, and I don't know where to start.
I've got an image with 5 numbers in it, undistorted (no CAPTCHA or anything like that). The numbers are black on a white background, written in a standard font.
My first step was to separate the numbers. The algorithm I currently use is quite simple: it checks whether a column is entirely white, which marks a gap between characters. Then it trims each character so that there is no white border around it. This works quite well.
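The separation step described above can be sketched roughly as follows. This is only an illustration, not the asker's actual code: it assumes the image has already been converted to a `boolean[][]` where `ink[y][x]` is `true` for a black pixel, and the class and method names (`DigitSegmenter`, `segment`) are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class DigitSegmenter {

    /**
     * Splits the image at entirely white columns and trims each glyph.
     * Returns one box {left, top, right, bottom} (inclusive) per glyph.
     */
    public static List<int[]> segment(boolean[][] ink) {
        List<int[]> boxes = new ArrayList<>();
        int height = ink.length;
        int width = height == 0 ? 0 : ink[0].length;
        int start = -1; // first column of the current glyph, or -1 between glyphs
        // Scan one column past the edge so a glyph touching the right border is closed.
        for (int x = 0; x <= width; x++) {
            boolean blank = x == width || columnIsBlank(ink, x);
            if (!blank && start < 0) {
                start = x;                          // a glyph begins here
            } else if (blank && start >= 0) {
                boxes.add(trim(ink, start, x - 1)); // the glyph ended at x - 1
                start = -1;
            }
        }
        return boxes;
    }

    private static boolean columnIsBlank(boolean[][] ink, int x) {
        for (boolean[] row : ink) {
            if (row[x]) return false;
        }
        return true;
    }

    /** Removes blank rows above and below the glyph in columns left..right. */
    private static int[] trim(boolean[][] ink, int left, int right) {
        int top = 0;
        int bottom = ink.length - 1;
        // Both loops stop early: every column in the range has at least one ink pixel.
        while (rowIsBlank(ink, top, left, right)) top++;
        while (rowIsBlank(ink, bottom, left, right)) bottom--;
        return new int[] {left, top, right, bottom};
    }

    private static boolean rowIsBlank(boolean[][] ink, int y, int left, int right) {
        for (int x = left; x <= right; x++) {
            if (ink[y][x]) return false;
        }
        return true;
    }
}
```

Note that this only works while there is at least one fully white column between neighbouring digits; touching or overlapping glyphs would come out as a single box.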
But now I'm stuck on the actual recognition of the number. I don't know the best way of guessing the correct digit. I don't think directly comparing to the font is a good idea, because if the numbers differ only slightly, it will no longer work.
Could anyone give me a hint on how this is done?
It doesn't matter to the question, but I'll be implementing this in C# or Java. I found some libraries that would do the job, but I'd like to implement it myself to learn something.
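For a learning exercise, one simple way around the "differs only a little" problem raised in the question is nearest-template matching: scale every glyph to a small fixed grid and pick the reference digit with the fewest differing cells, rather than requiring an exact match. The sketch below is an assumption about how that could look, not what Tesseract does; the grid size `N`, the class name `NearestTemplate`, and the `boolean[][]` glyph representation are all chosen for illustration.

```java
public class NearestTemplate {

    static final int N = 8; // side length of the normalized grid (arbitrary choice)

    /** Resamples a trimmed, non-empty glyph to N x N using nearest-neighbour sampling. */
    public static boolean[][] normalize(boolean[][] glyph) {
        int h = glyph.length;
        int w = glyph[0].length;
        boolean[][] out = new boolean[N][N];
        for (int y = 0; y < N; y++) {
            for (int x = 0; x < N; x++) {
                out[y][x] = glyph[y * h / N][x * w / N];
            }
        }
        return out;
    }

    /** Hamming distance: the number of cells in which two N x N grids differ. */
    public static int distance(boolean[][] a, boolean[][] b) {
        int diff = 0;
        for (int y = 0; y < N; y++) {
            for (int x = 0; x < N; x++) {
                if (a[y][x] != b[y][x]) diff++;
            }
        }
        return diff;
    }

    /** Returns the index of the template closest to the glyph, or -1 if there are none. */
    public static int classify(boolean[][] glyph, boolean[][][] templates) {
        boolean[][] g = normalize(glyph);
        int best = -1;
        int bestDist = Integer.MAX_VALUE;
        for (int d = 0; d < templates.length; d++) {
            int dist = distance(g, normalize(templates[d]));
            if (dist < bestDist) {
                bestDist = dist;
                best = d;
            }
        }
        return best;
    }
}
```

With `templates[d]` holding a reference rendering of digit `d`, a glyph that differs from its template by a few pixels still lands on the right index. This is likely to be adequate only for clean, single-font input like the image described here; varied fonts or noise generally call for a trained classifier.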
svens, almost 13 years ago: Thanks for the tip. Actually, I'm not that good at C/C++ and there's a lot of code. I'm still hoping I won't have to understand a whole OCR software project just to learn number recognition.
Keith Adler, almost 13 years ago: This will remove the need for you to use C++ ... the C# wrapper is pretty straightforward. Unless you want to become an expert in machine learning and image optimization, you really don't want to try to roll your own OCR solution.
rook, almost 13 years ago: +1 Tesseract is awesome. You can use any language you want as long as you call it on the command line.
Keith Adler, almost 13 years ago: You can use it as a DLL as well with not much effort, so no command line is necessary. It comes with this out of the box, as they say in their release notes. code.google.com/p/tesseract-ocr/wiki/ReleaseNotes
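The command-line route mentioned in the comments could look roughly like this from Java. It is a sketch under several assumptions: a `tesseract` executable (version 4 or later, where the page segmentation option is spelled `--psm`) is on the PATH, the code runs on Java 9 or later, and the class name `TesseractCli` is made up for the example. `--psm 7` asks Tesseract to treat the image as a single line of text, and the digit whitelist may be ignored by some Tesseract versions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class TesseractCli {

    /** Builds the command line; "stdout" as the output base prints the text instead of writing a file. */
    public static List<String> buildCommand(String imagePath) {
        return List.of(
                "tesseract", imagePath, "stdout",
                "--psm", "7",                             // single text line
                "-c", "tessedit_char_whitelist=0123456789" // restrict output to digits
        );
    }

    /** Runs Tesseract on the image and returns the recognized text, trimmed. */
    public static String recognize(String imagePath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(imagePath))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // drop diagnostics on stderr
                .start();
        // Read all of stdout before waiting, so the child cannot block on a full pipe.
        String text = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text.trim();
    }
}
```

A call such as `TesseractCli.recognize("numbers.png")` (a hypothetical file name) would then return the recognized digits as a string, provided the binary is installed.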