I am working on a project that has to include some OCR. I picked Tesseract OCR for the job, but the results are poor, even after limiting the character set to 1234567890-. Is there an optimal image size I should use, or some way to train Tesseract to recognise this kind of string better?

The image is this: Phone

Tesseract returns 05175150152, which is wrong. I expected better, since the image has not been modified in any way. I call Tesseract from PHP via exec with the following command:

"C:\Program Files\Tesseract-OCR\tesseract.exe" C:\wamp\www\adwords\phones\center_ctl09_ctl04.png sssd -l eng -psm 7 nobatch letters

Any ideas on what I am doing wrong?

rmtheis
Evan
  • All I have done is install Tesseract. If it needs any training, I haven't done it. – Evan May 01 '12 at 17:08
  • The image you provided is too small for Tesseract. You should get a bigger image (in size and DPI) and add preprocessing (see http://stackoverflow.com/questions/10188116/trouble-recognizing-digits-in-tesseract-android/10188704#10188704 for details). Alternatively, look for a more accurate SDK. There's not much you can do with PHP, but there are still good options. This may help: http://stackoverflow.com/questions/8753413/optical-character-recognition-for-web-use/8800923#8800923 – Nikolay May 02 '12 at 09:25

1 Answer

An image resolution of 96 DPI is tough for any OCR engine. Try rescaling it to 300 DPI; that should give better results.

Additionally, JPEG is a lossy image format. Use a different one, like TIFF or PNG, if possible.
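
As a rough illustration of the rescaling step, here is a minimal Python sketch. It assumes the source image really is 96 DPI and that the Pillow library is installed for the actual resize. The function names and the example dimensions are hypothetical, not taken from the question.

```python
def scaled_size(width, height, src_dpi=96, target_dpi=300):
    """Return the pixel size an image needs so that it covers the same
    physical area at target_dpi as it does at src_dpi."""
    return (round(width * target_dpi / src_dpi),
            round(height * target_dpi / src_dpi))


def upscale_for_ocr(src_path, dst_path, src_dpi=96, target_dpi=300):
    """Resize an image for OCR and save it losslessly as PNG.

    Assumes Pillow is installed; imported here so that scaled_size()
    can be used without it.
    """
    from PIL import Image

    with Image.open(src_path) as img:
        new_size = scaled_size(img.width, img.height, src_dpi, target_dpi)
        # LANCZOS is a high-quality resampling filter for enlarging text.
        resized = img.resize(new_size, Image.LANCZOS)
        # Record the new resolution in the PNG metadata.
        resized.save(dst_path, format="PNG", dpi=(target_dpi, target_dpi))


# Example with a hypothetical 200x40 pixel image at 96 DPI.
print(scaled_size(200, 40))
```

The resized file (for example a hypothetical `phone_300dpi.png`) would then be passed to the same `tesseract` command in place of the original image. Upscaling does not add detail, so results may still vary with the source quality.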

nguyenq