OCR reading phone numbers with Tesseract

Question

I am working on a project that has to include some OCR. I picked Tesseract OCR for the job, but the results are poor. I have tried limiting the character set to 1234567890-, but that did not help. Is there an optimal image size I can use, or some way to train Tesseract to recognise this kind of string better?

The image is this: [image: Phone]

The result Tesseract returns is 05175150152, which is wrong. I expected better, since the image has not been modified in any way. I call Tesseract from PHP via exec with the following command:

"C:\Program Files\Tesseract-OCR\tesseract.exe" C:\wamp\www\adwords\phones\center_ctl09_ctl04.png sssd -l eng -psm 7 nobatch letters
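
For illustration only, here is a minimal Python sketch of the character-set restriction mentioned in the question: it writes a one-line Tesseract config file that sets `tessedit_char_whitelist` and assembles the argument list for a single-line run. The file name `digits_only`, both helper names, and the example paths are hypothetical and not taken from the question. The `-psm 7` spelling follows the question's command (newer Tesseract releases use `--psm`), and the `nobatch` token is left out. Where Tesseract looks up config files depends on the installation, so passing a file path is an assumption.

```python
DIGIT_WHITELIST = "0123456789-"  # characters expected in a phone number


def write_whitelist_config(path, whitelist=DIGIT_WHITELIST):
    """Write a one-line Tesseract config that restricts recognized characters.

    `path` is a hypothetical location chosen by the caller.
    """
    with open(path, "w", encoding="ascii") as f:
        f.write("tessedit_char_whitelist " + whitelist + "\n")
    return path


def build_command(tesseract_exe, image_path, output_base, config_path):
    """Return the argument list for a single-text-line (psm 7) run.

    Mirrors the shape of the command in the question; nothing is executed here.
    """
    return [
        tesseract_exe,
        image_path,
        output_base,      # Tesseract appends .txt to this base name
        "-l", "eng",
        "-psm", "7",      # older spelling, as in the question
        config_path,
    ]
```

The returned list could then be handed to `subprocess.run`, which avoids quoting problems with paths that contain spaces, such as `C:\Program Files\...`.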

Any ideas on what I am doing wrong?

All I have done is install Tesseract; if it needs any training, I haven't done it. — Evan, May 01 '12 at 17:08
The image you provided is too small for Tesseract. You should get a bigger image (in size and DPI) and add a preprocessing step (see http://stackoverflow.com/questions/10188116/trouble-recognizing-digits-in-tesseract-android/10188704#10188704 for details). Alternatively, look for a more accurate SDK. There's not much you can do with PHP, but there are still good options. This may help: http://stackoverflow.com/questions/8753413/optical-character-recognition-for-web-use/8800923#8800923 — Nikolay, May 02 '12 at 09:25

score 3 · Answer 1 · answered May 12 '12 at 20:47

The image resolution of 96 DPI is tough for any OCR engine. Try to rescale it to 300 DPI and you will have better results.

Additionally, JPEG is a lossy image format. Use a different one, like TIFF or PNG, if possible.
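
As a rough sketch of the rescaling suggested above, the Python below computes the pixel size an image needs when going from 96 DPI to 300 DPI (a factor of 300 / 96 = 3.125) and, assuming the Pillow library is installed, resizes the image and saves it in a lossless format. The function names and the grayscale conversion are assumptions made for this example, not part of the answer.

```python
def target_size(width, height, src_dpi=96, dst_dpi=300):
    """Return the (width, height) in pixels after scaling src_dpi -> dst_dpi."""
    factor = dst_dpi / src_dpi  # 300 / 96 = 3.125 with the defaults
    return (round(width * factor), round(height * factor))


def upscale_for_ocr(src_path, dst_path, src_dpi=96, dst_dpi=300):
    """Resize an image for OCR and save it (use a .png or .tif dst_path).

    Assumes Pillow is installed; imported lazily so target_size works without it.
    """
    from PIL import Image

    with Image.open(src_path) as img:
        new_size = target_size(img.width, img.height, src_dpi, dst_dpi)
        # Convert to grayscale first (an assumption that color is not needed);
        # this also lets the Lanczos filter apply to palette-based images.
        resized = img.convert("L").resize(new_size, Image.LANCZOS)
        # Record the new resolution in the file metadata.
        resized.save(dst_path, dpi=(dst_dpi, dst_dpi))
    return new_size
```

Upscaling does not add detail that was never in the image, so this is a starting point to experiment with, not a guaranteed fix.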

nguyenq

