Quick way to classify if an image contains text or not

Question

I have millions of images, and I can use OCR with pytesseract to get decent text extraction, but it takes too long to process all of the images.

So I would like to determine whether an image contains text at all; if it doesn't, I wouldn't have to perform OCR on it. Ideally, this method would have high recall.

I was thinking about building an SVM or some other machine learning model to help detect this, but I was hoping someone knew of a method to quickly determine whether an image contains text or not.

Also, possible [duplicate](https://stackoverflow.com/questions/4606274/algorithm-to-detect-presence-of-text-on-image) — Peter Wood, Mar 30 '18 at 15:03
It almost looks like a duplicate question, Peter, but it is a bit different. — John Rothman, Mar 30 '18 at 18:12

score 2 · Answer 1 · answered Mar 30 '18 at 16:05

Unfortunately, there is no way to tell whether an image has text in it without performing OCR of some kind on it.

You could build a machine learning model that handles this; however, keep in mind that it would still need to process the image as well.
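As a rough illustration of what such a model could look like at its simplest, here is a minimal sketch, not a tested solution. It assumes each image has already been loaded and downscaled into a small grid of grayscale values (0 to 255), and it uses edge density as a stand-in feature, on the assumption that text tends to produce many sharp brightness changes. The function names, the `min_diff` value, and the feature itself are all hypothetical choices for this example; none of them come from pytesseract or any other library. The threshold is picked from a hand-labeled sample so that the filter keeps a target share of the text images, which matches the high-recall requirement in the question.

```python
from math import ceil


def edge_density(gray, min_diff=40):
    """Fraction of horizontally adjacent pixel pairs with a sharp brightness jump.

    gray: list of rows, each a list of ints in 0..255 (assumed already downscaled).
    min_diff: hypothetical cutoff for what counts as a sharp jump.
    """
    strong = total = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            total += 1
            if abs(left - right) >= min_diff:
                strong += 1
    return strong / total if total else 0.0


def pick_threshold(scores, labels, target_recall=0.99):
    """Largest threshold that still keeps at least target_recall of the text images.

    scores: edge_density values for a labeled sample.
    labels: truthy where the image really contains text.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one labeled text image")
    keep = ceil(target_recall * len(positives))
    return positives[keep - 1]


def needs_ocr(gray, threshold):
    """True if the image scores high enough to be worth sending to OCR."""
    return edge_density(gray) >= threshold
```

Images for which `needs_ocr` returns False would be skipped, and everything else would still go through pytesseract. Textured photos with no text will also score high, so a filter like this trades precision for recall, and it should be checked against a labeled sample before being trusted on the full set.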

answered Mar 30 '18 at 16:05

BradleyRobertR
