I have millions of images. I can use OCR with pytesseract to get decent text extraction, but it takes too long to process all of the images.
So I would like to determine up front whether an image contains any text at all; if it doesn't, I wouldn't have to perform OCR on it. Ideally this method would have high recall.
I was thinking about building an SVM or some other machine learning model to do the detection, but I was hoping someone knew of a method to quickly determine whether an image contains text.
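For reference, a cheap pre-filter of this kind could look something like the rough sketch below. It is only an illustration, not a tested method: it assumes the image has already been loaded as a grayscale grid (a list of rows of 0-255 values), and it uses the density of sharp horizontal intensity jumps as a crude proxy for text strokes. The function names `transition_density` and `might_contain_text` and both thresholds are hypothetical.

```python
def transition_density(gray, min_jump=64):
    """Fraction of horizontally adjacent pixel pairs whose
    intensity differs by at least min_jump (0-255 scale)."""
    jumps = 0
    pairs = 0
    for row in gray:
        # Compare each pixel with its right-hand neighbour.
        for a, b in zip(row, row[1:]):
            pairs += 1
            if abs(a - b) >= min_jump:
                jumps += 1
    return jumps / pairs if pairs else 0.0


def might_contain_text(gray, min_density=0.02, min_jump=64):
    """Return True if the image has enough sharp transitions to be
    worth sending to OCR. A low min_density (an assumed value here)
    favours recall over precision."""
    return transition_density(gray, min_jump) >= min_density


if __name__ == "__main__":
    # Synthetic examples: a blank page and a striped "stroke-like" pattern.
    blank = [[255] * 20 for _ in range(20)]
    stripes = [[0 if (x // 2) % 2 == 0 else 255 for x in range(20)]
               for _ in range(20)]
    print(might_contain_text(blank))
    print(might_contain_text(stripes))
```

The thresholds would have to be tuned on a labelled sample, and busy non-text images (foliage, textures) would pass this filter too, which is acceptable if recall matters more than precision. For millions of images the loops would need to be vectorized, and a feature like this could also serve as one input to an SVM rather than a hard cutoff.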