How to search for specific content (with or without a regular expression) in a PDF?

Asked Jun 25 '15 at 09:31

Active Jun 25 '15 at 09:34

Viewed 40 times

I have a list of PDF files. I want to search each file for specific content and separate the files that contain it from those that do not. I want to know whether such a search is possible using the Java library iText.

edited Jun 25 '15 at 09:34

Bruno Lowagie


asked Jun 25 '15 at 09:31

Laxmi Raja


The answer to your question is "Yes, it's probably possible, but it depends on the nature of your PDFs." You should be more specific, because it's not possible (1) if your files consist of scanned images or (2) if the fonts don't allow text extraction. If your next question will be "How can I do this", then you should first show what you have tried. For instance: have you tried [this](http://stackoverflow.com/questions/23693706/english-text-extracted-using-itextpdf-is-not-understandable)? – Bruno Lowagie Jun 25 '15 at 09:37

*specific content* - please also specify the nature of that specific content. Is it just a sequence of words on a line? Is it an image? Is it a background coloration? – mkl Jun 25 '15 at 10:26
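As a rough illustration of the "separate the matching files" step discussed above, here is a minimal sketch that assumes the text of each PDF has already been extracted into a string (for example, in iText 5, by calling `PdfTextExtractor.getTextFromPage(reader, pageNumber)` for each page; that extraction step is not shown here). The class name `PdfContentFilter` and its methods are hypothetical and are not part of iText. The sketch only applies to content that is a sequence of words; as the comments note, it cannot work for scanned images or for fonts that do not allow text extraction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText: it only handles the matching step.
final class PdfContentFilter {

    /** Result of splitting the file list: names whose text matched, and the rest. */
    static final class Partition {
        final List<String> matching = new ArrayList<>();
        final List<String> nonMatching = new ArrayList<>();
    }

    /** Builds a pattern from either a regular expression or a literal phrase. */
    static Pattern toPattern(String query, boolean isRegex) {
        return isRegex ? Pattern.compile(query) : Pattern.compile(Pattern.quote(query));
    }

    /**
     * Splits files by whether their already-extracted text contains a match.
     * textByFile maps a file name to the text extracted from that PDF
     * (assumption: the caller has filled it in, e.g. with iText's text extraction).
     */
    static Partition partition(Map<String, String> textByFile, Pattern pattern) {
        Partition result = new Partition();
        for (Map.Entry<String, String> entry : textByFile.entrySet()) {
            String text = entry.getValue();
            // Null or empty text is what a scanned-image PDF would typically yield.
            boolean found = text != null && pattern.matcher(text).find();
            (found ? result.matching : result.nonMatching).add(entry.getKey());
        }
        return result;
    }
}
```

A caller would pass `toPattern("some phrase", false)` for a plain search or `toPattern("No\\. \\d+", true)` for a regular expression, then move or copy the files listed in `matching`. Text extracted from a PDF may contain unexpected line breaks or spacing, so a pattern that tolerates whitespace (such as `\\s+` between words) is likely to be more robust than an exact phrase.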


0 Answers