I found Python code that searches for multiple words in a PDF.

I want to find the pages where two words both occur. For instance, if both 'Name' and 'Address' appear on the same page, I want that page's number. If only one of the two words appears on a page, that page should not be reported.

Thank you.

Code that I found: Search Multiple words from pdf

1 Answer

Referring to the page cited by the author and from what I found here, I would suggest something like:

import re

def string_found(word, string_page):
    # True if `word` occurs in `string_page` as a whole word, ignoring case
    return bool(re.search(r"\b" + re.escape(word) + r"\b", string_page, re.IGNORECASE))

word1 = "name"
word2 = "address"

# `object` (the PDF reader) and `num_pages` are assumed to be defined as in the code linked in the question
for i in range(0, num_pages):
    page = object.getPage(i)
    text = page.extractText() # get text of current page

    bool1 = string_found(word1, text)
    bool2 = string_found(word2, text)

    if bool1 and bool2:
        print(i) # print the zero-based index of each page containing both words
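
The same check can be tried without a PDF at hand. Below is a minimal sketch, assuming the text of each page has already been extracted into a list of strings. The names `contains_word`, `pages_with_all_words`, and `sample_pages` are hypothetical and are not part of the linked code or of any PDF library. It also generalises from two words to any number of words and reports 1-based page numbers.

```python
import re


def contains_word(word, text):
    """Return True if `word` occurs in `text` as a whole word, ignoring case."""
    return re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE) is not None


def pages_with_all_words(page_texts, words):
    """Return the 1-based numbers of the pages whose text contains every word."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if all(contains_word(word, text) for word in words)
    ]


# Made-up page texts standing in for the output of a PDF text extractor.
sample_pages = [
    "Name: Alice",
    "Name: Bob, Address: 1 Main Street",
    "Address only",
]
print(pages_with_all_words(sample_pages, ["name", "address"]))  # prints [2]
```

With a real PDF, `page_texts` would be built by calling the extractor on each page in order, for example with the `extractText()` loop above.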