I am trying to find certain words in a PDF file using Aspose.PDF and regular expressions. The code runs without errors but never returns True.

    Public Shared Function FindInPDF(sourcePdf As String, searchPhrase As String) As Boolean

        Try
            ' Open document
            Dim pdfDocument = New Document(sourcePdf)

            ' Example pattern passed in searchPhrase: "D[a-z]{7}"
            ' Create TextAbsorber object to find all the phrases matching the regular expression
            Dim absorber As Aspose.Pdf.Text.TextFragmentAbsorber = New Aspose.Pdf.Text.TextFragmentAbsorber(searchPhrase) With {
                .TextSearchOptions = New TextSearchOptions(True)
            }

            ' Accept the absorber for all the pages
            pdfDocument.Pages.Accept(absorber)

            ' Loop through the fragments
            For Each textFragment As Aspose.Pdf.Text.TextFragment In absorber.TextFragments
                Console.WriteLine("Text : {0} ", textFragment.Text)
                FindInPDF = True
            Next

        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
        Return FindInPDF
    End Function

Is there an error in my code?

The regular expression string is passed to the function through searchPhrase.

Marco

1 Answer

Instead of using Aspose.PDF, which is a paid library, I switched to iTextSharp. It can extract the text of each page, which is all this search needs.

    ' Requires Imports System.Text.RegularExpressions at the top of the file
    Public Shared Function GetTextFromPDF2(ByVal PdfFileName As String, searchPhrase As String) As Boolean

        Dim found As Boolean = False
        Try
            Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
            ' Build the regex once; IgnoreCase replaces lower-casing the extracted text,
            ' so patterns containing uppercase letters can still match
            Dim adrRx As New Regex(searchPhrase, RegexOptions.IgnoreCase)
            For i = 1 To oReader.NumberOfPages
                Dim its As New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy
                Dim pageText = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(oReader, i, its)
                ' Stop at the first page that contains a match
                If adrRx.IsMatch(pageText) Then
                    found = True
                    Exit For
                End If
            Next
            oReader.Close()
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
        Return found
    End Function
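
One detail to watch when matching a pattern against extracted text is letter case. The example pattern from the question, "D[a-z]{7}", starts with an uppercase letter, so it can never match text that has been lower-cased first. The following is a minimal sketch of that effect, not part of either function above. The module name and the sample string are made up for illustration, and only the standard .NET Regex class is used.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexCaseDemo
    Sub Main()
        ' Hypothetical sample text standing in for text extracted from a PDF page
        Dim pageText As String = "This Document contains a Delivery note."
        ' Example pattern from the question: "D" followed by seven lowercase letters
        Dim pattern As String = "D[a-z]{7}"

        ' Lower-casing the text removes every "D", so the pattern cannot match
        Console.WriteLine(Regex.IsMatch(pageText.ToLower(), pattern)) ' prints False

        ' Matching the unmodified text keeps the search case-sensitive
        Console.WriteLine(Regex.IsMatch(pageText, pattern)) ' prints True

        ' IgnoreCase matches regardless of case without altering the text
        Console.WriteLine(Regex.IsMatch(pageText, pattern, RegexOptions.IgnoreCase)) ' prints True
    End Sub
End Module
```

If the search should be case-insensitive, passing RegexOptions.IgnoreCase is usually safer than changing the case of the text, because it works the same way whatever case the pattern is written in.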
Marco