My program can open PNGs but not PDFs, so I wrote this minimal test, and it still can't open even a simple PDF. I don't know why.

from PIL import Image

with Image.open(r"Adams, K\a.pdf") as file:
    print file

Traceback (most recent call last):
  File "C:\Users\Hayden\Desktop\Scans\test4.py", line 3, in <module>
    with Image.open(r"Adams, K\a.pdf") as file:
  File "C:\Python27\lib\site-packages\PIL\Image.py", line 2590, in open
    % (filename if filename else fp))
IOError: cannot identify image file 'Adams, K\\a.pdf'

After trying PyPDF2 as suggested (Thanks for the link by the way), I am getting this error with my code.

import PyPDF2

pdf_file= open(r"Adams, K (6).pdf", "rb")
read_pdf= PyPDF2.PdfFileReader(pdf_file)

number_of_pages = read_pdf.getNumPages()
print number_of_pages


Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]
Hayden

2 Answers

Following this article (https://www.geeksforgeeks.org/convert-pdf-to-image-using-python/), you can use the pdf2image package to convert the PDF into a list of PIL Image objects, one per page.

This should solve your problem:

from pdf2image import convert_from_path

fname = r"Adams, K\a.pdf"
pil_image_lst = convert_from_path(fname) # This returns a list even for a 1 page pdf
pil_image = pil_image_lst[0]

I just tried this out with a one page pdf.

Alexander

As pointed out by @Kevin (see comment below), PIL supports writing PDFs but not reading them.

To read a PDF you will need another library. The link here is a tutorial on handling PDFs with PyPDF2:

https://pythonhosted.org/PyPDF2/?utm_source=recordnotfound.com
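
Because PIL raises IOError for any file it cannot identify, one option is to check the file's leading bytes first and only pass real images to Image.open. Here is a minimal sketch, assuming the signature sits at the very start of the file (`%PDF-` for PDFs, the fixed 8-byte header for PNGs). The names `sniff_kind` and `write_temp` are hypothetical helpers for illustration, not part of PIL or PyPDF2.

```python
import os
import tempfile

PDF_MAGIC = b"%PDF-"               # PDF files normally begin with this marker
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte PNG signature


def sniff_kind(path):
    """Hypothetical helper: classify a file by its first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(PDF_MAGIC):
        return "pdf"
    if head.startswith(PNG_MAGIC):
        return "png"
    return "unknown"


def write_temp(data, suffix):
    """Write bytes to a temporary file and return its path (demo only)."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return path


# Demo with throwaway files; only the leading bytes matter to the check.
pdf_path = write_temp(b"%PDF-1.4\n", ".pdf")
png_path = write_temp(PNG_MAGIC + b"rest", ".png")
print(sniff_kind(pdf_path))  # pdf
print(sniff_kind(png_path))  # png
os.remove(pdf_path)
os.remove(png_path)
```

A result of "pdf" would then be routed to a PDF library such as PyPDF2, and "png" to Image.open.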

Xantium
  • Surprisingly, Pillow can [write](https://pillow.readthedocs.io/en/5.1.x/handbook/image-file-formats.html#pdf) pdfs, it just can't read them. – Kevin Jun 26 '18 at 17:21
  • This seems to be documented here: https://github.com/mstamy2/PyPDF2/issues/36. I don't know if that helps. – Xantium Jun 26 '18 at 17:36
  • I read that, but sadly it's raising an exception, not just giving a warning, and the page didn't say how to actually fix it. – Hayden Jun 26 '18 at 17:39
  • @Hayden Just checking to be absolutely sure, did you try `read_pdf= PyPDF2.PdfFileReader(pdf_file, strict=False)` – Xantium Jun 26 '18 at 17:47