I am trying to scrape tables from a PDF with tabula's read_pdf in Python, but it doesn't work. I should also mention that I am doing this on a Mac, in a Jupyter notebook. This is what I do:

from tabula import read_pdf
file = read_pdf(r'C:\Users\myname\Rprojects\Reports_scraping\data_scraped\icnarc_29052020\icnarc_200529.pdf')

I get this error:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\myname\\Rprojects\\Reports_scraping\\data_scraped\\icnarc_29052020\\icnarc_200529.pdf'

How can I solve this issue?

GaB
  • Are you sure the file is there? – formicaman May 30 '20 at 23:55
  • Very sure, I can simply see it. I know it is misleading since I put it in Rprojects, but it is there – GaB May 30 '20 at 23:56
  • I have moved it to the desktop and get the same error: FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\myname \\Desktop\\icnarc_200529.pdf' – GaB May 31 '20 at 00:00

2 Answers

Just to check that the file exists, do you get True when running this:

import os

file_path = r'C:\Users\myname\Rprojects\Reports_scraping\data_scraped\icnarc_29052020\icnarc_200529.pdf'
print(os.path.isfile(file_path))

Set file_path to wherever the file actually is (this uses Python 3). Also, did you replace "myname" in the path with your actual username? (Just in case.)

It is preferable to build your paths with os.path.join so that they work across operating systems. On Windows, getting the project root may require creating a "config.py" file in the root folder; see

how to get the root folder on windows
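As a minimal sketch of that idea, the snippet below builds the path from the current user's home directory with os.path.join instead of hard-coding a drive letter. It assumes the folder names from the question sit directly under the home directory; adjust them if your layout differs.

```python
import os

# Start from the current user's home directory rather than a hard-coded
# "C:\Users\..." (Windows) or "/Users/..." (macOS) prefix.
home = os.path.expanduser("~")

# Assumption: these folders live directly under the home directory.
pdf_path = os.path.join(
    home,
    "Rprojects",
    "Reports_scraping",
    "data_scraped",
    "icnarc_29052020",
    "icnarc_200529.pdf",
)

print(pdf_path)
print(os.path.isfile(pdf_path))  # True only if the file exists on this machine
```

If the second line prints True, the same pdf_path can be passed to read_pdf.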

#

After discussing this with GaB, it turned out that he is using a Jupyter notebook on a Mac, which explains the issue: the Windows path does not exist there. I found this link, but can't help further.

Jupyter - import pdf

os.path.join doc
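If os.path.isfile returns False and it is not obvious which part of the path is wrong, one option is to walk up the path until an existing folder is found. This is only a sketch: the helper name deepest_existing_parent is hypothetical (it is not part of any library), and the folder layout under the home directory is an assumption based on the location reported in the comments.

```python
from pathlib import Path


def deepest_existing_parent(path):
    """Return the longest leading part of `path` that exists on disk."""
    path = Path(path).expanduser().absolute()
    # Check the path itself first, then each parent, moving towards the root.
    for candidate in [path, *path.parents]:
        if candidate.exists():
            return candidate
    return Path(path.anchor)  # fallback: the filesystem root


# Assumption: the project folders sit directly under the home directory.
mac_path = (
    Path.home()
    / "Rprojects"
    / "Reports_scraping"
    / "data_scraped"
    / "icnarc_29052020"
    / "icnarc_200529.pdf"
)

print(mac_path.is_file())
print(deepest_existing_parent(mac_path))
```

If the second line prints a folder rather than the PDF itself, the first path component after that folder is the one that is misspelled or missing.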

Je Je
  • It shouldn't show "0", it should show True or False. Maybe send us a screenshot... – Je Je May 31 '20 at 00:15
  • So, it does give me False. Yet I have literally gone over it 10,000 times and it is the actual path. I do not know why that is the case – GaB May 31 '20 at 00:17
  • Can you send a screenshot of the path in Windows Explorer, with your name changed? Also, what file size is shown in Windows Explorer? – Je Je May 31 '20 at 00:23
  • I am actually on my Mac and cannot find the screenshots. So ridiculous of me ... – GaB May 31 '20 at 00:25
  • So on a Mac this way of accessing a file will not work, as you are giving an absolute Windows path starting from C:\. That is probably the issue. On a Mac, use a relative path rather than an absolute one. – Je Je May 31 '20 at 00:26
  • Do you mean this is a Mac issue, and that is why I cannot read it? I am doing this in a Jupyter notebook – GaB May 31 '20 at 00:29
  • Not really a Mac issue, just that the file isn't located at the same absolute path on a Mac as on a Windows PC. I guess you cloned your project from GitHub onto your Mac and it isn't working? Or copied and pasted it? – Je Je May 31 '20 at 00:32
  • what is the relative path then? I do not use github – GaB May 31 '20 at 00:32
  • A relative path would start from your root project folder on your Mac. I don't know where you have the actual PDF saved on your Mac, so to keep it simple, copy and paste the PDF into your root project folder (the project you created on your Mac), and then just refer to the file by its name, not its path: 'icnarc_200529.pdf' – Je Je May 31 '20 at 00:35
  • I am not sure how you checked "10000 times" that the file was there if you are working on a Mac and checked a Windows path... – Je Je May 31 '20 at 01:01
  • How come you say you checked 10000 times that the file was at the path you provided (which is a Windows path) when you say you are using a Mac? I am not sure I need to edit more at this point. Did you copy and paste the file into the root folder? – Je Je May 31 '20 at 01:15
  • I believe there is a misunderstanding. I thought \\ was for Mac as well. How should I write it for Mac, then? – GaB May 31 '20 at 01:18
  • No, "\\" is for Windows, but that is not the point: you could use "/" (Mac/Linux) and it still wouldn't work, because your file is not at this C:\...etc... location when you are on your Mac. That is why I asked you whether you had opened this file. Are you VNCing into a PC? I mean, there are a lot of potential answers. So I asked you to check with the "os" library whether it finds the file, and it doesn't. Can you open the file on your Mac? – Je Je May 31 '20 at 01:23
  • I can open the file on my MAC – GaB May 31 '20 at 01:25
  • Okay, here is where it is saved: Macintosh HD ▸ Users ▸ myname ▸ Rprojects ▸ Reports_scraping ▸ data_scraped ▸ icnarc_29052020. The file is in icnarc_29052020 – GaB May 31 '20 at 01:29
  • [I 02:31:22.688 NotebookApp] Saving file at /Rprojects/Reports_scraping/data_scraped/icnarc_29052020/Untitled.ipynb – GaB May 31 '20 at 01:34
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/215006/discussion-between-gab-and-nono-london). – GaB May 31 '20 at 01:37

There is only one possibility: the file is not at that path. I assume you have already checked that; if not, check once more that the file name is spelled correctly. If that doesn't help, try the trick below.

Run the Python code from the same folder as the file, and then use

from tabula import read_pdf
file = read_pdf(r'icnarc_200529.pdf')

Sometimes, this simple method does the trick.
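The reason this can work is that a bare file name is resolved against the current working directory, which for a Jupyter notebook is normally the folder the notebook is saved in. A small sketch to confirm where Python is looking before calling read_pdf (pages="all" is just an illustrative choice, and tabula-py must be installed for the read_pdf branch to run):

```python
import os

pdf_name = "icnarc_200529.pdf"

# A relative name is looked up in the current working directory.
print("Working directory:", os.getcwd())
print("PDFs found here:", [f for f in os.listdir() if f.lower().endswith(".pdf")])

if os.path.isfile(pdf_name):
    from tabula import read_pdf  # third-party package: tabula-py

    tables = read_pdf(pdf_name, pages="all")
    print(tables)
else:
    print(f"{pdf_name} is not in {os.getcwd()}")
```

If the PDF does not appear in the printed list, either move it next to the notebook or pass the full path to read_pdf instead.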