Ignore missing file while downloading with Python ftplib

Question

I am trying to download a certain file (named 010010-99999-year.gz) from an FTP server. The same file, for different years, resides in different FTP directories. For instance:

ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-lite/2000/010010-99999-1973.gz
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-lite/2001/010010-99999-1974.gz

and so on. [Image: one of the directories]

The file is not present in every directory (i.e. not for every year). In that case I want the script to ignore the missing file, print "not available", and continue with the next directory (i.e. the next year). I could do this with an NLST listing, by first generating a list of files in the current FTP directory and then checking whether my file is on that list, but that is slow, and NOAA (the organization that owns the server) does not like file listing (source). Therefore I came up with this code:

from ftplib import FTP

def FtpDownloader2(url="ftp.ncdc.noaa.gov"):
    ftp=FTP(url)
    ftp.login()
    for year in range(1901,2015):
        ftp.cwd("/pub/data/noaa/isd-lite")
        ftp.cwd(str(year))
        fullStationId="010010-99999-%s.gz" % year
        try:              
            file=open(fullStationId,"wb")
            ftp.retrbinary('RETR %s' % fullStationId, file.write)
            print("File is available")
            file.close()
        except: 
            print("File not available")
    ftp.close()

This downloads the existing files (years 1973-2014) correctly, but it also generates empty files for the years 1901-1972, for which the file is not on the FTP server. Am I doing anything wrong in my use of try and except, or is it some other issue?

possible duplicate of [Check if a file exists using Python](http://stackoverflow.com/questions/82831/check-if-a-file-exists-using-python) — Nir Alfasi, Feb 07 '15 at 19:02
@alfasin the possible duplicate question is about checking if a file exists locally. My question is about continuing the loop if a file does not exist in an FTP server. — multigoodverse, Feb 07 '15 at 19:10
Sorry, my bad. You should check the file size on the FTP server: if the size is > 0, the file exists. Example: http://www.example-code.com/python/ftp_fileExists.asp — Nir Alfasi, Feb 07 '15 at 19:14

Hai Vu · Accepted Answer · 2015-02-09T14:11:39.807

I took your code and modified it a little:

from ftplib import FTP, error_perm
import os

def FtpDownloader2(url="ftp.ncdc.noaa.gov"):
    ftp = FTP(url)
    ftp.login()
    for year in range(1901, 2015):
        remote_file = '/pub/data/noaa/isd-lite/{0}/010010-99999-{0}.gz'.format(year)
        local_file = os.path.basename(remote_file)
        try:
            with open(local_file, "wb") as file_handle:
                ftp.retrbinary('RETR %s' % remote_file, file_handle.write)
            print('OK', local_file)
        except error_perm:
            print('ERR', local_file)
            os.unlink(local_file)
    ftp.close()

Notes

An except clause without a specific exception class is dangerous: it catches every exception, including ones unrelated to the download (even KeyboardInterrupt), which makes problems hard to troubleshoot. To fix this, I catch the specific exception error_perm.
When the exception occurs, I know the local file is already closed, because the with statement guarantees that.
I remove the local file if an error_perm exception occurs, a sign that the file is not available from the server.
I removed the code that changes directories: for each year, you call cwd twice, which slows down the process.
range(1901, 2015) will not include 2015. If you want it, you have to specify range(1901, 2016).
I improved the print statements to include the file names, making it easier to track which files are available and which are not.
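One caveat to the first note: error_perm covers every permanent (5xx) reply, not only a missing file. The following is a minimal sketch of how the check could be narrowed, assuming the server reports a missing file with reply code 550 (the usual convention, but not verified against this particular server). The helper name is_missing_file is hypothetical.

```python
from ftplib import error_perm


def is_missing_file(exc):
    """Return True if exc looks like a '550 file unavailable' reply.

    Assumption: the server signals a missing file with reply code 550.
    """
    # ftplib raises error_perm with the raw server reply as its message,
    # so the three-digit reply code is at the start of str(exc).
    return isinstance(exc, error_perm) and str(exc).startswith("550")
```

Inside the loop this could be used as `except error_perm as exc:` followed by a bare `raise` whenever `is_missing_file(exc)` is false, so that other permanent errors (a failed login, for example) are not silently reported as missing files.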

Update

This update answers your question about not creating empty local files (and then having to delete them). There are a couple of different ways:

Query the remote file's existence before downloading, and only create the local file when the remote file exists. The problem with this approach is that querying a remote file takes longer than creating and deleting a local file.
Create an in-memory buffer (StringIO; in Python 3 this has to be io.BytesIO, because retrbinary delivers bytes) and download to that buffer. Only create a local file when that buffer is not empty. The problem with this approach is that you write the same data twice: once to the buffer, and once from the buffer to the file.
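The second approach might look roughly like the sketch below. It assumes Python 3 and a connected, logged-in ftplib.FTP object passed in as ftp; the function name download_if_present is hypothetical, and the whole file is held in memory, which is fine for small files like these but not for large ones.

```python
import io
from ftplib import error_perm


def download_if_present(ftp, remote_path, local_path):
    """Download remote_path to local_path; return False if the server refuses it.

    Assumption: ftp is a connected, logged-in ftplib.FTP instance.
    """
    # retrbinary passes bytes to the callback, so use BytesIO, not StringIO.
    buffer = io.BytesIO()
    try:
        ftp.retrbinary("RETR " + remote_path, buffer.write)
    except error_perm:
        # Permanent error (for example a 550 reply): no local file is created.
        return False
    # The transfer succeeded, so only now create the local file.
    with open(local_path, "wb") as file_handle:
        file_handle.write(buffer.getvalue())
    return True
```

In the loop from the answer, this would replace the try/except block, for example `if download_if_present(ftp, remote_file, local_file): print('OK', local_file)`.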

Great! Great notes as well! Do you think it is possible to not generate the empty files locally at all? — multigoodverse, Feb 07 '15 at 20:26

score 1 · Answer 2 · edited May 23 '17 at 12:13

I think the problem is in your try/except block, where you open a file handle for a new local file before checking whether the file exists on the server:

try:              
    file=open(fullStationId,"wb")
    ftp.retrbinary('RETR %s' % fullStationId, file.write)
    print("File is available")
    file.close()
except: 
    print("File not available")

Instead, add a statement in the except block to close the file handle, and another statement to remove the file if it is empty.

Another possibility is to open the local file for writing only if the file exists on the server and has a non-zero size, which you can check with ftp.size.
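A minimal sketch of the ftp.size variant follows. It assumes a connected, logged-in ftplib.FTP object and a server that supports the non-standard SIZE command; the helper names remote_size and download_if_nonempty are hypothetical. Switching to binary mode first is a precaution, since some servers refuse SIZE in ASCII mode.

```python
from ftplib import error_perm


def remote_size(ftp, remote_path):
    """Return the remote file size in bytes, or None if the server refuses."""
    try:
        # Assumption: some servers only answer SIZE in binary (TYPE I) mode.
        ftp.voidcmd("TYPE I")
        return ftp.size(remote_path)
    except error_perm:
        # Typically a 550 reply: the file does not exist on the server.
        return None


def download_if_nonempty(ftp, remote_path, local_path):
    """Create local_path only when the remote file exists and is not empty."""
    if not remote_size(ftp, remote_path):
        return False  # missing (None) or zero-length (0) remote file
    with open(local_path, "wb") as file_handle:
        ftp.retrbinary("RETR " + remote_path, file_handle.write)
    return True
```

The cost is one extra round trip per year for the SIZE query, which matters when most of the requested files do exist.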

Anshul Goyal · answered Feb 07 '15 at 19:16

+1 for that because it works. It would be best to not generate empty files at all. Like: try: ftp.retrbinary('RETR %s' % fullStationId,open(fullStationId,"wb").write.close()) But this is not working either. – multigoodverse Feb 07 '15 at 20:03
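What the comment above is reaching for, opening the local file only once data actually arrives, can be sketched with a small callback object. This is only an illustration: LazyFileWriter and fetch_without_empty_file are hypothetical names, and ftp is assumed to be a connected, logged-in ftplib.FTP instance.

```python
from ftplib import error_perm


class LazyFileWriter:
    """Callback for retrbinary that opens the file when the first chunk arrives."""

    def __init__(self, path):
        self.path = path
        self.handle = None

    def __call__(self, chunk):
        # Open the local file lazily, on the first block of data.
        if self.handle is None:
            self.handle = open(self.path, "wb")
        self.handle.write(chunk)

    def close(self):
        # Safe to call even if no data ever arrived.
        if self.handle is not None:
            self.handle.close()
            self.handle = None


def fetch_without_empty_file(ftp, remote_path, local_path):
    """Return True if the transfer completed, False on a permanent FTP error."""
    writer = LazyFileWriter(local_path)
    try:
        ftp.retrbinary("RETR " + remote_path, writer)
        return True
    except error_perm:
        return False
    finally:
        writer.close()
```

Two limitations of this sketch: a remote file that exists but is empty produces no local file, and a transfer that fails partway through leaves a partial local file behind.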
