28

I'm new to Google Cloud Platform. I have trained my model on Datalab and saved the model folder on Cloud Storage in my bucket. I'm able to download the existing files in the bucket to my local machine by right-clicking a file --> Save link as. But when I try to download a folder by the same procedure, I get only an image of the folder, not the folder and its contents. Is there any way I can download the whole folder and its contents as they are? Is there any gsutil command to copy folders from Cloud Storage to a local directory?

JSnow
  • 929
  • 2
  • 11
  • 24

8 Answers

43

You can find docs on the gsutil tool here and for your question more specifically here.

The command you want to use is:

gsutil cp -r gs://bucket/folder .
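For example, to download into a specific local folder on Windows (placeholder paths; as noted in the comments below, the destination folder must already exist before you run the copy):

mkdir C:\Users\username\Documents\model_download
gsutil cp -r gs://bucket/folder C:\Users\username\Documents\model_download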
Matthias Baetens
  • 1,432
  • 11
  • 18
  • 1
    This is not really what I'm asking for. I have managed to copy my folder from Google Cloud Datalab to Cloud Storage using a gsutil command. My question is: is there any way to download the folder to my local machine, so that I can use it offline? – JSnow Jun 08 '17 at 10:14
  • This command, when executed on your local command line, will do exactly that. The two arguments following the -r flag specify: 1. the GCS path of the folder you want to download, and 2. the local folder you want to download to (this will be your current folder in your command-line session when using ".", but it can just as well be something like C:/Users/username/Documents or /home/username/). – Matthias Baetens Jun 08 '17 at 10:20
  • Whenever I give the path to my local directory as the destination, like C:/Users/username/Documents, it gives this error: "CommandException: Destination URL must name a directory, bucket, or bucket subdirectory for the multiple source form of the cp command." – JSnow Jun 09 '17 at 07:13
  • `gsutil cp -r gs://api-project-921234036675cancer-data-7617/cancer_model7617 C:/Users/sanghamitra.rc` – JSnow Jun 09 '17 at 07:31
  • The second argument you are giving (C:/Users/sanghamitra.rc) looks like a file path rather than a folder path. Can you try `C:/Users/`, for example? – Matthias Baetens Jun 09 '17 at 08:56
  • That was a folder path. And it's still showing the same error, no matter which folder path I try. Can you please tell me which OS you use? I think this problem might be specific to Windows. – JSnow Jun 09 '17 at 09:21
  • Windows as well. Did you try with just `"."` as the second argument, while you are in the folder you want your GCS folder downloaded to? Can you check the version of your gsutil? (There were some similar issues with previous versions, apparently.) – Matthias Baetens Jun 09 '17 at 09:27
  • Unfortunately, I am unable to replicate your error on my system. You might want to try creating `C:/Users/sanghamitra.rc` before you do `gsutil cp -r gs://api-project-921234036675cancer-data-7617/cancer_model7617 C:/Users/sanghamitra.rc`, to make 100% sure the folder you are trying to copy to exists? – Matthias Baetens Jun 09 '17 at 10:03
  • Make sure you don't miss out on the "." at the end. – UKDataGeek Mar 30 '19 at 10:59
  • 2
    I got the same error as @JSnow and fixed it in my case. The reason is that the destination folder doesn't exist; I expected the command to create it, but it gives that error instead. Simply creating the directory first fixed it for me. Hope this helps whoever is looking for the same answer. – HelloThere Jun 11 '19 at 04:51
  • For those wanting to avoid installing local tools like gsutil, there is a way to download files and folders from Google Cloud Storage entirely in the browser: https://stackoverflow.com/a/59567734/2441655 – Venryx Jan 02 '20 at 23:16
  • @Venryx That is removed now. I also don't get any errors when I use these commands to download an entire folder, but the folder just isn't there for some reason after I run the command. And I checked the directory I am trying to send it to, so that's not the issue. Very strange. – wolfsatthedoor Sep 13 '21 at 20:27
14

This is how you can download a folder from a Google Cloud Storage bucket.

Run the following command to download it from the bucket to the local path of your Cloud Shell session in the Google Cloud Console:

gsutil -m cp -r gs://{bucketname}/{folderPath} {localpath}

Once you run that command, confirm that your folder is at the local path by running the ls command to list the files and directories there.

Now zip your folder by running the command below:

zip -r foldername.zip yourfolder/*

Once the zip process is done, click on the "More" dropdown menu on the right side of the Google Cloud Console,

Google Cloud Console Menu

then select "Download file" Option. You will be prompted to enter the name of the file that you want to download, enter the name of the zip file - "foldername.zp"

njmwas
  • 1,111
  • 14
  • 15
  • Yes, this works well for small files. I wrote about the same solution [here](https://stackoverflow.com/a/59567734/2441655), but the mods deleted that question and answer-set (I guess because SO is meant for programming rather than cloud-administration questions). Anyway, my linked answer gives some more details/options, if anyone has questions about this approach. – Venryx Sep 13 '21 at 20:37
12

Prerequisites: Google Cloud SDK is installed and initialized ($ gcloud init)

Command:

gsutil -m cp -r  gs://bucket-name .

This will copy all of the files using multiple threads, which is faster. I found that the "dir" command shown in the official gsutil docs did not work.

Digimix
  • 303
  • 3
  • 4
6

If you are downloading data from Google Cloud Storage using Python and want to maintain the same folder structure, follow this code I wrote in Python.

OPTION 1

import os
import logging

from google.cloud import storage

def download_from_bucket(bucket_name, blob_path, local_path):
    # create the destination folder locally if it does not exist yet
    if not os.path.exists(local_path):
        os.makedirs(local_path)

    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blobs = list(bucket.list_blobs(prefix=blob_path))

    for blob in blobs:
        if blob.name.endswith("/"):
            continue  # skip "directory" placeholder objects
        # path of the object relative to the prefix we listed
        relative_path = blob.name[len(blob_path):].lstrip("/")
        downloadpath = os.path.join(local_path, relative_path)
        # recreate any intermediate folders locally before downloading
        os.makedirs(os.path.dirname(downloadpath), exist_ok=True)
        logging.info(downloadpath)
        blob.download_to_filename(downloadpath)

    logging.info('Blobs under {} downloaded to {}.'.format(blob_path, local_path))


bucket_name = 'google-cloud-storage-bucket-name' # do not use gs://
blob_path = 'training/data' # blob path in bucket where data is stored
local_dir = 'local-folder-name' # trainingData folder in local
download_from_bucket(bucket_name, blob_path, local_dir)

OPTION 2: using the gsutil CLI from Python. Another way of doing it, via a Python program, is below.

import os

def download_bucket_objects(bucket_name, blob_path, local_path):
    # blob_path is the folder name inside the bucket
    command = "gsutil cp -r gs://{bucketname}/{blobpath} {localpath}".format(bucketname = bucket_name, blobpath = blob_path, localpath = local_path)
    os.system(command)  # shell out to the gsutil CLI
    return command

OPTION 3 - No Python, directly using the terminal and the Google Cloud SDK. Prerequisites: Google Cloud SDK is installed and initialized ($ gcloud init). Refer to the link below for the commands:

https://cloud.google.com/storage/docs/gsutil/commands/cp
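If you would rather mirror the folder than copy it, gsutil also has an rsync command; a minimal sketch with placeholder names (create the local directory first):

mkdir local_dir
gsutil -m rsync -r gs://bucket-name/folder local_dir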

Tokci
  • 1,220
  • 1
  • 23
  • 31
2

gsutil -m cp -r gs://bucket-name "{path to local existing folder}"

Works for sure.

Pratap Singh
  • 401
  • 1
  • 4
  • 14
2

As of March 2022, the gs:// path needs to be double-quoted. You can also find the proper download command by navigating to the bucket root, checking one of the directories, and clicking Download at the top.
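For example, with the quoted path (bucket and folder names are placeholders):

gsutil -m cp -r "gs://your-bucket-name/your-folder" .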

Yi Han
  • 21
  • 2
0

Here's the code I wrote. This will download the complete directory structure to your VM or local storage.

from google.cloud import storage
import os

bucket_name = "ar-data"

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)

dirName = 'Data_03_09/' #***folder in bucket whose content you want to download
blobs = bucket.list_blobs(prefix = dirName)#, delimiter = '/')
destpath = r'/home/jupyter/DATA_test/' #***path on your vm/local where you want to download the bucket directory

for blob in blobs:
    # path of the object relative to the bucket folder (str.lstrip would strip characters, not a prefix)
    relative_path = blob.name[len(dirName):]
    if not relative_path or relative_path.endswith('/'):
        continue  # skip "directory" placeholder objects
    local_file = os.path.join(destpath, relative_path)
    # recreate any intermediate folders locally before downloading
    if not os.path.exists(os.path.dirname(local_file)):
        print('creating directory - ', os.path.dirname(local_file))
        os.makedirs(os.path.dirname(local_file))
    print("downloading ... ", relative_path)
    blob.download_to_filename(local_file)

Or simply run, in the terminal:

gsutil -m cp -r gs://{bucketname}/{folderPath} {localpath}
0

Local to GCS:

!gsutil -m cp -r my_folder gs://[BUCKET_NAME]/[DESTINATION_FOLDER]

GCS to local:

gsutil -m cp -r gs://your-bucket-name/your-folder-path local-folder-path