13

This old answer points to a link in the Google App Engine documentation, but that link is now about backing up your GAE data, not downloading it.

So how do I download all the data into a CSV? The data is small, i.e. < 1 GB.

Heisenberg
  • You can download these backups. It's not in CSV format, though, but you could probably convert it – Igor Artamonov Oct 27 '15 at 05:36
  • Converting it is actually my problem right now. Also, I can download this backup with `gsutil`, right? It seems like it doesn't download everything. – Heisenberg Oct 27 '15 at 05:39
  • Yes, "probably convert" :) It's a LevelDB dump, so it's a little bit complicated. I've never tried gsutil downloads, not sure how it works. – Igor Artamonov Oct 27 '15 at 05:45
  • I think [this guide](http://gbayer.com/big-data/app-engine-datastore-how-to-efficiently-export-your-data/) solves how to export to `.csv`. Still, the solution requires me to back up the data from GAE to Google Cloud Storage first (not able to automate?), then download, then convert. That seems like an awful lot of steps for something so simple? – Heisenberg Oct 27 '15 at 06:03
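For reference on the gsutil question in the comments above: a Datastore Admin backup is written to a Cloud Storage bucket as a folder of output files plus `.backup_info` metadata, and the whole folder can be pulled down with a recursive copy. This is not from the thread itself, and the bucket and folder names below are placeholders:

```
# Recursively copy an entire backup folder from Cloud Storage
# into the current directory; -m parallelizes the transfer.
gsutil -m cp -r gs://my-backup-bucket/my_backup_folder .
```

As the comments note, the downloaded files are still in the backup (LevelDB) format, so they still need converting.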

2 Answers

30

I tried a couple of different approaches to export to CSV using the steps outlined here and here, but I could not get either one to work. So here is what I did (my largest table was about 2 GB). It works relatively quickly even though it seems like a lot of steps, and it beats spending hours fighting random code that Google may have changed:

  1. Go into Cloud Storage and create 2 new buckets "data_backup" and "data_export". You can skip this if you already have a bucket to store things in.
  2. Go into "My Console" > Google Datastore > Admin > Open Datastore Admin for the datastore you are trying to convert.
  3. Check off the entity or entities that you want to back up and click "Backup Entities". I did one at a time since I only had about 5 tables to export, rather than checking off all 5 at once.
  4. Indicate the Google Storage (gs) bucket you want to store them in.
  5. Now go to Google Big Query (I had never used this before but it was cake to get going)
  6. Click the little down arrow and select "Create a New Dataset" and give it a name.
  7. Then click the down arrow next to the new dataset you just created and select "Create New Table". Walk through the import steps, selecting "Cloud Datastore Backup" under the Select Data step. Then choose whichever backup you want to import into BigQuery so you can export it to CSV in the next step.
  8. Once the table imports (which was pretty quick for mine), click the down arrow next to the table name and select "Export". You can export directly to CSV, save it to the Google Storage bucket you created for the export, and then download it from there. (A rough command-line equivalent of these steps is sketched just below.)
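Not part of the original workflow above, but if you prefer the command line, steps 1 and 5–8 can also be driven with the `gsutil` and `bq` tools once the Datastore Admin backup (steps 2–4) has finished. Bucket, dataset, table, and file names here are placeholders:

```
# Step 1: buckets for the backup and the export (skip if you already have them)
gsutil mb gs://data_backup
gsutil mb gs://data_export

# Step 6: create a BigQuery dataset
bq mk my_dataset

# Step 7: load the Datastore backup into a BigQuery table,
# pointing at the per-entity .backup_info file the backup job wrote
bq load --source_format=DATASTORE_BACKUP \
    my_dataset.my_table \
    gs://data_backup/path/to/MyKind.backup_info

# Step 8: export the table to CSV files in Cloud Storage
# (use NEWLINE_DELIMITED_JSON instead of CSV for nested schemas;
#  the wildcard URI lets large tables span multiple output files)
bq extract --destination_format=CSV \
    my_dataset.my_table \
    "gs://data_export/my_table-*.csv"

# Download the exported files
gsutil -m cp "gs://data_export/my_table-*.csv" .
```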

Here are a few tips:

  • If your data has nested relationships, you will have to export to JSON rather than CSV (they also offer Avro format, whatever that is).
  • I used json2csv to convert the exported JSON files that could not be saved as CSV. It runs a little slowly on big tables but gets the job done.
  • I had to split the 2 GB file into two files because of a Python memory error in json2csv. I used Gsplit to split the files and checked the option under Other Properties > Tags & Headers > "Do not add Gsplit tags..." (this made sure Gsplit did not add any data to the split files). A command-line alternative is sketched just after this list.
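Not from the original answer: on Linux or macOS the standard `split` utility can stand in for Gsplit here. Each line of a BigQuery newline-delimited JSON export is a complete record, so splitting on line boundaries keeps every record intact (file names and chunk size are placeholders):

```
# Split a large newline-delimited JSON export into ~500k-line chunks
# named chunk_aa, chunk_ab, ... without adding any extra bytes.
split -l 500000 my_table_export.json chunk_

# Then run json2csv (or your converter of choice) on each chunk
# separately and concatenate the resulting CSV files.
```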

Like I said, this was actually pretty quick even though it involves a number of steps. Hopefully it helps someone avoid a bunch of time spent trying to convert strange backup file formats or running code that may not work anymore.

Scott
  • This is neat, thanks Scott. I would only add that, at step 7, you should provide the file with the extension .backup_info corresponding to the Entity you're about to load, which is stored in the bucket among the Entities. – m.piras Mar 07 '16 at 10:35
  • Thank you. I have tried your steps and it works. I am just confused about how many times I need to do this. Do I need to do it for each table, or is there a way to export everything in one go? Thanks – filias Jul 19 '16 at 06:31
  • I did it for each table @filias. I only had 5 or 6 really large tables. If it works for you, it would be great if you could mark it as the correct answer. Thanks! – Scott Jul 19 '16 at 13:51
  • Thanks for your answer. Actually I have done it but the json that came out of one of the tables was invalid. I then tried using the appcfg.py download_data command with success and am continuing with that. – filias Jul 20 '16 at 09:00
  • That worked great for me! However, there were some gotchas in the process: **(1)** in step 7, the location field should be for a `backup_info` file (e.g. `gs:////ag5...Qw..backup_info`); **(2)** if your table is large (mine was 20 GB) it won't let you export to a single file, so you'll need to specify an export URI like `gs:////*` and it'll export to multiple files in that folder; **(3)** if your schema isn't completely flat, it won't let you export to `csv`, only `json`. – marmor Dec 04 '16 at 15:43
  • You are awesome! I've been meaning to export all the data I've been collecting in a blob for years! I was finally able to get it out today thanks to you. I had like half a million entries :) – Bilbo Baggins Nov 30 '17 at 04:23
  • This works also in 2020. To import, there is a metadata file at the top of the bucket -- don't use that. Navigate down, until you get to the metadata file in the same directory (on GCS) that contains the "file chunks". Also, remember to set the import type to "google cloud datastore backup". – Luca Sep 29 '20 at 01:12
3

You can use appcfg.py to download Kind data in CSV format.

```
$ appcfg.py download_data --help

Usage: appcfg.py [options] download_data

Download entities from datastore.

The 'download_data' command downloads datastore entities and writes them
to file as CSV or developer defined format.
```
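For reference, a typical invocation looks roughly like the following (not part of the original answer; the application ID, kind, and file names are placeholders, and CSV output needs a bulkloader config that defines a `csv` connector for the kind):

```
# Generate a bulkloader config stub that you can edit to use the csv connector
appcfg.py create_bulkloader_config \
    --url=http://your-app-id.appspot.com/_ah/remote_api \
    --application=your-app-id \
    --filename=bulkloader.yaml

# Download one kind to CSV using that config
appcfg.py download_data \
    --url=http://your-app-id.appspot.com/_ah/remote_api \
    --application=your-app-id \
    --config_file=bulkloader.yaml \
    --kind=MyKind \
    --filename=MyKind.csv
```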

Josh J