3

I am trying to batch geocode a group of addresses through the US Census Geocoder: http://geocoding.geo.census.gov/geocoder/

I have found this question:

Posting to and Receiving data from API using httr in R

and Hadley's suggestion works perfectly to send my data frame to the API and get the geocoded addresses back. The problem I am running in to is how to get the returned data back in to a data frame. I would've commented on his response there, but unfortunately since this is a new account I am not able to comment yet.

So my code is as follows:

req <- POST("http://geocoding.geo.census.gov/geocoder/geographies/addressbatch", 
body = list(
      addressFile = upload_file("mydata.csv"),
      benchmark = "Public_AR_Census2010",
      vintage = "Census2010_Census2010"
 ), 
encode = "multipart",
verbose())
stop_for_status(req)
content(req)

When I run content(req), I get data that looks like this:

"946\",\"123 MY STREET, ANYTOWN, TX,
99999\",\"Match\",\"Non_Exact\",\"123 MY STREET, ANYTOWN, TX,
99999\",\"-75.43486,80.423775\",\"95495654\",\"L\",\"99\",\"999\",\"021999\",\"3
005\"\n\"333\",\"456 MY STREET, ANYTOWN, TX, 
99999\",\"Match\",\"Exact\",\"456 MY STREET, ANYTOWN, TX, 
99999\",\"-75.38545,80.383747\",\"6546542\",\"R\",\"99\",\"999\",\"021999\",\"3002\"\n\

I've tried using the jsonlite approach mentioned here: Successfully coercing paginated JSON object to R dataframe

as well as googling httr/content to data frame, and haven't had any luck. The closest I have come to getting what I want is using

cat(content(req, "text"), "\n") which gets results that look like a CSV I could use as a data frame:

"476","123 MY STREET, ANYTOWN, TX, 99999","Match","Exact",
"123 MY STREET, ANYTOWN, TX,
99999","-75.438644,80.426025","654651321","L","99","999","0219999","3013"

But I was also unable to find any help on getting the results of a cat() into a data frame as I believe the function only prints the results.

When I use a browser and upload a csv I get a csv back that has the following columns: RowID, Address, Match, MatchType, MatchedAddress, Lat, Long, StreetSide, State, County, Tract, Block

I would prefer to do this all through R, so my end result needs to be a data frame with those columns. The data is there in the content(req), I just haven't figured out how to get it in a data frame.

Thanks for the help!

Community
  • 1
  • 1
tds7
  • 31
  • 1
  • 3

2 Answers2

6

Use textConnection to make it one liner

df <- read.csv(textConnection(content(req, 'text')))
YH Wu
  • 465
  • 6
  • 6
  • There is also a `text=` property now for `read.csv` not so you can do `df <- read.csv(text=content(req, 'text'))` and avoid having to explicitly create the `textConnection()`. – MrFlick Nov 24 '21 at 19:24
2

Perhaps now over 6 months later, this question has been resolved. But in case others have the same issue:

The problem is that you are missing a column header in your list of variables and you have two column headers for coordinates. And you can't use the ones provided by the Census Bureau, because they do not provide a complete header row for all variables. First send the output to a CSV file:

cat(content(req, "text"), file="reqoutput.csv")

Then read it back in as a dataframe, providing your own header row:

reqdata<-read.csv(file="reqoutput.csv", skip=1,
                  col.names = c('RowID', 'Address', 'Match', 'MatchType',
                                'MatchedAddress', 'LongLat', 'thing',
                                'Streetside', 'State', 'County', 'Tract',                
                                'Block'))

In your example output, note that the Census bureau provides coordinates as one field in double-quotes, and it's Longitude followed by Latitude.

After coordinates, there is a nine digit string of numbers, I don't know what that is. I called it 'thing'.

Kilburn
  • 21
  • 3