2

I have a dataframe that includes a US Zip Code and a datetimeoffset field that is UTC time. I want to add a column to the dataframe that shows the local time based on the Zip. It looks like pyzipcode might have what I need but I can't figure out how to code it. Here is an example of the dataframe I have:

import pandas as pd
from pyzipcode import ZipCodeDatabase

zcdb = ZipCodeDatabase()

data = [{'Zip':78745, 'DateTimeOffsetUTC':'7/8/2020 5:17:48 PM +00:00'}]
df = pd.DataFrame(data)
df["LocalDatetime"] = ???

Thanks in advance! I think this might be a really simple thing but I'm really new to python.

cs95
  • 379,657
  • 97
  • 704
  • 746
jpc151
  • 54
  • 5
  • Do you want the tz info to be preserved for the local time at the zip? – cs95 Jul 09 '20 at 19:49
  • @cs95 So for the data example above, zip 78745 is in Central Daylight Time which is -5 UTC so in the ["LocalDateTime"] column I would want to save '7/8/2020 12:17:48 PM' – jpc151 Jul 09 '20 at 19:52
  • I'm not sure if you mean 12:17:48PM or 11:17:48AM. Check my answer below – cs95 Jul 09 '20 at 20:00
  • @cs95 First, thanks for your help. I've been spot-checking some of these and unless I'm missing something they always seem to be an hour off. In the example used it should be 12:17. Google shows it as Central Daylight Time with a -5 offset. I guess this could be a problem with tz_localize working with daylight savings... – jpc151 Jul 09 '20 at 20:19
  • I don't think so, the issue comes in the zipcode database, it returns an incorrect timezone offset that doesn't account for DST. Perhaps there is a pandas way but none I'm familiar with that involves zip codes. Sorry I'm unable to be more helpful! – cs95 Jul 09 '20 at 20:22
  • @cs95 That makes sense. Thanks again for your help. I'll accept your answer for now but I'm going to be on the lookout for a better zip code database solution. – jpc151 Jul 09 '20 at 20:25
  • Absolutely, was not expecting you to accept but appreciated nonetheless. Good luck! – cs95 Jul 09 '20 at 20:29
  • I'd recommend splitting this task up into three parts. 1) Use the zip code to get latitude and longitude. 2) Use the lat/lon to get a time zone identifier such as `America/Chicago`. [This list](https://stackoverflow.com/a/16086964/634824) shows two python options for that. 3) Use `dateutil` or `pytz` to apply the time zone to your date and time to get the correct offset including DST. – Matt Johnson-Pint Jul 10 '20 at 17:34

1 Answers1

1

Assuming you have no invalid data, it should suffice to map a lambda to each zip code to extract timezone offset.

dt = pd.to_datetime(df['DateTimeOffsetUTC'])
offset = df['Zip'].map(lambda z: pd.Timedelta(hours=zcdb[z].timezone))    
df['LocalDatetime'] = (dt + offset).dt.tz_localize(None)              
 
df
     Zip           DateTimeOffsetUTC       LocalDatetime
0  78745  7/8/2020 5:17:48 PM +00:00 2020-07-08 11:17:48
cs95
  • 379,657
  • 97
  • 704
  • 746