I have a dataframe with a multiIndex of state and town names. The columns are quarterly housing data that were created via PeriodIndex. I want to create a ratio of the data in a new column:
housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(anal_start_col_name)].div(housing_data_compact_df[pd.Period(anal_end_col_name)])
Whenever I try and create this new column, I get an error:
DateParseError: Unknown datetime string format, unable to parse: P Ratio
Full Code:
# Create housing cost dataframe
zillow_file = 'City_Zhvi_AllHomes.csv' #from https://www.zillow.com/research/data/
zillow_df = pd.read_csv(zillow_file,header=0,usecols=1,2,*range(51,251)],index_col=[1,0]).dropna(how='all')
# rename state abbreviations in level 0 multiindex to full state name
zillow_df.reset_index(inplace=True)
zillow_df['State'] = zillow_df['State'].map(states)
zillow_df.set_index(['State','RegionName'], inplace=True)
housing_data_df = zillow_df.groupby(pd.PeriodIndex(zillow_df.columns, freq="Q"), axis=1).mean()
rec_start = '2000Q1'
rec_bottom = '2001Q1'
#Reduce Size to desired data
start_col = housing_data_df.columns.get_loc(pd.Period(rec_start))-1
end_col = housing_data_df.columns.get_loc(pd.Period(rec_bottom))
housing_data_compact_df = housing_data_df[[start_col,end_col]]
#This is where the issue occurs
housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(anal_start_col_name)].div(housing_data_compact_df[pd.Period(anal_end_col_name)])
Here is some other data that may/may not be helpful:
[In]: print(housing_data_compact_df.head())
2000Q1 2001Q1
State RegionName
New York New York 503933.333333 465833.333333
California Los Angeles 502000.000000 413633.333333
Illinois Chicago 237966.666667 219633.333333
Pennsylvania Philadelphia 118233.333333 116166.666667
Arizona Phoenix 205300.000000 168200.000000
[In]: print("Indices: " + str(housing_data_compact_df.index.names))
Indices: ['State', 'RegionName']
[In]: print(housing_data_compact_df.columns)
PeriodIndex(['2000Q1', '2001Q1'], dtype='period[Q-DEC]', freq='Q-DEC')
What I've Tried:
It seems that my issue has something to do with the PeriodIndex columns. I have tried converting the data via direct casts:
[In]: housing_data_compact_df['P Ratio'] = float(housing_data_compact_df[pd.Period(start_col_name)]).div(float(housing_data_compact_df[pd.Period(end_col_name)]))
TypeError: cannot convert the series to <class 'float'>
I've also tried using .astype(), but I get the same error as without the conversion:
[In]: housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(start_col_name)].astype(float).div(housing_data_compact_df[pd.Period(end_col_name)].astype(float))
DateParseError: Unknown datetime string format, unable to parse: P Ratio
I have also reset the keys in an attempt to break the PeriodIndex, and then reindex after the operation is done. However, this does not seem to work on all the systems I test it on, and also seems like a roundabout way to fix what I believe should be a simple solution.
Question:
How can I create a new column as the ratio of the data from these PeriodIndex columns?
Thanks in advance for any help.