
I have a pandas DataFrame with a text column called "term" that holds the string values ' 36 months' and ' 60 months' (both with leading spaces). I want to replace each string with just its numeric part, i.e. 36 or 60.

I'm trying to follow this post -

Updating a pandas column with a dictionary lookup

Here's my code -

import pandas as pd
import re

loan_data = pd.read_csv('loan_data_2007_2014.csv')

# create a dictionary with the from (key) and to (value) pairs for the lookup replacement
term_set = set(loan_data['term'])
term_dict = {x: int(re.sub('[^0-9]', '', x)) for x in term_set}

# term_dict => {' 36 months': 36, ' 60 months': 60} (both keys have a leading space)

loan_data['term_int'] = loan_data['term']

loan_data['term_int'].update(pd.Series(term_dict))

print("unique term_int = ", loan_data['term_int'].unique())

I expected the term_int field in the DataFrame to contain values 36 or 60, but the lookup didn't work.

What did I do wrong?

JohnSmith

2 Answers


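What stands out to me is the line loan_data['term_int'].update(pd.Series(term_dict)). I don't think update() is the right tool here: Series.update() aligns on the index, and the Series built from term_dict is indexed by the term strings, while the column is indexed by row labels (by default the integers 0, 1, 2, ...). No labels match, so nothing changes. I would just apply a function to the "term" column to generate a new column.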
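A minimal sketch of both points, assuming a small made-up DataFrame in place of the CSV; months_to_int is a hypothetical helper name, not something from the question:

```python
import pandas as pd

# Made-up stand-in for the 'term' column of loan_data
loan_data = pd.DataFrame({'term': [' 36 months', ' 60 months', ' 36 months']})
term_dict = {' 36 months': 36, ' 60 months': 60}

# update() aligns on index labels: the lookup Series is labelled by the
# term strings, the column by row numbers 0, 1, 2, so nothing lines up.
term_int = loan_data['term'].copy()
term_int.update(pd.Series(term_dict))
unchanged = term_int.tolist()  # still the original strings

# Hypothetical helper: strip the leading space and keep the first token
def months_to_int(term):
    return int(term.strip().split()[0])

# apply() works on the values themselves, not on the index
loan_data['term_int'] = loan_data['term'].apply(months_to_int)
print(loan_data['term_int'].unique())
```

The helper assumes every value looks like ' NN months'; anything else would raise a ValueError, so real data may need extra checks.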

kmklim

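Use map instead of update. Series.map looks up each value of the column in the dictionary, whereas update matches on the index.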

import pandas as pd
import re

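# loan_data is the DataFrame loaded in the question
# build the lookup: each distinct term string -> its integer number of months
term_set = set(loan_data['term'])
term_dict = {x: int(re.sub('[^0-9]', '', x)) for x in term_set}

# replace each term string with its dictionary value
loan_data['term_int'] = loan_data['term'].map(term_dict)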
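For a quick check without the CSV, here is a small self-contained sketch of the same approach. The three-row DataFrame is made up, and ' 12 months' is a hypothetical value included only to show that strings missing from the dictionary come back as NaN:

```python
import re
import pandas as pd

# Made-up stand-in for loan_data_2007_2014.csv
loan_data = pd.DataFrame({'term': [' 36 months', ' 60 months', ' 36 months']})

# Same dictionary construction as above
term_dict = {x: int(re.sub('[^0-9]', '', x)) for x in set(loan_data['term'])}
loan_data['term_int'] = loan_data['term'].map(term_dict)
print("unique term_int = ", loan_data['term_int'].unique())

# Hypothetical unseen value: keys missing from the dictionary map to NaN
unseen = pd.Series([' 36 months', ' 12 months']).map(term_dict)
print(unseen.isna().tolist())
```

Since the dictionary is built from the column's own values, every row finds a match here; the NaN case only matters if the mapping is reused on other data.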

Eyad Sibai