8

I am fairly new to Python so forgive me this simple question. I'm trying to convert string to float. Here is a sample of the data:

0     10.65%
1      7.90%

When I try:

 df['int_rate'] = df['int_rate'].astype('float')

I get:

ValueError: could not convert string to float: '13.75%'

When I try:

df['int_rate'] = df['int_rate'].replace("%","", inplace=True) 

And check my data, I get:

0     None
1     None

Any ideas what I'm doing wrong? Many thanks!

Guillaume
  • 5,497
  • 3
  • 24
  • 42
Minsky
  • 473
  • 2
  • 7
  • 18

3 Answers3

6

You can use Series.replace with parameter regex=True for replace substrings:

df = pd.DataFrame({'int_rate':['10.65%','7.90%']})
df['int_rate'] = df['int_rate'].replace("%","", regex=True).astype(float)
print (df)
   int_rate
0     10.65
1      7.90

Or Series.str.replace:

df['int_rate'] = df['int_rate'].str.replace("%","")
print (df)
  int_rate
0    10.65
1     7.90
2         

Or Series.str.rstrip:

df['int_rate'] = df['int_rate'].str.rstrip("%").astype(float)
print (df)
   int_rate
0     10.65
1      7.90

See difference without it:

df = pd.DataFrame({'int_rate':['10.65%','7.90%', '%']})

df['int_rate_subs'] = df['int_rate'].replace("%","", regex=True)
df['int_rate_val'] = df['int_rate'].replace("%","")
print (df)
  int_rate int_rate_subs int_rate_val
0   10.65%         10.65       10.65%
1    7.90%          7.90        7.90%
2        %                           
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Yes, I also found that setting `regex=True` makes it work, however, can you explain why? At first I thought the problem to be that the `%` needs to be escaped with `%%` (as requred by old style string interpolation), but that makes no difference. I don't get why treating `%` as a regex fixes the problem. – mhawke Sep 20 '17 at 13:05
  • Thanks so much, Jezrael and others for chipping in. Regex=True did the trick. As mhawke, I would also be curious as to why? Many thanks! – Minsky Sep 20 '17 at 13:11
  • 1
    Really hard get nice answer, but `replace` looks for exact matches unless you pass a regex pattern and param `regex=True` – jezrael Sep 20 '17 at 13:17
  • 1
    Thanks Jezrael, that makes sense. – Minsky Sep 20 '17 at 13:24
5

As you guessed, ValueError: could not convert string to float: '13.75%' indicates that the % character blocks the convertion.

Now when you try to remove it:

df['int_rate'] = df['int_rate'].replace("%","", inplace=True) 

You set inplace=True in your replacement, which as the name suggests changes the dataframe in-place, so replace() method call returns None. Thus you store None in df['int_rate'] and end up with a column containing only None values. You should either do:

df['int_rate'] = df['int_rate'].replace("%","") 

or

df['int_rate'].replace("%","", inplace=True)
Guillaume
  • 5,497
  • 3
  • 24
  • 42
  • Thank you, Guillaume. That makes sense. However, for some bizarre reason, the % sign is not coming off with either method. The rows remain unchanged. Any thoughts? – Minsky Sep 20 '17 at 12:49
2

Since you're using a string, you could convert the value to a float using

float(df['int_rate'][:-1])

This reads the string from the first position to the second to last position, 10.65 instead of 10.65%.

Tyler
  • 129
  • 9
  • Thanks Tyler. I get TypeError: cannot convert the series to – Minsky Sep 20 '17 at 12:54
  • No problem, Minsky, what version of Python are you using? I'll take a closer look at the issue to see if I can replicate the error you're receiving. I should add that I'm using 3.6.1 above. – Tyler Sep 20 '17 at 12:58
  • Appreciate it. I'm on Python 3.6. I'm practicing on the Lending Club data-set (directly from their website). – Minsky Sep 20 '17 at 12:59
  • Just to clarify, I'm using the Anaconda version of 3.6. The complete error message is Traceback (most recent call last): File "C:\Users\avesu\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "", line 1, in float(df['int_rate'][:-1]) File "C:\Users\avesu\Anaconda3\lib\site-packages\pandas\core\series.py", line 97, in wrapper "{0}".format(str(converter))) TypeError: cannot convert the series to – Minsky Sep 20 '17 at 13:03