Pandas: convert a series which contains strings like "10%" and "0.10" into numeric

Question

What is the best way to convert a Pandas series that contains strings of the type "10%" and "0.10" into numeric values?

I know that if I have a series with just "0.10" type strings I can just do pd.to_numeric.

I also know that if I have a series of "10%" type strings I can do str.replace("%","") and then do pd.to_numeric and divide by 100.

The issue I have is with a series that mixes "0.10" and "10%" type strings. How do I best convert this into a series with the correct numeric values?

I think I could do it by first making a temporary series with True / False depending on if the string has "%" in it or not and then based on that applying a function. But this seems inefficient.

Is there a better way?

What I Have Tried for Reference:

import pandas as pd

mixed = pd.Series(["10%","0.10","5.5%","0.02563"])
mixed.str.replace("%","").astype("float")/100

0    0.100000
1    0.001000
2    0.055000
3    0.000256
dtype: float64
# This doesn't work, because even the 0.10 and 0.02563 are divided by 100.

Rabinzel · Answer 1 · 2022-06-09T03:59:47.723

8

Somehow you need a condition. This is one possible way:

converted = pd.Series((float(x.strip('%'))/100 if '%' in x else float(x) for x in mixed))
print(converted)

0    0.10000
1    0.10000
2    0.05500
3    0.02563
dtype: float64

edited Jun 09 '22 at 03:59

answered Jun 09 '22 at 03:49

Rabinzel


score 8 · Answer 2 · answered Jun 09 '22 at 03:55

A very neat solution based on this answer is:

from pandas import Series, to_numeric

mixed = Series(["10%", "0.10", "5.5%", "0.02563"])

print(to_numeric(mixed.str.replace("%", "e-2")))
# 0    0.10000
# 1    0.10000
# 2    0.05500
# 3    0.02563
# dtype: float64
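
The replacement works because `10e-2` is scientific notation for 10 × 10⁻², so the percent sign becomes a division by 100 inside the string itself. Below is a minimal sketch of the intermediate strings; the names `rewritten` and `converted` are only for illustration. One caveat: a value that is already in exponent form, such as `"1e3%"`, would turn into `"1e3e-2"`, which is not a valid number, so `errors="coerce"` is used here on the assumption that you would rather get NaN than an exception.

```python
import pandas as pd

mixed = pd.Series(["10%", "0.10", "5.5%", "0.02563"])

# Replace the percent sign literally (regex=False); other strings are untouched.
rewritten = mixed.str.replace("%", "e-2", regex=False)
print(rewritten.tolist())  # ['10e-2', '0.10', '5.5e-2', '0.02563']

# Python's float parser reads "10e-2" as 10 * 10**-2.
print(float("10e-2"))  # 0.1

# errors="coerce" turns unparseable strings (e.g. "1e3e-2") into NaN instead of raising.
converted = pd.to_numeric(rewritten, errors="coerce")
print(converted)
```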

SultanOrazbayev · Accepted Answer · 2022-06-10T07:10:46.337

The easiest solution is to select entries using a mask and handle them in bulk:

from pandas import Series, to_numeric

mixed = Series(["10%", "0.10", "5.5%", "0.02563"])

# make an empty series with similar shape and dtype float
converted = Series(index=mixed.index, dtype='float')

# use a mask to select specific entries
mask = mixed.str.contains("%")

converted.loc[mask] = to_numeric(mixed.loc[mask].str.replace("%", "")) / 100
converted.loc[~mask] = to_numeric(mixed.loc[~mask])

print(converted)
# 0    0.10000
# 1    0.10000
# 2    0.05500
# 3    0.02563
# dtype: float64
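
The same mask idea can also be written without pre-allocating a result series. This is a sketch, not part of the answer above: it strips the trailing `%` from every entry, converts once, and then uses `Series.where` to divide only the entries the mask marks as percentages. The names `is_percent` and `values` are illustrative.

```python
import pandas as pd

mixed = pd.Series(["10%", "0.10", "5.5%", "0.02563"])

# True where the original string was a percentage.
is_percent = mixed.str.endswith("%")

# Strip any trailing "%" and convert the whole series in one pass.
values = pd.to_numeric(mixed.str.rstrip("%"))

# Keep values where the entry was NOT a percentage; otherwise use values / 100.
converted = values.where(~is_percent, values / 100)
print(converted)
```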

score 5 · Answer 4 · answered Jun 09 '22 at 05:57


mixed = mixed.apply(lambda x: float(x[:-1])/100 if '%' in x else float(x))

Output:

0    0.10000
1    0.10000
2    0.05500
3    0.02563
dtype: float64
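
Note that `'%' in x` raises a `TypeError` if the series contains missing values, because NaN is a float rather than a string. If your data may contain gaps or stray whitespace, something along these lines could help. It is a sketch under that assumption; `to_fraction` and `messy` are hypothetical names, not part of pandas.

```python
import pandas as pd

def to_fraction(value):
    """Hypothetical helper: parse '10%' or '0.10' into a float; pass missing values through as NaN."""
    if pd.isna(value):
        return float("nan")
    text = str(value).strip()          # tolerate surrounding whitespace
    if text.endswith("%"):
        return float(text[:-1]) / 100  # drop the trailing "%" and scale
    return float(text)

messy = pd.Series(["10%", " 5.5% ", "0.02563", None])
cleaned = messy.apply(to_fraction)
print(cleaned)
```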

answered Jun 09 '22 at 05:57

BeRT2me


score 3 · Answer 5 · answered Jun 09 '22 at 12:21


Try:

mixed = pd.Series(["10%", "0.10", "5.5%", "0.02563"])


mixed = mixed.str.replace("%", "e-02")
print(pd.to_numeric(mixed))

Prints:

0    0.10000
1    0.10000
2    0.05500
3    0.02563
dtype: float64

answered Jun 09 '22 at 12:21

Andrej Kesely


score 1 · Answer 6 · answered Jun 15 '22 at 13:44

Any of the following approaches works.

Using apply:

mixed = pd.Series(["10%","0.10","5.5%","0.02563"])

def percent_to_float(x):
    if x.endswith("%"):
        x = x.rstrip("%")
        return float(x)/100
    else:
        return float(x)

cleaned = mixed.apply(percent_to_float)

print(cleaned)

Using str.replace:

mixed = pd.Series(["10%", "0.10", "5.5%", "0.02563"])
mixed = mixed.str.replace("%", "e-02")

print(pd.to_numeric(mixed))

Using a regex replacement together with apply:

import re

mixed = pd.Series(["10%","0.10","5.5%","0.02563"])

def percent_to_float(x):
    return float(re.sub("%", "e-02", x))

cleaned = mixed.apply(percent_to_float)

print(cleaned)

Each method produces:

0    0.10000
1    0.10000
2    0.05500
3    0.02563
dtype: float64
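
If you want to confirm this on your own data, one option is to compare two of the routes directly. The sketch below uses illustrative names (`via_apply`, `via_replace`) and compares with a tolerance rather than `==`, on the assumption that the two parsing routes are not guaranteed to round identically in the last bit.

```python
import numpy as np
import pandas as pd

mixed = pd.Series(["10%", "0.10", "5.5%", "0.02563"])

# Route 1: per-element condition.
via_apply = mixed.apply(
    lambda s: float(s.rstrip("%")) / 100 if s.endswith("%") else float(s)
)

# Route 2: rewrite "%" as an exponent and let pandas parse the result.
via_replace = pd.to_numeric(mixed.str.replace("%", "e-02", regex=False))

# Tolerance-based comparison of the two results.
print(np.allclose(via_apply, via_replace))  # True
```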
