
Let's say I have the following DataFrame:

     ID  |  has_id_dummy
    -----------------------
     340         NaN
     NaN         NaN
     NaN         NaN
     200         NaN

And I want to turn it into this DataFrame:

     ID  |  has_id_dummy
    -----------------------
     340         1
     NaN         0
     NaN         0
     200         1

To do this, I came up with the following function:

import numpy as np

def dummypopulator(x):
    if x != np.nan:
        return 1
    return 0

which I call with the following line:

df['has_id_dummy'] = df['ID'].apply(dummypopulator)

But the value is then set to 1 for all rows, even the rows that don't have an ID and should be 0:

     ID  |  has_id_dummy
    -----------------------
     340         1
     NaN         1
     NaN         1
     200         1

I tried calling the function through a separate lambda, as I saw in an example:

df['has_id_dummy'] = df['ID'].apply(lambda x: dummypopulator(x))

Yet the result is the same.
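A minimal, self-contained reproduction of the problem, offered as a sketch: it rebuilds the example frame by hand and assumes the ID column is stored as floats (pandas uses float64 for a numeric column that contains NaN). It also shows that wrapping the function in a lambda changes nothing, because the lambda only forwards each value to the same function.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the NaN values force "ID" to float64.
df = pd.DataFrame({"ID": [340, np.nan, np.nan, 200], "has_id_dummy": np.nan})

def dummypopulator(x):
    # Buggy check from the question: NaN compares unequal to everything,
    # so `x != np.nan` is True even when x itself is NaN.
    if x != np.nan:
        return 1
    return 0

direct = df["ID"].apply(dummypopulator)
wrapped = df["ID"].apply(lambda x: dummypopulator(x))

print(direct.tolist())          # [1, 1, 1, 1]
print(direct.equals(wrapped))   # True: the lambda just forwards x
```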

I feel like I am missing a very obvious error, but for the life of me I can't figure out why it won't work. Does anyone know what I am doing wrong?

Jasper
    You cannot do an equality comparison with np.nan! np.nan is not equal to anything, including itself. Use np.isnan instead or, as suggested in one of the answers, just use the pandas method .notnull() on the entire column! – tobsecret May 21 '18 at 22:45
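The np.isnan suggestion works for this particular column because ID is numeric. As a sketch of one caveat, assuming a column might also hold None or strings: np.isnan raises TypeError on such values, while pandas' pd.isna (also available as pd.isnull) handles them. The helper isnan_or_error is hypothetical and exists only to capture the exception for display.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed values: a number, NaN, None and a string.
values = [340.0, np.nan, None, "abc"]

# pd.isna treats both NaN and None as missing and accepts any object.
pandas_results = [pd.isna(v) for v in values]
print(pandas_results)  # [False, True, True, False]

def isnan_or_error(v):
    # Hypothetical helper: np.isnan only supports numeric input,
    # so None and strings raise TypeError instead of returning a bool.
    try:
        return bool(np.isnan(v))
    except TypeError:
        return "TypeError"

numpy_results = [isnan_or_error(v) for v in values]
print(numpy_results)  # [False, True, 'TypeError', 'TypeError']
```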

2 Answers


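The value nan is not a number and cannot be meaningfully compared to other numbers. In particular, nan == nan is False and nan != nan is True, so x != np.nan is True for every x, including NaN itself. That is why your function returns 1 for every row.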
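A quick check of these comparison rules, using only np.nan (which is a plain Python float):

```python
import numpy as np

nan = np.nan

print(nan == nan)   # False: NaN is not equal to itself
print(nan != nan)   # True
print(340 != nan)   # True
# So `x != np.nan` is True for every x, NaN included, which is why
# the question's function returns 1 for every row.
```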

In your case, the use of apply is not even necessary. Just do df['has_id_dummy'] = df['ID'].notnull().astype(int).
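A sketch of that one-liner applied to the example frame from the question, which is rebuilt by hand here:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [340, np.nan, np.nan, 200], "has_id_dummy": np.nan})

# notnull() returns a boolean mask over the whole column at once;
# astype(int) then maps True -> 1 and False -> 0.
df["has_id_dummy"] = df["ID"].notnull().astype(int)
print(df)
```

No per-row Python function is involved, so there is no comparison against NaN to get wrong.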

DYZ
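import numpy as np

def dummypopulator(x):
    # `not` is logical negation; `~` only worked because np.isnan returns
    # a NumPy bool (on a plain Python bool, ~True == -2, which is truthy)
    if not np.isnan(x):
        return 1
    else:
        return 0
df['ID'].apply(dummypopulator)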
Out[256]: 
0    1
1    0
2    0
3    1
Name: ID, dtype: int64

Reason:

np.nan!=np.nan
Out[257]: True
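Because NaN is the only value that is not equal to itself, the same fact can be turned around into a working test. This is a swapped-in alternative rather than part of either answer, sketched under the assumption that the column is float (as ID is here): it detects float NaN only, since None == None is True, so a None in an object column would still map to 1.

```python
import numpy as np
import pandas as pd

def dummypopulator(x):
    # NaN is the only float for which `x == x` is False.
    return 1 if x == x else 0

s = pd.Series([340, np.nan, np.nan, 200], name="ID")
result = s.apply(dummypopulator)
print(result.tolist())  # [1, 0, 0, 1]
```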

My preferred way to do this:

df['ID'].notnull().astype(int)
BENY