
Dataset containing some null values

I want to create a new boolean column that holds 0/1 depending on whether ts_booking_at is null or not.

Currently I'm using the following code:

contacts.iloc[contacts["ts_accepted_at"] is np.nan, "accept"] = False
contacts.iloc[contacts["ts_accepted_at"] is not np.nan, "accept"] = True

But it raises the following error:

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

  • A few issues. You need to use `loc`, not `iloc`, to select the column by label; more information [here](https://stackoverflow.com/a/56231541/15497888). You can't use `is` to test for NaN in pandas, since `is` assesses whether the _Series_ itself (`contacts["ts_accepted_at"]`) is `np.nan`, not _each_ value in the Series; more details [here](https://stackoverflow.com/a/68697141/15497888). To check for NaN you can use `isna`/`isnull`, as outlined [here](https://stackoverflow.com/a/67708569/15497888), or the inverse `notna`/`notnull`. – Henry Ecker Feb 17 '22 at 01:51
  • So `contacts.loc[contacts["ts_accepted_at"].isna(), "accept"] = False` and `contacts.loc[contacts["ts_accepted_at"].notna(), "accept"] = True` – Henry Ecker Feb 17 '22 at 01:53
  • Or even more simply by just assigning the result of the boolean index to the column: `contacts['accept'] = contacts["ts_accepted_at"].notna()` – Henry Ecker Feb 17 '22 at 01:53
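The suggestions in the comments above can be tried on a small frame. This is only a minimal sketch: the sample values and the extra `accept_int` column are made up for illustration, and only the names `contacts`, `ts_accepted_at`, and `accept` come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
contacts = pd.DataFrame(
    {"ts_accepted_at": ["2022-02-01 10:00:00", np.nan, "2022-02-03 12:30:00", None]}
)

# `is` compares the Series object itself with np.nan, so it yields one plain
# Python bool rather than a per-row mask.
whole_series_check = contacts["ts_accepted_at"] is np.nan  # always False

# notna() tests each value and returns a boolean Series aligned with the frame.
contacts["accept"] = contacts["ts_accepted_at"].notna()

# The question asks for 0/1, so the booleans can be cast to integers
# (accept_int is a hypothetical column name).
contacts["accept_int"] = contacts["accept"].astype(int)

print(contacts)
```

Assigning the mask directly avoids the two separate `loc` assignments, and the integer cast is only needed if 0/1 is really preferred over True/False.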
