I'm trying to find which shop has an 'empty' day, i.e. a day where no customer came.
My table has the following structure:
+----------+-------------+-------------+-------------+-------------+-------------+-------------+------------+
| shop | 2020-10-15 | 2020-10-16 | 2020-10-17 | 2020-10-18 | 2020-10-19 | 2020-10-20 | 2020-10-21 |
+----------+-------------+-------------+-------------+-------------+-------------+-------------+------------+
| Paris | 215 | 213 | 128 | 102 | 195 | 180 | 110 |
| London | 145 | 106 | 102 | 83 | 127 | 111 | 56 |
| Beijing | 179 | 245 | 134 | 136 | 207 | 183 | 136 |
| Sydney | 0 | 0 | 0 | 0 | 0 | 6 | 36 |
+----------+-------------+-------------+-------------+-------------+-------------+-------------+------------+
With pandas I can do something like `customers[customers == 0].dropna(how="all")`, which keeps only the rows that contain at least one 0, and I get this:
+----------+-------------+-------------+-------------+-------------+-------------+-------------+------------+
| shop | 2020-10-15 | 2020-10-16 | 2020-10-17 | 2020-10-18 | 2020-10-19 | 2020-10-20 | 2020-10-21 |
+----------+-------------+-------------+-------------+-------------+-------------+-------------+------------+
| Sydney   | 0           | 0           | 0           | 0           | 0           | NaN         | NaN        |
+----------+-------------+-------------+-------------+-------------+-------------+-------------+------------+
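For reference, here is a minimal runnable version of the pandas step, rebuilding the table above (assuming `shop` is the index; the column names and numbers are taken from the example):

```python
import pandas as pd

# Rebuild the example table, with shop as the index.
data = {
    "shop": ["Paris", "London", "Beijing", "Sydney"],
    "2020-10-15": [215, 145, 179, 0],
    "2020-10-16": [213, 106, 245, 0],
    "2020-10-17": [128, 102, 134, 0],
    "2020-10-18": [102, 83, 136, 0],
    "2020-10-19": [195, 127, 207, 0],
    "2020-10-20": [180, 111, 183, 6],
    "2020-10-21": [110, 56, 136, 36],
}
customers = pd.DataFrame(data).set_index("shop")

# Mask non-zero values to NaN, then drop rows that are entirely NaN,
# keeping only shops with at least one zero-customer day.
empty_days = customers[customers == 0].dropna(how="all")
print(empty_days)
```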
In PySpark I believe `.dropna()` does something similar, but I want the opposite: keep only the rows with NA/0 values. How can I do that?