I'm trying to create a flag column based on other columns in a data frame.

Example:

library(dplyr)  # also provides tribble(), re-exported from tibble

df <- tribble(
  ~x1, ~x2, ~x3, ~x4,
  1, 0, 1, 1,
  0, 0, NA, NA,
  1, 0, NA, 1,
  0, 0, NA, NA,
  0, 1, NA, 0
)

I want to create a flag column that is 1 if the value 1 appears in any of the columns x1 through x4, and 0 otherwise.

res <- df |> mutate(flag = ifelse(if_any(x1:x4, function(x) x == 1), 1, 0))

I've tried using dplyr::if_any() with ifelse(). It works for the most part, but for some reason it returns NA where I expect 0.

> res
# A tibble: 5 × 5
     x1    x2    x3    x4  flag
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     0     1     1     1
2     0     0    NA    NA    NA
3     1     0    NA     1     1
4     0     0    NA    NA    NA
5     0     1    NA     0     1

Why is this happening? What would be a better solution?

Edit: I checked what if_any() itself returns, and it returns NA instead of FALSE.

> res
# A tibble: 5 × 6
     x1    x2    x3    x4  flag true_flase
  <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>     
1     1     0     1     1     1 TRUE      
2     0     0    NA    NA    NA NA        
3     1     0    NA     1     1 TRUE      
4     0     0    NA    NA    NA NA        
5     0     1    NA     0     1 TRUE      
nightstand

5 Answers

Here is one way we could do it:

library(dplyr)
library(tidyr)

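df %>% 
  rowwise() %>%
  # cur_data() is deprecated since dplyr 1.1.0 in favor of pick()
  mutate(flag = any(cur_data() == 1),    # NA when a row has NAs but no 1
         flag = replace_na(flag, FALSE)) # turn those NAs into FALSE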
     x1    x2    x3    x4 flag 
  <dbl> <dbl> <dbl> <dbl> <lgl>
1     1     0     1     1 TRUE 
2     0     0    NA    NA FALSE
3     1     0    NA     1 TRUE 
4     0     0    NA    NA FALSE
5     0     1    NA     0 TRUE 
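
The two-step approach above (compute any(), then patch the NAs) can be collapsed by letting any() drop the NAs itself. The following is a minimal base R sketch, not the answer's own code: the data frame is rebuilt with data.frame() to keep the example self-contained, and row_has_one is a hypothetical helper name.

```r
# Base R data frame mirroring the question's example data
df <- data.frame(
  x1 = c(1, 0, 1, 0, 0),
  x2 = c(0, 0, 0, 0, 1),
  x3 = c(1, NA, NA, NA, NA),
  x4 = c(1, NA, 1, NA, 0)
)

# Hypothetical helper: any() with na.rm = TRUE discards NA comparisons
# before reducing, so a row without a 1 yields FALSE instead of NA
row_has_one <- function(row) any(row == 1, na.rm = TRUE)

# Apply the helper to every row (MARGIN = 1)
flag <- apply(df, 1, row_has_one)
flag
```

With na.rm = TRUE there is no NA left to replace afterwards, so rows 2 and 4 come out as FALSE directly.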
TarJae

Per https://stackoverflow.com/a/44411169/10276092:

You can use %in% instead of == to sidestep the NAs. NA == 1 is NA, but NA %in% 1 is FALSE, so if_any() never sees an NA.

df %>% mutate(flag = ifelse(if_any(.cols = x1:x4, .fns = ~ . %in% 1), 1, 0))
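
To see why the two operators behave differently, it may help to compare them on a single row of the question's data. This is a small base R illustration; the variable names eq_result and in_result are made up for the example.

```r
x <- c(0, 0, NA, NA)    # row 2 of the question's data

eq_result <- x == 1     # comparison with NA propagates NA
in_result <- x %in% 1   # %in% always returns TRUE or FALSE

eq_result               # FALSE FALSE NA NA
in_result               # FALSE FALSE FALSE FALSE

# Reducing with any() mirrors what happens across columns:
any(eq_result)          # NA: no TRUE found, but the NAs might hide one
any(in_result)          # FALSE

# A single TRUE still wins over NA, which is why row 5 was flagged correctly
any(c(FALSE, TRUE, NA, FALSE))  # TRUE
```

This is the usual three-valued logic in R: FALSE combined with NA is NA, while TRUE combined with NA is TRUE.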
M.Viking

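Or just change the NAs to 0 first. Note that this overwrites the NAs in x1 through x4. (mutate_each() and funs() are deprecated, so this uses across() instead.)

df %>%
  mutate(across(x1:x4, ~ replace(.x, is.na(.x), 0))) %>%
  mutate(flag = ifelse(if_any(x1:x4, function(x) x == 1), 1, 0))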

Output:

     x1    x2    x3    x4  flag
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     0     1     1     1
2     0     0     0     0     0
3     1     0     0     1     1
4     0     0     0     0     0
5     0     1     0     0     1
KacZdr

Another option using rowSums. This assumes the columns hold only 0, 1, or NA, so a positive row sum means at least one 1:

df %>% mutate(flag = +(rowSums(., na.rm = TRUE) > 0))

#----
# A tibble: 5 x 5
     x1    x2    x3    x4  flag
  <dbl> <dbl> <dbl> <dbl> <int>
1     1     0     1     1     1
2     0     0    NA    NA     0
3     1     0    NA     1     1
4     0     0    NA    NA     0
5     0     1    NA     0     1
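
If the columns could contain values other than 0 and 1, a variant that counts only the cells equal to 1 may be safer. The sketch below uses base R only; the second data frame, other, is invented purely to show where the two versions differ.

```r
# Base R data frame mirroring the question's example data
df <- data.frame(
  x1 = c(1, 0, 1, 0, 0),
  x2 = c(0, 0, 0, 0, 1),
  x3 = c(1, NA, NA, NA, NA),
  x4 = c(1, NA, 1, NA, 0)
)
cols <- c("x1", "x2", "x3", "x4")

# df[cols] == 1 is a logical matrix; rowSums() counts the TRUEs per row,
# ignoring NA cells because of na.rm = TRUE
flag <- as.integer(rowSums(df[cols] == 1, na.rm = TRUE) > 0)
flag

# Hypothetical data with a 2 in it: summing raw values flags the first row,
# counting matches against 1 does not
other <- data.frame(a = c(2, 0), b = c(0, 0))
sum_based   <- as.integer(rowSums(other, na.rm = TRUE) > 0)
match_based <- as.integer(rowSums(other == 1, na.rm = TRUE) > 0)
```

For the question's data both versions give the same flags; they only diverge once a value other than 0 or 1 shows up.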
nniloc

From the R manual pages:

Note:

Do not use ‘==’ and ‘!=’ for tests, such as in ‘if’ expressions, where you must get a single ‘TRUE’ or ‘FALSE’. Unless you are absolutely sure that nothing unusual can happen, you should use the ‘identical’ function instead.

Following that advice:

library(dplyr)

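df %>% 
  rowwise() %>% 
  # rowwise() makes each .x a single value, so identical() compares scalars;
  # identical(NA, 1) is FALSE rather than NA, and * 1 converts TRUE/FALSE to 1/0
  mutate(flag = if_any(starts_with("x"), ~ identical(.x, 1)) * 1 )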
# A tibble: 5 × 5
# Rowwise: 
     x1    x2    x3    x4  flag
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     0     1     1     1
2     0     0    NA    NA     0
3     1     0    NA     1     1
4     0     0    NA    NA     0
5     0     1    NA     0     1
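
A few scalar comparisons show what identical() does differently from ==, and where it can surprise. This is only an illustration in base R; the variable names are made up for the example.

```r
same_value   <- identical(1, 1)         # TRUE
na_vs_one    <- identical(NA_real_, 1)  # FALSE, whereas NA_real_ == 1 is NA
int_vs_dbl   <- identical(1L, 1)        # FALSE: integer and double differ in type
len_mismatch <- identical(c(1, 1), 1)   # FALSE: the lengths differ

c(same_value, na_vs_one, int_vs_dbl, len_mismatch)
```

The type strictness matters here: if the x columns were stored as integers, identical(.x, 1) would be FALSE for every cell, and the comparison would need 1L instead.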
Andre Wildberg