I have a factor vector. Some values can be repeated. The values are not known beforehand, but can be sorted. For example,

x1 <- factor(c("A", "C", "C", "A", "B" ), levels=c("A", "B", "C"))
x2 <- factor(c("E", "C", "C", "D", "B" ), levels=c("B", "C", "D", "E"))

I want to create another vector in which each value is either "first", "last" or "other": "first" where the element equals the first factor level, "last" where it equals the last factor level, and "other" everywhere else. In the above case, the resulting vector y1 would have to be c("first", "last", "last", "first", "other"), while y2 would have to be c("last", "other", "other", "other", "first").

Currently, I do it like this:

f2l <- function(x) {
  x <- as.numeric(x)
  y <- rep("other", length(x))
  y[ x == max(x) ] <- "last"
  y[ x == min(x) ] <- "first"
  y
}

This works as intended, but I wonder whether there is a more efficient solution.

January
  • you may consider a sort of merge using `data.table`, [_a la_](http://stackoverflow.com/questions/28181753/grouping-factor-levels-in-an-r-data-table) – MichaelChirico Jul 28 '15 at 17:47

2 Answers

You can reassign level labels using a list.

x1 <- factor(c("A", "C", "C", "A", "B" ), levels=c("A", "B", "C"))
x2 <- factor(c("E", "C", "C", "D", "B" ), levels=c("B", "C", "D", "E"))

f2l <- function(x){
  levels(x) <- list("first" = levels(x)[1],
                    "other" = levels(x)[-c(1, nlevels(x))],
                    "last" = levels(x)[nlevels(x)])
  x
}

f2l(x1)
f2l(x2)
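
As a quick check of what the list assignment does, here is a minimal sketch using only base R, with the levels of x1 written out by hand. The name `relabelled` is just for illustration. Note that the result is still a factor, so `as.character()` is needed if a plain character vector is wanted.

```r
x1 <- factor(c("A", "C", "C", "A", "B"), levels = c("A", "B", "C"))

# Named list: each name is a new level, each element lists the old levels it absorbs
relabelled <- x1
levels(relabelled) <- list(first = "A", other = "B", last = "C")

# The result is a factor; convert it if a character vector is needed
y1 <- as.character(relabelled)
print(y1)
# [1] "first" "last"  "last"  "first" "other"
```
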
Benjamin

Apart from Benjamin's method, if you are sure that the factor has more than 2 levels, you can use

f2l <- function(x) {
  levels(x) <- c("first", rep("other", nlevels(x) - 2), "last")
  x
}
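
The level-count condition matters because `rep()` fails when given a negative count, which is what happens for a factor with a single level. With exactly 2 levels, `rep()` returns nothing and the assignment still goes through with just "first" and "last". A minimal sketch of a guarded variant follows; `f2l_safe` is a hypothetical name and the error message is an assumption, neither comes from the answer itself.

```r
f2l_safe <- function(x) {
  n <- nlevels(x)
  # With fewer than 2 levels, "first" and "last" cannot both be assigned
  if (n < 2) stop("need at least 2 factor levels")
  # Repeated "other" labels merge all middle levels into a single level
  levels(x) <- c("first", rep("other", n - 2), "last")
  x
}

x2 <- factor(c("E", "C", "C", "D", "B"), levels = c("B", "C", "D", "E"))
y2 <- as.character(f2l_safe(x2))
print(y2)
# [1] "last"  "other" "other" "other" "first"
```
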

If you are doing this for many factors, Benjamin's method is slower than the method above. The timings for 100000 repetitions (in seconds) are

Benjamin
   user  system elapsed
  26.58    0.00   26.68

Saksham
   user  system elapsed
  17.15    0.08   18.30
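
Timings like these could be reproduced along the following lines. This is only a sketch: it assumes the comparison was run with `system.time()` on a small factor such as x1, which the answer does not state, and the names `f2l_list`, `f2l_vec` and `n_reps` are hypothetical. Absolute numbers will differ from machine to machine.

```r
x1 <- factor(c("A", "C", "C", "A", "B"), levels = c("A", "B", "C"))

# List-based relabelling (Benjamin's method)
f2l_list <- function(x) {
  levels(x) <- list(first = levels(x)[1],
                    other = levels(x)[-c(1, nlevels(x))],
                    last  = levels(x)[nlevels(x)])
  x
}

# Vector-based relabelling (the method above)
f2l_vec <- function(x) {
  levels(x) <- c("first", rep("other", nlevels(x) - 2), "last")
  x
}

# Both variants should agree on the labels before they are timed
stopifnot(identical(as.character(f2l_list(x1)), as.character(f2l_vec(x1))))

n_reps <- 1e5  # repetition count quoted in the answer
system.time(for (i in seq_len(n_reps)) f2l_list(x1))
system.time(for (i in seq_len(n_reps)) f2l_vec(x1))
```
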
Saksham