Stupid example:

df <- data.frame(group=rep(LETTERS, each=2), value=1:52)
res <- unlist(lapply(unique(df$group), function(x) mean(subset(df, group==x)$value)))
names(res) <- unique(df$group)

Will res always be the following?

   A    B    C    D    E    F    G    H    I    J    K    L    M    N    O    P 
 1.5  3.5  5.5  7.5  9.5 11.5 13.5 15.5 17.5 19.5 21.5 23.5 25.5 27.5 29.5 31.5 
   Q    R    S    T    U    V    W    X    Y    Z 
33.5 35.5 37.5 39.5 41.5 43.5 45.5 47.5 49.5 51.5 

Or could it ever happen that the means calculated on line 2 don't match up with the names assigned on line 3? I guess it depends on the underlying implementation of unique in base R, but I'm not sure where to find that.

fanli
    I believe it returns them in the order they appear in the original vector, but the documentation doesn't _explicitly_ promise this (though it sort of hints at it) so if you're willing to assume a small amount of risk you could assume that I suppose. – joran Apr 04 '16 at 21:55

2 Answers

According to ?unique:

‘unique’ returns a vector, data frame or array like ‘x’ but with duplicate elements/rows removed.

This description fully specifies the ordering: the result keeps the elements in the order in which each distinct value first appears. (I guess I don't see the wiggle room that @joran sees for a different ordering.) For example,

unique(c("B","B","A","C","C","C","B","A"))

will result in

[1] "B" "A" "C"

I believe unique(x) will in general be identical to (but more efficient than)

x[!duplicated(x)]
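
A quick way to check this equivalence on the example above (a small sketch, not a proof; the variable name x is only for illustration):

```r
# Same vector as in the example above
x <- c("B", "B", "A", "C", "C", "C", "B", "A")

# Keep only the elements that are not duplicates of an earlier element
first_seen <- x[!duplicated(x)]

unique(x)                         # "B" "A" "C"
identical(unique(x), first_seen)  # TRUE
```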

If you want to look at the internal code, see here: the moving parts are something like

k = 0;
switch (TYPEOF(x)) {
case LGLSXP:
case INTSXP:
    for (i = 0; i < n; i++)
        if (LOGICAL(dup)[i] == 0)
            INTEGER(ans)[k++] = INTEGER(x)[i];
    break;

i.e., the internal implementation does exactly what I said: it goes through the vector sequentially and copies over the non-duplicated elements. Since the ordering isn't explicitly guaranteed in the documentation, it is theoretically possible that this implementation could change in the future, but that is vanishingly unlikely.
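
If you would rather not rely on the ordering at all, one option is to call unique once and reuse that result for both the computation and the names, so the two cannot get out of step. A minimal sketch of that idea (the names groups and res are illustrative, and as.character is there only so the code behaves the same whether group is stored as a character vector or as a factor):

```r
df <- data.frame(group = rep(LETTERS, each = 2), value = 1:52)

# Compute the distinct groups once and reuse them
groups <- as.character(unique(df$group))

# vapply over a character vector uses its elements as names for the result,
# so each mean is labelled with the group it was computed from
res <- vapply(groups, function(g) mean(df$value[df$group == g]), numeric(1))
```

Here the names are attached by the same call that computes the means, whatever order groups happens to be in.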

For what you're trying to do, there are simpler R idioms:

df <- data.frame(group=rep(LETTERS, each=2), value=1:52)
a1 <- aggregate(df$value,list(df$group),mean)

This returns a two-column data frame, so you can use

setNames(a1[,2],a1[,1])

to convert it to your format. Or

library(plyr)
unlist(daply(df,"group",summarise,val=mean(value)))
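
Another base-R possibility, not part of the original answer and offered only as a sketch, is tapply, which returns the per-group means already labelled by group. Note that the result is ordered by the sorted factor levels of group rather than by first appearance; for this data the two orders coincide. The name res2 is illustrative.

```r
df <- data.frame(group = rep(LETTERS, each = 2), value = 1:52)

# Mean of value within each group; the result carries the group labels as names
res2 <- tapply(df$value, df$group, mean)
```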
Ben Bolker
    Well, technically, a sorted list of unique elements _is_ "like" x, but with duplicate elements removed. ;) – joran Apr 04 '16 at 21:59
    I guess I interpret the documentation same as @joran - that R doesn't explicitly promise to return them FIFO. Group consensus is good enough for me! – fanli Apr 04 '16 at 22:39
R will return a sorted vector if unique is called on a RasterLayer object.

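library(raster)

# 100 x 100 raster filled with random draws from x = 1:100
example <- raster(xmn = 0, xmx = 100, ymn = 0, ymx = 100, nrow = 100, ncol = 100)
x <- 1:100
example[] <- sample(x, 10000, replace = TRUE)

plot(example)

# first 100 cell values, in cell order
vals <- values(example)[x]
identical(vals, x)

# unique values of the raster
uniques <- unique(example)
identical(uniques, x)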

The first 100 cell values will (very likely) not be identical to the ordered vector x, but the unique values come back sorted, so they will be identical to x as long as every value from 1 to 100 was drawn at least once.

Otherwise, the previous answers are correct: for ordinary vectors, R returns the unique values in the order in which they first appear.

Iris