Say, I have a vector as shown below:

v1<- c("p 1", "p 2", "p 10", "p 11")

Sorting it with sort(v1) gives me:

[1] "p 1"  "p 10" "p 11" "p 2" 

However, I would like sort(v1) to return:

[1] "p 1"  "p 2" "p 10" "p 11"

Based on the help file, sort doesn't seem to support this kind of natural (number-aware) ordering. Is such an ordering possible at all without installing an additional package?

Alex
  • From `?sort`: "The sort order for character vectors will depend on the collating sequence of the locale in use: see 'Comparison'." – Joshua Ulrich Mar 21 '13 at 15:40
  • If your case is like this: `some characters followed by a space and then only numbers`, then you can do something like this to get around it: `v1[order(as.numeric(gsub(".* ", "", v1)))]` – Arun Mar 21 '13 at 15:42
  • @Arun I thought about doing something similar (but less elegant): `v1[order(as.numeric(substr(v1, 3, nchar(v1))))]`. However, I would like the method to be more general, as it is part of a function I am writing. If that is not possible, I suppose I will require that all the input be numeric. – Alex Mar 21 '13 at 15:54
  • Then, Joshua's answer pretty much sums it up. – Arun Mar 21 '13 at 15:57
  • Since you're writing a function of your own, perhaps you can look at the code for `mixedorder` from the "gtools" package and see what part of that code is most relevant for your needs. – A5C1D2H2I1M1N2O1R2T1 Mar 21 '13 at 16:02
  • Thanks! I'll definitely take a look at `mixedorder`. – Alex Mar 21 '13 at 19:43

2 Answers

Here's one way. Make a vector where the numerals are padded with zeros, then sort by this vector.

v1.padded <- mapply(gsub, list('\\d+'), sprintf('%.4d', as.numeric(regmatches(v1, gregexpr('\\d+', v1)))), v1)
# "p 0001" "p 0002" "p 0010" "p 0011"
v1[order(v1.padded)]
# "p 1"  "p 2"  "p 10" "p 11"
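
The padding idea above uses a fixed width of four digits and one number per string. As a minimal sketch of how it might be extended to several numbers or wider numbers, the helper below pads every run of digits to the widest run found. The name `pad_digits` is hypothetical (it is not part of base R), and the sketch assumes R 3.3.0 or later for `strrep`.

```r
# Hypothetical helper: left-pad every run of digits with zeros to a common width.
pad_digits <- function(x) {
  m <- gregexpr("[0-9]+", x)
  nums <- regmatches(x, m)                    # list of digit runs, one entry per string
  width <- max(0L, nchar(unlist(nums)))       # widest digit run across all strings
  regmatches(x, m) <- lapply(nums, function(n) {
    paste0(strrep("0", width - nchar(n)), n)  # prepend the missing zeros
  })
  x
}

v <- c("p 10", "p 2", "p 11", "p 1")
padded <- pad_digits(v)
sorted <- v[order(padded)]                    # order by the padded copy, return the originals
```

Because only the padded copy is used as the sort key, the original strings come back unchanged.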

Here's a second way to do it that would generalize to situations where the strings have more than one numeral.

v1<- c("p 1 1", "p 11 1", "p 1 2", "p 2 3", "p 10 4")
parallel.split <- lapply(data.frame(do.call(rbind, strsplit(v1, ' ')), stringsAsFactors=FALSE), type.convert, as.is=TRUE)
inter <- do.call(interaction, c(parallel.split, list(lex.order=TRUE)))
v1[order(inter)]
# [1] "p 1 1"  "p 1 2"  "p 2 3"  "p 10 4" "p 11 1"
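
A possible simplification of the second approach, offered only as a sketch: `order` accepts several sort keys, so the converted columns can be passed to it directly with `do.call` instead of building an interaction factor. The names `fields` and `keys` are illustrative. Like the code above, this assumes every string has the same number of space-separated fields.

```r
v <- c("p 1 1", "p 11 1", "p 1 2", "p 2 3", "p 10 4")

# One row per string, one column per space-separated field.
fields <- do.call(rbind, strsplit(v, " ", fixed = TRUE))

# Convert each column to its natural type (numbers become integer, text stays character).
keys <- lapply(seq_len(ncol(fields)), function(j) {
  type.convert(fields[, j], as.is = TRUE)
})

# Sort by the first column, breaking ties with the second, then the third.
sorted <- v[do.call(order, keys)]
```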
Matthew Plourde

You could look at the code for mixedsort (from the gtools package) and type it into R yourself. Then you would have the function without installing an additional package.

Or you can use the order function after splitting the character strings into their pieces:

v1 <- c('p 1', 'q 2','p 2','p 11', 'p 10')
sort(v1)

tmp <- strsplit(v1, ' +')
tmp1 <- sapply(tmp, '[[', 1)
tmp2 <- as.numeric(sapply(tmp, '[[', 2))
v1[ order( tmp1, tmp2 ) ]

Or you can automate this by writing a method for xtfrm and giving your vector the appropriate class:

xtfrm.mixed <- function(x) {
    tmp <- strsplit(x, ' +')
    tmp1 <- sapply(tmp, '[[', 1)
    tmp2 <- as.numeric(sapply(tmp, '[[', 2))
    tmp3 <- rank(tmp1, ties.method='min')
    tmp4 <- rank(tmp2, ties.method='min')
    tmp3+tmp4/(max(tmp4)+1)
}

class(v1) <- 'mixed'
sort(v1)
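
To make the sort key in the method above easier to follow, here is a sketch that computes the same key as a plain function. The name `mixed_key` is hypothetical. The rank of the text part becomes the integer part of the key, and the rank of the number is scaled into a fraction below 1, so the number only breaks ties within the same text prefix.

```r
# Hypothetical standalone version of the key computed inside xtfrm.mixed.
mixed_key <- function(x) {
  parts <- strsplit(x, " +")
  prefix <- sapply(parts, `[[`, 1)
  number <- as.numeric(sapply(parts, `[[`, 2))
  r_prefix <- rank(prefix, ties.method = "min")  # integer part: rank of the text prefix
  r_number <- rank(number, ties.method = "min")  # rank of the numeric part
  r_prefix + r_number / (max(r_number) + 1)      # the fraction is always below 1
}

v <- c("p 1", "q 2", "p 2", "p 11", "p 10")
key <- mixed_key(v)
sorted <- v[order(key)]
```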

If all of your data start with "p ", then you could just strip that prefix off, coerce the remainder to numeric, and use the result in order.
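
A minimal sketch of that last suggestion, assuming every element really does begin with the literal prefix "p " followed only by a number (the variable names are illustrative):

```r
v <- c("p 1", "p 2", "p 10", "p 11", "p 3")

# Drop the leading "p " and convert what is left to numbers.
num <- as.numeric(sub("^p ", "", v))

# Order the original strings by their numeric part.
sorted <- v[order(num)]
```

If an element does not match this pattern, `as.numeric` returns NA with a warning, and `order` places that element last by default.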

Greg Snow