I've looked around and can't seem to find a decent way to solve this issue.

I have a column in which each row holds a comma-separated list of names. I'd like to sort the names within each row alphabetically so that I can later identify rows that contain the same names in a different order.

The data looks like this:

names <- c("John D., Josh C., Karl H.",
        "John D., Bob S., Tim H.",
        "Amy A., Art U., Wes T.",
        "Josh C., John D., Karl H.")

var1 <- rnorm(n = length(names), mean = 0, sd = 2)
var2 <- rnorm(n = length(names), mean = 20, sd = 5)

df <- data.frame(names, var1, var2)
df

                      names       var1     var2
1 John D., Josh C., Karl H. -0.3570142 15.58512
2   John D., Bob S., Tim H. -3.0022367 12.32608
3    Amy A., Art U., Wes T. -0.6900956 18.01553
4 Josh C., John D., Karl H. -2.0162847 16.04281

For example, row 4 would get sorted to look like row 1. Row 2 would get sorted as Bob, John, and Tim.

I've tried sort(df$names), but that only sorts the rows relative to each other; it does not reorder the names within each row.
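A minimal sketch of why that attempt falls short, using the same names vector as above (the variable sorted_rows is only introduced here for illustration): sort() treats each whole string as a single element.

```r
names <- c("John D., Josh C., Karl H.",
           "John D., Bob S., Tim H.",
           "Amy A., Art U., Wes T.",
           "Josh C., John D., Karl H.")

# sort() compares the complete strings, so it only changes the order of
# the rows; the names inside each string stay exactly as they were.
sorted_rows <- sort(names)

# The row starting with "Josh C., John D." is still present, unsorted inside.
"Josh C., John D., Karl H." %in% sorted_rows
```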

user3585829

3 Answers

With dplyr, you can try:

library(dplyr)

df %>%
 rowwise() %>%
 mutate(names = paste(sort(unlist(strsplit(names, ", ", fixed = TRUE))), collapse = ", "))

  names                       var1  var2
  <chr>                      <dbl> <dbl>
1 John D., Josh C., Karl H. -0.226  19.9
2 Bob S., John D., Tim H.    0.424  24.8
3 Amy A., Art U., Wes T.     1.42   25.0
4 John D., Josh C., Karl H.  5.42   20.4

Sample data (with stringsAsFactors = FALSE so that names is a character column, which strsplit() requires):

df <- data.frame(names, var1, var2,
                 stringsAsFactors = FALSE)
tmfmnk

In base R you could do this:

# Converting factor to character
df$names <- as.character(df$names)

# Splitting string on comma+space(s), sorting them in list, 
# and pasting them back together with a comma and a space
df$names <- sapply(lapply(strsplit(df$names, split = ",\\s*"), sort), paste, collapse = ", ")

df
                      names      var1     var2
1 John D., Josh C., Karl H. -2.285181 15.82278
2   Bob S., John D., Tim H.  2.797259 21.42946
3    Amy A., Art U., Wes T.  1.001353 17.30004
4 John D., Josh C., Karl H.  4.034996 24.86374
Andrew

Define a function Sort that uses scan to split a row into its individual names, sorts them, and joins them back together with toString. Then apply it to each element of names with sapply. No packages are used.

Sort <- function(x) {
  s <- scan(text = as.character(x), what = "", sep = ",", 
    strip.white = TRUE, quiet = TRUE)
  toString(sort(s))
}
transform(df, names = sapply(names, Sort))

giving:

                      names      var1     var2
1 John D., Josh C., Karl H. -0.324619 28.02955
2   Bob S., John D., Tim H.  1.126112 14.21096
3    Amy A., Art U., Wes T.  3.295635 23.28294
4 John D., Josh C., Karl H. -1.546707 32.74496
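Once the names are in a canonical order, the goal stated in the question (finding rows that contain the same names) could be approached roughly as follows. This is a sketch in base R; sorted_names stands in for the sorted column, and is_repeated and group_id are names made up for this example.

```r
# Stand-in for the sorted names column produced above
sorted_names <- c("John D., Josh C., Karl H.",
                  "Bob S., John D., Tim H.",
                  "Amy A., Art U., Wes T.",
                  "John D., Josh C., Karl H.")

# TRUE for every row whose set of names occurs more than once;
# scanning from both ends also flags the first occurrence
is_repeated <- duplicated(sorted_names) |
  duplicated(sorted_names, fromLast = TRUE)

# Integer id shared by all rows with the same set of names
group_id <- match(sorted_names, unique(sorted_names))

data.frame(sorted_names, is_repeated, group_id)
```

Here rows 1 and 4 are flagged and share a group id, while rows 2 and 3 each form their own group.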
G. Grothendieck