
What I want is a data frame in which each row holds one combination of elements from two random vectors. I do not want duplicate combinations such as 1,2 and 2,1, just one of them. The combination must also not be self-repeating, such as 1,1.

Right now I have this simple for loop, but it is not ideal:

unique_combos <- function(v1, v2) {
    df <- data.frame(matrix(ncol=2))
    counter = 0
    for (name1 in v1) {
        for (name2 in v2) {
            if (name1 != name2){
                counter = counter + 1
                df[counter,] <- c(name1, name2)
            }
        }
    }
    return(df)
}

# example usage;
> v1 <- c(1,2,3,4)
> v2 <- c(3,4,5,6)
> unique_combos(v1, v2)
   X1 X2
1   1  3
2   1  4
3   1  5
4   1  6
5   2  3
6   2  4
7   2  5
8   2  6
9   3  4
10  3  5
11  3  6
12  4  3
13  4  5
14  4  6
> 

Is there a vectorized way to do this, preferably with performance in mind? Note that the vectors can be any length and will contain random values.

Edit 1: my function does not work properly! I don't want both the 3-4 and the 4-3 combination.

Edit 2: my final solution, based on suggestions from both @Ryan and @Frank (thanks guys!):

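library(data.table)
unique_combos <- function(v1, v2) {
  # Swap so the smaller value comes first, then drop duplicate rows
  intermediate <- unique(CJ(V1 = v1, V2 = v2)[V1 > V2, c("V1", "V2") := .(V2, V1)])
  return(intermediate[V1 != V2])  # drop self-pairs such as 1,1
}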

*Note: this uses the packages data.table and plyr.

Berghopper

3 Answers


There is no need for loops at all.
You can use expand.grid to get the data.frame, with repeats, in one instruction. Then use a logical index to keep only the rows where the first element is smaller than the second.

unique_combos2 <- function(v1, v2) {
  e <- expand.grid(v1, v2)
  e <- e[e[[1]] < e[[2]], ]
  e[order(e[[1]]), ]
}


u1 <- unique_combos(v1, v2)
u2 <- unique_combos2(v1, v2)
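
A quick check of the `<` filter may help here. The sketch below repeats the definition of unique_combos2 so that it runs on its own, applies it to the vectors from the question, and then to a second, hypothetical input (not from the question) in which every element of v1 is larger than every element of v2. The variable names res_example and res_reversed are made up for this illustration.

```r
# Same definition as above, repeated so the sketch is self-contained.
unique_combos2 <- function(v1, v2) {
  e <- expand.grid(v1, v2)
  e <- e[e[[1]] < e[[2]], ]
  e[order(e[[1]]), ]
}

# Vectors from the question: the mirrored 3-4 / 4-3 pair is kept only once.
res_example <- unique_combos2(c(1, 2, 3, 4), c(3, 4, 5, 6))
nrow(res_example)

# Hypothetical input: every element of v1 exceeds every element of v2,
# so the `<` filter discards all pairs, although none of them is a duplicate.
res_reversed <- unique_combos2(c(5, 6), c(1, 2))
nrow(res_reversed)
```

So the `<` convention is only safe when a pair that fails the test always has its mirror image somewhere in the grid, which is the caveat raised in the comments on the data.table answer.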

Now the speed tests. First with your data, then with larger vectors. I will load packages microbenchmark and ggplot2 to run the tests and visualize the results.

(Results not shown.)

library(microbenchmark)
library(ggplot2)

mb1 <- microbenchmark(
  u1 = unique_combos(v1, v2),
  u2 = unique_combos2(v1, v2)
)

mb1
autoplot(mb1)

w1 <- 1:20
w2 <- sample(100, 30)

mb2 <- microbenchmark(
  u1 = unique_combos(w1, w2),
  u2 = unique_combos2(w1, w2)
)

mb2
autoplot(mb2)
Rui Barradas
  • I edited my post, the combinations aren't right! See the 3-4 4-3 combination. I want to avoid this kind of behavior and just have 1 instead of both. – Berghopper Oct 05 '18 at 19:43
  • @CasperPetersBerghopper It's now with Frank's idea of using `<`. I think this corrects it. – Rui Barradas Oct 05 '18 at 21:19

The speed difference here probably won't have any real impact unless your vectors are huge, but since you put "performance" as a tag, here's a slightly faster method.

library(data.table)
CJ(v1, v2)[V1 != V2]

Benchmark:

Note: CJ will order by v1 by default, and ordering by v1 in unique_combos2 takes a lot of time, so I removed that part since it's not clear you need it.

unique_combos2 <- function(v1, v2) {
  e <- expand.grid(v1, v2)
  e <- e[e[[1]] != e[[2]], ]
  e
}
unique_combos3 <- function(v1, v2) CJ(v1, v2)[V1 != V2]

w1 <- sample(200)
w2 <- sample(200)
mb2 <- microbenchmark(
  u2 = unique_combos2(w1, w2),
  u3 = unique_combos3(w1, w2)
)

# Unit: milliseconds
#  expr      min       lq      mean   median       uq        max neval cld
#    u2 5.513842 5.942765 10.969386 6.692507 8.158763 368.180211   100   b
#    u3 1.140513 1.443076  1.898202 1.711384 2.139075   8.397942   100  a 

Edit: To remove duplicate pairs irrespective of order, use @Frank's solution from the comments, which effectively sorts the elements within each row before calling unique:

unique(CJ(v1, v2)[V1 > V2, c("V1", "V2") := .(V2, V1)])
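
For readers without data.table, the same "sort within each row, then call unique" idea can be sketched in base R with pmin and pmax. This is only an illustration under stated assumptions: the helper name unique_pairs and the column names V1 and V2 are chosen for this sketch and do not come from the thread, and it has not been benchmarked against the versions above.

```r
# Hypothetical helper (name chosen for this sketch): order-independent unique pairs.
unique_pairs <- function(v1, v2) {
  e <- expand.grid(V1 = v1, V2 = v2, stringsAsFactors = FALSE)
  e <- e[e$V1 != e$V2, ]                 # drop self-pairs such as 1,1
  lo <- pmin(e$V1, e$V2)                 # smaller element of each pair
  hi <- pmax(e$V1, e$V2)                 # larger element of each pair
  out <- unique(data.frame(V1 = lo, V2 = hi))
  rownames(out) <- NULL
  out
}

pairs_example  <- unique_pairs(c(1, 2, 3, 4), c(3, 4, 5, 6))

# Hypothetical input where v1 holds the larger values: no pair is lost,
# each one is simply stored with the smaller value first.
pairs_reversed <- unique_pairs(c(5, 6), c(1, 2))
```

Unlike a plain `<` filter, this keeps a pair such as 5,1 (stored as 1,5) even when its mirror image never occurs in the grid.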
IceCreamToucan
  • I edited my post, the combinations aren't right! See the 3-4 4-3 combination. I want to avoid this kind of behavior and just have 1 instead of both. And no, I do not need an exact ordering of v1, I just need the combinations – Berghopper Oct 05 '18 at 19:44
  • @Caspar I guess using `>` or `<` in place of Ryan's `!=` (and ditto in the other answers) should fix that. – Frank Oct 05 '18 at 19:49
  • @Frank Thanks! Everything works now like I want it to :). I don't get why this works though, can you explain why using `>` or `<` fixes this? – Berghopper Oct 05 '18 at 20:05
  • The solution of using `>` or `<` is not quite right, because you could easily remove non-duplicate pairs as well that way, depending on the vectors. Just use the method in this question https://stackoverflow.com/questions/9028369/removing-duplicate-combinations-irrespective-of-order – IceCreamToucan Oct 05 '18 at 20:14
  • Oh, good point Ryan. Another option would be `unique(CJ(v1, v2)[V1 > V2, c("V1", "V2") := .(V2, V1)])` @Caspar It works (at least in this latter formulation) by choosing a convention so either pairs V1 < V2 or V1 > V2 are kept. – Frank Oct 05 '18 at 20:16

Here's a tidyverse way, mostly using purrr tools (edited to address the clarification of the question). This method does the following:

  1. Get a list of the product set of the vectors, filtering out cases where the two elements are equal,
  2. Convert the list elements to sorted integer vectors, and discard any duplicates with unique,
  3. Transpose back to a list-of-columns structure, simplify to convert the columns to vectors, and place them back inside a data frame.

Very open to seeing if anyone can come up with a way to condense some steps!

v1 <- c(1,2,3,4)
v2 <- c(3,4,5,6)
library(tidyverse)
cross2(v1, v2, .filter = `==`) %>%
  map(~ sort(as.integer(.))) %>%
  unique %>%
  transpose(.names = c("x", "y")) %>%
  simplify_all %>%
  as_tibble()
#> # A tibble: 13 x 2
#>        x     y
#>    <int> <int>
#>  1     1     3
#>  2     2     3
#>  3     3     4
#>  4     1     4
#>  5     2     4
#>  6     1     5
#>  7     2     5
#>  8     3     5
#>  9     4     5
#> 10     1     6
#> 11     2     6
#> 12     3     6
#> 13     4     6

Created on 2018-10-05 by the reprex package (v0.2.0).
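
The three steps listed above can also be sketched with base R list tools, which may be useful because, as far as I know, cross2 has been deprecated in newer purrr releases (1.0.0 and later). This is a rough equivalent rather than a drop-in replacement: the intermediate names grid, pairs and result are made up for this sketch, and the row order differs from the tibble above.

```r
v1 <- c(1, 2, 3, 4)
v2 <- c(3, 4, 5, 6)

# Step 1: product set as a list of pairs, dropping pairs with equal elements.
grid  <- expand.grid(x = v1, y = v2)
pairs <- Map(c, grid$x, grid$y)
pairs <- Filter(function(p) p[1] != p[2], pairs)

# Step 2: sort within each pair, then discard duplicated pairs.
pairs <- unique(lapply(pairs, sort))

# Step 3: bind the pairs back into a two-column data frame.
result <- as.data.frame(do.call(rbind, pairs))
names(result) <- c("x", "y")
result
```

The list-of-pairs representation mirrors the purrr pipeline step by step; for long vectors a column-wise approach is likely to be faster, since this version sorts each pair individually.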

Calum You
  • Hi, I edited my post, thank you for pointing out the combinations still being present (my mistake). I do not necessarily need an ordering, I just want the unique combinations. To clarify; yes, I just want one of the combinations, does not matter which one. – Berghopper Oct 05 '18 at 19:46
  • Edited to address this clarification! – Calum You Oct 05 '18 at 20:34