Non-redundant version of expand.grid

15,934

Solution 1

How about using outer? But this particular function concatenates them into one character string.

outer( c("aa", "ab", "cc"), c("aa", "ab", "cc") , "paste" )
#     [,1]    [,2]    [,3]   
#[1,] "aa aa" "aa ab" "aa cc"
#[2,] "ab aa" "ab ab" "ab cc"
#[3,] "cc aa" "cc ab" "cc cc"

You can also use combn on the unique elements of the two vectors if you don't want the repeating elements (e.g. aa aa)

vals <- c( c("aa", "ab", "cc"), c("aa", "ab", "cc") )
vals <- unique( vals )
combn( vals , 2 )
#     [,1] [,2] [,3]
#[1,] "aa" "aa" "ab"
#[2,] "ab" "cc" "cc"

Solution 2

In base R, you can use this:

expand.grid.unique <- function(x, y, include.equals=FALSE)
{
    x <- unique(x)

    y <- unique(y)

    g <- function(i)
    {
        z <- setdiff(y, x[seq_len(i-include.equals)])

        if(length(z)) cbind(x[i], z, deparse.level=0)
    }

    do.call(rbind, lapply(seq_along(x), g))
}

Results:

> x <- c("aa", "ab", "cc")
> y <- c("aa", "ab", "cc")

> expand.grid.unique(x, y)
     [,1] [,2]
[1,] "aa" "ab"
[2,] "aa" "cc"
[3,] "ab" "cc"

> expand.grid.unique(x, y, include.equals=TRUE)
     [,1] [,2]
[1,] "aa" "aa"
[2,] "aa" "ab"
[3,] "aa" "cc"
[4,] "ab" "ab"
[5,] "ab" "cc"
[6,] "cc" "cc"

Solution 3

If the two vectors are the same, there's the combinations function in the gtools package:

library(gtools)
combinations(n = 3, r = 2, v = c("aa", "ab", "cc"), repeats.allowed = TRUE)

#      [,1] [,2]
# [1,] "aa" "aa"
# [2,] "aa" "ab"
# [3,] "aa" "cc"
# [4,] "ab" "ab"
# [5,] "ab" "cc"
# [6,] "cc" "cc"

And without "aa" "aa", etc.

combinations(n = 3, r = 2, v = c("aa", "ab", "cc"), repeats.allowed = FALSE)

Solution 4

The previous answers were lacking a way to get a specific result, namely to keep the self-pairs but remove the ones with different orders. The gtools package has two functions for these purposes, combinations and permutations. According to this website:

  • When the order doesn't matter, it is a Combination.
  • When the order does matter it is a Permutation.

In both cases, we have the decision to make of whether repetitions are allowed or not, and correspondingly, both functions have a repeats.allowed argument, yielding 4 combinations (deliciously meta!). It's worth going over each of these. I simplified the vector to single letters for ease of understanding.

Permutations with repetition

The most expansive option is to allow both self-relations and differently ordered options:

> permutations(n = 3, r = 2, repeats.allowed = T, v = c("a", "b", "c"))
      [,1] [,2]
 [1,] "a"  "a" 
 [2,] "a"  "b" 
 [3,] "a"  "c" 
 [4,] "b"  "a" 
 [5,] "b"  "b" 
 [6,] "b"  "c" 
 [7,] "c"  "a" 
 [8,] "c"  "b" 
 [9,] "c"  "c" 

which gives us 9 options. This value can be found from the simple formula n^r i.e. 3^2=9. This is the Cartesian product/join for users familiar with SQL.

There are two ways to limit this: 1) remove self-relations (disallow repetitions), or 2) remove differently ordered options (i.e. combinations).

Combinations with repetitions

If we want to remove differently ordered options, we use:

> combinations(n = 3, r = 2, repeats.allowed = T, v = c("a", "b", "c"))
     [,1] [,2]
[1,] "a"  "a" 
[2,] "a"  "b" 
[3,] "a"  "c" 
[4,] "b"  "b" 
[5,] "b"  "c" 
[6,] "c"  "c" 

which gives us 6 options. The formula for this value is (r+n-1)!/(r!*(n-1)!) i.e. (2+3-1)!/(2!*(3-1)!)=4!/(2*2!)=24/4=6.

Permutations without repetition

If instead we want to disallow repetitions, we use:

> permutations(n = 3, r = 2, repeats.allowed = F, v = c("a", "b", "c"))
     [,1] [,2]
[1,] "a"  "b" 
[2,] "a"  "c" 
[3,] "b"  "a" 
[4,] "b"  "c" 
[5,] "c"  "a" 
[6,] "c"  "b" 

which also gives us 6 options, but different ones! The number of options is the same as above but it's a coincidence. The value can be found from the formula n!/(n-r)! i.e. (3*2*1)/(3-2)!=6/1!=6.

Combinations without repetitions

The most limiting is when we want neither self-relations/repetitions or differently ordered options, in which case we use:

> combinations(n = 3, r = 2, repeats.allowed = F, v = c("a", "b", "c"))
     [,1] [,2]
[1,] "a"  "b" 
[2,] "a"  "c" 
[3,] "b"  "c" 

which gives us only 3 options. The number of options can be calculated from the rather complex formula n!/(r!(n-r)!) i.e. 3*2*1/(2*1*(3-2)!)=6/(2*1!)=6/2=3.

Solution 5

Try:

factors <- c("a", "b", "c")

all.combos <- t(combn(factors,2))

     [,1] [,2]
[1,] "a"  "b" 
[2,] "a"  "c" 
[3,] "b"  "c"

This will not include duplicates of each factor (e.g. "a" "a"), but you can add those on easily if needed.

dup.combos <- cbind(factors,factors)

     factors factors
[1,] "a"     "a"    
[2,] "b"     "b"    
[3,] "c"     "c"   

all.combos <- rbind(all.combos,dup.combos)

     factors factors
[1,] "a"     "b"    
[2,] "a"     "c"    
[3,] "b"     "c"    
[4,] "a"     "a"    
[5,] "b"     "b"    
[6,] "c"     "c" 
Share:
15,934
Michele
Author by

Michele

Recently moved to an IT consultancy position after having spent 6 years studying condensed matter physics.

Updated on June 06, 2022

Comments

  • Michele
    Michele almost 2 years

    The R function expand.grid returns all possible combination between the elements of supplied parameters. e.g.

    > expand.grid(c("aa", "ab", "cc"), c("aa", "ab", "cc"))
      Var1 Var2
    1   aa   aa
    2   ab   aa
    3   cc   aa
    4   aa   ab
    5   ab   ab
    6   cc   ab
    7   aa   cc
    8   ab   cc
    9   cc   cc
    

    Do you know an efficient way to get directly (so without any row comparison after expand.grid) only the 'unique' combinations between the supplied vectors? The output will be

      Var1 Var2
    1   aa   aa
    2   ab   aa
    3   cc   aa
    5   ab   ab
    6   cc   ab
    9   cc   cc
    

    EDIT the combination of each element with itself could be eventually discarded from the answer. I don't actually need it in my program even though (mathematically) aa aa would be one (regular) unique combination between one element of Var1 and another of var2.

    The solution needs to produce pairs of elements from both vectors (i.e. one from each of the input vectors - so that it could be applied to more than 2 inputs)

  • Michele
    Michele almost 11 years
    oh no, thanks. I need them separate to pass to other function. the previous was perfect! I actually realised I don't need aa aa, bb bb and cc cc. so if you re-edit back and I'll accept it :-)
  • CoderGuy123
    CoderGuy123 over 7 years
    I note that repeats.allowed also removes the self-pairs such as ("aa", "aa"), which are not actually repeated. There is a missing middle way with this function.
  • CoderGuy123
    CoderGuy123 over 7 years
    I note that it is missing a way to get the self-relationships included, but not the duplicate ones. For instance ("aa", "aa") is not a duplicate for it only appears only. In some cases, one wants the self-relationships too, but not the duplicates.
  • Joseph Wood
    Joseph Wood almost 3 years
    Don't want to offend, but I don't see how this answer is the most upvoted accepted answer and continuing to get upvotes. Both approaches are incorrect. The first proposal with outer is nearly identical to expand.grid, it is just in a different format. It is also limited to only 2 inputs. For the 2nd approach, combn can only act on a single vector. If you have vectors with different elements the approach suggested here will give many incorrect results. Continuing...
  • Joseph Wood
    Joseph Wood almost 3 years
    The most extreme example is when we have disjoint vectors E.g. v1 = 1:5, v2 = 6:10. The correct solution should be isomorphic to expand.grid(1:5, 6:10), which only has 25 results, however combn(1:10, 2) has 45 results. Again, I don't mean to offend, I just want to help to clarify for future users.
  • Joseph Wood
    Joseph Wood almost 3 years
    I have added an answer that addresses these concerns: stackoverflow.com/a/68050873/4408538