How does one do a full join using data.table?

11,655

Solution 1

You actually have it right there. Use merge.data.table which is exactly what you are doing when you call

merge(a, b, by = "dog", all = TRUE)

since a is a data.table, merge(a, b, ...) calls merge.data.table(a, b, ...)

Solution 2

x= data.table(a=1:5,b=11:15)
y= data.table(a=c(1:4,6),c=c(101:104,106))

setkey(x,a)
setkey(y,a)

unique_keys <- unique(c(x[,a], y[,a]))
y[x[.(unique_keys), on="a"]  ] # Full Outer Join
Share:
11,655
Paul Murray
Author by

Paul Murray

Updated on June 17, 2022

Comments

  • Paul Murray
    Paul Murray almost 2 years

    In the data.table FAQ, the nomatch = NA parameter is said to be akin to an outer join. However, I haven't been able to get data.table to do a full outer join – only right outer joins.

    For example:

    a <- data.table("dog" = c(8:12), "cat" = c(15:19))
    
       dog cat
    1:   8  15
    2:   9  16
    3:  10  17
    4:  11  18
    5:  12  19
    
    b <- data.table("dog" = 1:10, "bullfrog" = 11:20)
    
        dog bullfrog
     1:   1       11
     2:   2       12
     3:   3       13
     4:   4       14
     5:   5       15
     6:   6       16
     7:   7       17
     8:   8       18
     9:   9       19
    10:  10       20
    
    setkey(a, dog)
    setkey(b, dog)
    
    a[b, nomatch = NA]
    
        dog cat bullfrog
     1:   1  NA       11
     2:   2  NA       12
     3:   3  NA       13
     4:   4  NA       14
     5:   5  NA       15
     6:   6  NA       16
     7:   7  NA       17
     8:   8  15       18
     9:   9  16       19
    10:  10  17       20
    

    So, nomatch = NA produces a right outer join (which is the default). What if I need a full join? For example:

    merge(a, b, by = "dog", all = TRUE) 
    # Or with plyr:
    join(a, b, by = "dog", type = "full")
    
        dog cat bullfrog
     1:   1  NA       11
     2:   2  NA       12
     3:   3  NA       13
     4:   4  NA       14
     5:   5  NA       15
     6:   6  NA       16
     7:   7  NA       17
     8:   8  15       18
     9:   9  16       19
    10:  10  17       20
    11:  11  18       NA
    12:  12  19       NA
    

    Is that possible with data.table?

  • Ameya
    Ameya almost 6 years
    Is there any way we can achieve the same result with the A[B] syntax?
  • Feng Jiang
    Feng Jiang over 4 years
    Brilliant! To make it easier to understand, I'd like to break the last two lines to three: 1) unique_keys <- data.table(a = unique(c(x[,a], y[,a]))) 2) xx <- x[unique_keys] 3) y[xx]