Difference between the == and %in% operators in R

r operators filtering

89,116

Solution 1

%in% is value matching and "returns a vector of the positions of (first) matches of its first argument in its second" (See help('%in%')) This means you could compare vectors of different lengths to see if elements of one vector match at least one element in another. The length of output will be equal to the length of the vector being compared (the first one).

1:2 %in% rep(1:2,5)
#[1] TRUE TRUE

rep(1:2,5) %in% 1:2
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

#Note this output is longer in second

== is logical operator meant to compare if two things are exactly equal. If the vectors are of equal length, elements will be compared element-wise. If not, vectors will be recycled. The length of output will be equal to the length of the longer vector.

1:2 == rep(1:2,5)
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

rep(1:2,5) == 1:2
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

1:10 %in% 3:7
#[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

#is same as 

sapply(1:10, function(a) any(a == 3:7))
#[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

NOTE: If possible, try to use identical or all.equal instead of == and.

Solution 2

Given two vectors, x and y, the code x == y will compare the first element of x with the first element of y, then the second element of x with the second element of y, and so on. When using x == y, the lengths of x and y must be the same. Here, compare means "is equal to" and therefore the output is a logical vector equal to the length of x (or y).

In the code x %in% y, the first element of x is compared to all elements in y, then the second element of x is compared to all elements of y, and so on. Here, compare means "is the current element of x equal to any value in y" and therefore the output is a logical vector that has the same length of x and not (necessarily) y.

Here is a code snippet illustrating the difference. Note that x and y have the same lengths but the elements of y are the elements of x in different order. Note too in the final examples that x is a 3-element vector being compared to the letters vector, which contains 26 elements.

> x <- c('a','b','c')
> y <- c('c', 'b', 'a')
> x == y
[1] FALSE  TRUE FALSE

> x %in% y
[1] TRUE TRUE TRUE

> x %in% letters
[1] TRUE TRUE TRUE

> letters %in% x
 [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
 [7] FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE
[19] FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE

Solution 3

Try it for objects of different length.

ac <- c("a", "b", "c")
ae <- c("a", "b", "c", "d", "e")
ac %in% ae
[1] TRUE TRUE TRUE
ac == ae
[1]  TRUE  TRUE  TRUE FALSE FALSE

It's becomes clear that %in% checks whether or not the object is contained in the other object. Whereas == is a logical operator that checks for identity properties.

89,116

R_user1233

Updated on October 04, 2020

Comments

R_user1233 over 3 years

My question concerns the practical difference between the == and %in% operators in R.

I have run into an instance at work where filtering with either operator gives different results (e.g. one results on 800 rows, and the other 1200). I have run into this problem in the past and am able to validate in a way that ensures I get the results I desire. However, I am still stumped regarding how they are different.

Can someone please shed some light on how these operators are different?
- Rich Scriven about 7 years
  
  Were the documentation files not helpful?
Dason about 7 years

I don't think your description of what == does is quite accurate