How do I match all occurrences in R?
Solution 1
A <- sample(1:10, 100, replace = TRUE) ## generate sample data (sampling with replacement)
B <- 1:5
A %in% B
[1] FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE TRUE
[13] FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
[25] FALSE FALSE TRUE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE
[37] FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE
[49] TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
[61] TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
[73] TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[85] TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE
[97] FALSE FALSE FALSE TRUE
which(A %in% B)
[1] 2 3 4 5 7 11 12 14 16 17 20 21 22 23 24 27 28 30 31
[20] 36 38 39 40 41 43 44 46 49 51 52 55 56 61 62 67 69 71 73
[39] 74 79 85 86 87 88 91 93 94 95 100
lapply(B, function(x) which(A %in% x))
[[1]]
[1] 5 22 23 36 40 49 69
[[2]]
[1] 21 30 39 44 46 56 61 85 93
[[3]]
[1] 2 7 14 28 38 51 62 73 87 91
[[4]]
[1] 3 4 11 12 20 24 27 41 43 52 55 71 74 79 88
[[5]]
[1] 16 17 31 67 86 94 95 100
Without lapply, the result does not tell you which element of B matched at which position in A.
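To keep track of which element of B each set of positions belongs to, the result list can be named after B. A minimal sketch on made-up name data (the vectors below are hypothetical, not taken from the question):

```r
# Hypothetical data: A is the long vector, B holds the names to look up
A <- c("Amy", "Bob", "Amy", "Cat", "Dan", "Bob", "Amy")
B <- c("Amy", "Bob", "Eve")

# One list entry per element of B, named after that element;
# each entry holds every position in A where the name occurs
positions <- lapply(setNames(B, B), function(x) which(A == x))
positions
```

A name in B that never occurs in A (here "Eve") gets an empty integer vector rather than being dropped, so the list always has one entry per element of B.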
Solution 2
The package grr contains a function matches, which finds all matches between the elements of two vectors. It can return the result as a two-column lookup table or in the list format returned by lapply(B, function(x) which(A %in% x)). However, it is much faster, roughly 20 times in the benchmark below:
> A <- as.character(sample(1:1000, 1e5, replace = TRUE)) ## generate sample data
> B <- as.character(1:500)
> microbenchmark::microbenchmark(
+   result <- lapply(B, function(x) which(A %in% x)),
+   result2 <- grr::matches(B, A, list = TRUE, all.y = FALSE),
+   times = 10
+ )
Unit: milliseconds
                                                      expr        min         lq       mean     median         uq        max neval
          result <- lapply(B, function(x) which(A %in% x)) 1193.50104 1218.60509 1276.58727 1237.82048 1253.76487 1497.18798    10
 result2 <- grr::matches(B, A, list = TRUE, all.y = FALSE)   54.83836   56.28509   57.39188   57.79095   58.17673   59.46505    10
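For comparison, both output shapes mentioned above (the two-column lookup table and the per-element list) can also be sketched in base R with match and split. This is an alternative added for illustration, not how grr implements matches, and its speed relative to grr::matches has not been measured here. The seed and the variable names b_idx, hit, lookup and as_list are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
A <- as.character(sample(1:1000, 1e5, replace = TRUE))
B <- as.character(1:500)

# For every element of A, its index in B (NA when it is not in B)
b_idx <- match(A, B)
hit <- which(!is.na(b_idx))

# Two-column lookup table: index into B, index into A
lookup <- data.frame(B = b_idx[hit], A = hit)

# List form: one element per entry of B, holding its positions in A
as_list <- unname(split(lookup$A, factor(lookup$B, levels = seq_along(B))))
```

This makes a single pass over A instead of one pass per element of B. It assumes B has no duplicated values, since match only reports the first matching index in B.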
Geekuna Matata
Updated on June 16, 2022
Comments
-
Geekuna Matata almost 2 years
I have a list of 1000 names (say A) and another list of 5 names (say B). I want to find the row numbers at which the 5 names occur in the 1000-name list.
E.g., Amy can occur 25 times in A, and Amy is also in B. I want to know in which rows of A Amy occurs.
I have used match before, but that only gives the first occurrence of each of the 5 names in B, which doesn't serve my purpose.
I need to do it in R. I don't know anything about regex.
Thanks!
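The difference between match and a search for every occurrence can be seen on a small made-up example (the names and vectors below are hypothetical):

```r
A <- c("Amy", "Bob", "Amy", "Cat", "Amy")  # hypothetical long vector
B <- c("Amy", "Cat")                        # hypothetical names to find

match(B, A)        # first occurrence of each name only: 1 4
which(A %in% B)    # every row of A holding any name from B: 1 3 4 5
which(A == "Amy")  # every row where Amy occurs: 1 3 5
```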
-
cianius about 10 years
Because it lets you see which item of B matched where in A.
-
cianius about 10 years
Are you sure about that?
-
GSee about 10 years
I see. I didn't realize the OP said "list". I assumed it was character vectors.