How do I match all occurrences in R?


Solution 1

 A <- sample(1:10, 100, replace = TRUE) ## generate sample data
 B <- 1:5

 A %in% B
[1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE
[13] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
[25] FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE
[37] FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
[49]  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
[61]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
[73]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[85]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
[97] FALSE FALSE FALSE  TRUE

which(A %in% B)
 [1]   2   3   4   5   7  11  12  14  16  17  20  21  22  23  24  27  28  30  31
[20]  36  38  39  40  41  43  44  46  49  51  52  55  56  61  62  67  69  71  73
[39]  74  79  85  86  87  88  91  93  94  95 100


lapply(B, function(x) which(A %in% x)) 
[[1]]
 [1]  5 22 23 36 40 49 69

[[2]]
[1] 21 30 39 44 46 56 61 85 93

[[3]]
[1]  2  7 14 28 38 51 62 73 87 91

[[4]]
 [1]  3  4 11 12 20 24 27 41 43 52 55 71 74 79 88

[[5]]
[1]  16  17  31  67  86  94  95 100

Without lapply, the output of which(A %in% B) doesn't tell you which element of B is at which position in A, does it?
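One way to make the per-element result easier to read is to name each list entry after the element of B it belongs to. The following is a minimal sketch on a small hand-written vector (not the random sample above); the variable name positions is only illustrative.

```r
# Illustrative data, chosen by hand so the result is easy to check
A <- c(3, 1, 2, 3, 5, 1, 7, 3)
B <- 1:5

# One list entry per element of B, named by that element
positions <- setNames(lapply(B, function(x) which(A == x)), B)

positions[["3"]]  # positions in A that hold the value 3
positions[["4"]]  # empty: 4 does not occur in A
```

Elements of B that never occur in A still get an entry, which is simply an empty integer vector.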

Solution 2

The package grr contains a function, matches, which finds all matches between the elements of two vectors. It can return the result as a two-column lookup table or in the list format returned by lapply(B, function(x) which(A %in% x)). In the benchmark below it is roughly 20 times faster than the lapply approach:

> A <- as.character(sample(1:1000, 1e5,TRUE)) ## generate sample data
> B <- as.character(1:500)
> microbenchmark::microbenchmark(result<-lapply(B, function(x) which(A %in% x)),result2<-grr::matches(B,A,list=TRUE,all.y=FALSE),times=10)
Unit: milliseconds
                                                      expr        min         lq       mean     median         uq        max neval
          result <- lapply(B, function(x) which(A %in% x)) 1193.50104 1218.60509 1276.58727 1237.82048 1253.76487 1497.18798    10
 result2 <- grr::matches(B, A, list = TRUE, all.y = FALSE)   54.83836   56.28509   57.39188   57.79095   58.17673   59.46505    10
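For readers who do not have grr installed, the two-column lookup-table shape can be approximated in base R. This is only a sketch of the output shape, not of how grr::matches works internally, and the column names b_index and a_index are assumptions made for illustration.

```r
# Illustrative data, chosen by hand
A <- c("amy", "bob", "amy", "cat", "bob", "amy")
B <- c("amy", "bob", "dan")

hit <- which(A %in% B)          # positions in A that hold any element of B

lookup <- data.frame(
  b_index = match(A[hit], B),   # which element of B matched
  a_index = hit                 # where that match sits in A
)

# Group the rows by element of B, then by position in A
lookup <- lookup[order(lookup$b_index, lookup$a_index), ]
lookup
```

Elements of B with no match in A (here "dan") simply contribute no rows to the table.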
Author by

Geekuna Matata

Updated on June 16, 2022

Comments

  • Geekuna Matata
    Geekuna Matata almost 2 years

    I have a list of 1000 names (say A) and another list of 5 names (say B). I want to find out at which row numbers the 5 names occur in the 1000-name list.

    E.g., Amy can occur 25 times in A, and there is an Amy in B. I want to know in which rows Amy occurs in A.

    I have used match before, but that only gives the first occurrence of each of the 5 names in B, which doesn't serve my purpose.

    I need to do it in R. I don't know anything about Regex.

    Thanks!

  • cianius
    cianius about 10 years
    because it lets you see which item of B matched where in A.
  • cianius
    cianius about 10 years
    are you sure about that?
  • GSee
    GSee about 10 years
    I see. I didn't realize the OP said "list". I assumed it was character vectors
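
The difference described in the question, where match reports only the first occurrence of each name, can be seen on a small example. The names below are hypothetical stand-ins for the question's data, and the sketch assumes A and B are character vectors rather than R lists.

```r
# Hypothetical names standing in for the question's data
A <- c("Amy", "Bob", "Amy", "Cat", "Dan", "Amy", "Bob")
B <- c("Amy", "Bob")

# match() gives one position per element of B: the first occurrence only
first_only <- match(B, A)

# Grouping the positions of A by value, then picking the names in B,
# gives every occurrence
all_rows <- split(seq_along(A), A)[B]

first_only
all_rows
```

This split-based form assumes every name in B occurs somewhere in A; a name that is absent from A would yield a NULL entry rather than an empty vector.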