Fast way of getting index of match in list
Here's one possibility using match():
> a <- list(1:3, 4:5, 6:9)
> b <- c(2, 3, 5, 8)
> g <- rep(seq_along(a), sapply(a, length))
> g[match(b, unlist(a))]
[1] 1 1 2 3
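How this works: g records, for each position of unlist(a), which element of a that value came from. match() then returns the first position of each value of b in unlist(a), so indexing g with those positions gives the list index. Below is a minimal sketch that wraps the same idea in a reusable function. The name index_in_list is hypothetical, and the sketch assumes R >= 3.2.0 for lengths(), which is equivalent to sapply(a, length) here but always returns an integer vector, even for an empty list. Values of b that do not occur in a come back as NA, because that is match()'s default nomatch.

```r
# Hypothetical helper: for each value in b, return the index of the
# element of list a that contains it (first match only).
index_in_list <- function(b, a) {
  flat  <- unlist(a, use.names = FALSE)   # all values, concatenated
  group <- rep(seq_along(a), lengths(a))  # owning list index of each value
  group[match(b, flat)]                   # NA where a value of b is not found
}

a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8)
index_in_list(b, a)
# [1] 1 1 2 3
```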
findInterval is another option:
> findInterval(match(b, unlist(a)), cumsum(c(0,sapply(a, length)))+1)
[1] 1 1 2 3
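In the findInterval() version, cumsum(c(0, sapply(a, length))) + 1 gives the starting position of each vector inside unlist(a), plus one position past the end. findInterval() then reports which of those intervals each matched position falls into. The sketch below prints the intermediate steps on the same data. It is an equivalent variant rather than the answer's exact expression: it passes only the start positions as breaks, since positions in the last vector land in the last interval either way.

```r
a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8)

pos    <- match(b, unlist(a))                 # positions in the flattened list: 2 3 5 8
starts <- cumsum(c(1, head(lengths(a), -1)))  # first position of each vector: 1 4 6
findInterval(pos, starts)                     # interval number = list index
# [1] 1 1 2 3
```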
To return a list, so that a value of b found in several elements of a gives all of their indices, try this:
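a <- list(1:3, 4:5, 5:9)   # note: 5 appears in both a[[2]] and a[[3]]
b <- c(2,3,5,8,5)
g <- rep(seq_along(a), sapply(a, length))   # list index of each value in unlist(a)
aa <- unlist(a)
au <- unique(aa)
af <- factor(aa, levels=au)
gg <- split(g, af)   # for each distinct value, all list indices containing it
gg[match(b, au)]     # pick the entry for each value of b (repeats allowed)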
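The same steps can be wrapped in a function, with the result checked against this data. indices_in_list is a hypothetical name, and lengths() (R >= 3.2.0) stands in for sapply(a, length). split() labels each entry with its value, and unname() below is only there to make the printed structure easier to read.

```r
# Hypothetical helper returning, for each value of b, all indices of
# the elements of a that contain it (same steps as the answer above).
indices_in_list <- function(b, a) {
  flat  <- unlist(a, use.names = FALSE)
  group <- rep(seq_along(a), lengths(a))
  vals  <- unique(flat)
  # one entry per distinct value, holding every list index that contains it
  by_value <- split(group, factor(flat, levels = vals))
  by_value[match(b, vals)]
}

a <- list(1:3, 4:5, 5:9)   # 5 occurs in both a[[2]] and a[[3]]
b <- c(2, 3, 5, 8, 5)
str(unname(indices_in_list(b, a)))
# entries: 1, 1, c(2, 3), 3, c(2, 3)
```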
ThomasP85
Updated on June 13, 2020

ThomasP85:
Given a list a containing vectors of unequal length and a vector b containing some elements from the vectors in a, I want to get a vector of the same length as b containing, for each element of b, the index of the vector in a where it matches (this is a bad explanation, I know). The following code does the job:
a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8)
sapply(b, function(x, list) which(unlist(lapply(list, function(y, z) z %in% y, z=x))), list=a)
[1] 1 1 2 3
Replacing the sapply with a for loop achieves the same result, of course. The problem is that this code will be used with lists and vectors with lengths above 1000. On a real-life data set the function takes around 15 seconds (with both the for loop and the sapply). Does anyone have an idea how to speed this up, short of a parallel approach? I have failed to see a vectorized approach (and I cannot program in C, though that would probably be the fastest).
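To check that the original sapply() code and the match()-based answer agree on larger input, and to compare their timings, a harness like the one below can be used. The data generator, the sizes n_vec and n_b, and the seed are arbitrary assumptions meant to mimic "lengths above 1000". Like the question, the generator makes sure no value appears in more than one vector. Timings depend on the machine, so none are shown. Increasing n_b should make the gap grow, because the original code scans the whole list once for every value of b.

```r
set.seed(1)       # arbitrary seed, for reproducibility
n_vec <- 1000     # number of vectors in the list (assumed size)
n_b   <- 200      # number of values to look up (assumed size)

# Build a list of vectors of unequal length with no value repeated
# across vectors (the question states multiple matches cannot occur).
lens   <- sample(1:10, n_vec, replace = TRUE)
values <- sample.int(sum(lens))
a <- unname(split(values, rep(seq_len(n_vec), lens)))
b <- sample(values, n_b)

# Original approach from the question: scan every vector for every value.
slow <- function(b, a) {
  sapply(b, function(x, list)
    which(unlist(lapply(list, function(y, z) z %in% y, z = x))), list = a)
}

# match()-based approach from the answer.
fast <- function(b, a) {
  g <- rep(seq_along(a), sapply(a, length))
  g[match(b, unlist(a))]
}

system.time(r_slow <- slow(b, a))
system.time(r_fast <- fast(b, a))
identical(r_slow, r_fast)
```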
Edit:
I just want to highlight Aaron's elegant solution using match(), which gave a speedup on the order of 1667 times (from 15 to 0.009 seconds).
I expanded on it a bit to allow multiple matches (the result is then a list):
a <- list(1:3, 3:5, 3:7)
b <- c(3, 5)
g <- rep(seq_along(a), sapply(a, length))
sapply(b, function(x) g[which(unlist(a) %in% x)])
[[1]]
[1] 1 2 3

[[2]]
[1] 2 3
The runtime for this was 0.169 seconds, which is noticeably slower, but on the other hand it is more flexible.
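One caveat with sapply() here is that it returns a list only when the results differ in length. If every value of b matches the same number of vectors, sapply() simplifies the result to a matrix. lapply(), or equivalently sapply(..., simplify = FALSE), always keeps the list shape. The sketch below shows both on the same a. It also flattens a once outside the loop instead of once per value, and it drops which(), because indexing g with the logical vector selects the same elements. Whether these changes recover much of the 0.169 seconds has not been measured here.

```r
a <- list(1:3, 3:5, 3:7)
g <- rep(seq_along(a), sapply(a, length))
flat <- unlist(a)   # flatten once instead of once per value

# sapply() simplifies when every result has the same length: 4 and 5 both
# occur only in a[[2]] and a[[3]], so this returns a 2 x 2 matrix.
m <- sapply(c(4, 5), function(x) g[flat %in% x])

# lapply() always returns a list, one entry per value looked up.
l <- lapply(c(4, 5), function(x) g[flat %in% x])
```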
Joshua Ulrich: What do you want the algorithm to do if an element of b appears in more than one element of a? Is that possible in your actual problem?
ThomasP85: I should have specified that... It is not a possibility.
ThomasP85: From 15 sec to 0.009 sec, that's an impressive improvement. I found out that I would actually like to return a list instead of a vector, so that it can handle multiple matches. I substituted the last line in your first suggestion with sapply(b, function(x) g[which(unlist(a) %in% x)]) to achieve this. The run time was then 0.169 sec, which is quite a bit slower than yours but still a major improvement.