Efficiently knowing if the intersection of two lists is empty or not, in Python


Solution 1

More concisely, test the set intersection directly:

if set(L) & set(M):
    print("there is an intersection")
else:
    print("no intersection")

If you really need True or False

bool(set(L) & set(M))
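As a quick check against the two examples from the question, here is a minimal sketch. The helper name `have_common_element` is hypothetical, made up for illustration only.

```python
def have_common_element(L, M):
    """Return True if L and M share at least one (hashable) element."""
    # Build both sets, intersect them, and test the result for emptiness.
    return bool(set(L) & set(M))


print(have_common_element([1, 2, 3, 4, 5, 6], [8, 9, 10]))  # False
print(have_common_element([1, 2, 3, 4, 5, 6], [5, 6, 7]))   # True
```

An empty set is falsy, which is why the bare `if set(L) & set(M):` form works without the explicit bool().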

After running some timings, this seems to be a good option to try too

m_set = set(M)
any(x in m_set for x in L)

If the items in M or L are not hashable, you have to use a less efficient approach like this:

any(x in M for x in L)
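To see why hashability matters, here is a small sketch using lists of lists (an assumed example, not one of the original test cases). Lists can't be put in a set, so set(M) raises TypeError, while the plain any() scan still works because `in` on a list only compares elements for equality.

```python
L = [[1, 2], [3, 4]]
M = [[5, 6], [3, 4]]

try:
    set(M)  # the inner lists are unhashable, so this raises TypeError
    built_set = True
except TypeError:
    built_set = False

# Membership testing on a list compares elements for equality, so no
# hashing is needed. Each `x in M` is a linear scan, which makes the
# whole check O(len(L) * len(M)) in the worst case.
shared = any(x in M for x in L)
print(built_set, shared)  # False True
```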

Here are some timings for 100-item lists, run under Python 2 (where range() returns a list). Using sets is considerably faster when there is no intersection, and a bit slower when there is a considerable intersection.

M=range(100)
L=range(100,200)

timeit set(L) & set(M)
10000 loops, best of 3: 32.3 µs per loop

timeit any(x in M for x in L)
1000 loops, best of 3: 374 µs per loop

timeit m_set=frozenset(M);any(x in m_set  for x in L)
10000 loops, best of 3: 31 µs per loop

L=range(50,150)

timeit set(L) & set(M)
10000 loops, best of 3: 18 µs per loop

timeit any(x in M for x in L)
100000 loops, best of 3: 4.88 µs per loop

timeit m_set=frozenset(M);any(x in m_set  for x in L)
100000 loops, best of 3: 9.39 µs per loop


# Now for some random lists
import random
L=[random.randrange(200000) for x in xrange(1000)]
M=[random.randrange(200000) for x in xrange(1000)]

timeit set(L) & set(M)
1000 loops, best of 3: 420 µs per loop

timeit any(x in M for x in L)
10 loops, best of 3: 21.2 ms per loop

timeit m_set=set(M);any(x in m_set  for x in L)
1000 loops, best of 3: 168 µs per loop

timeit m_set=frozenset(M);any(x in m_set  for x in L)
1000 loops, best of 3: 371 µs per loop
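The numbers above came from an interactive session. To rerun the comparison on your own machine, something like the following sketch should work on Python 3 with the standard timeit module. The list sizes mirror the random-list test above; the function names are placeholders, and no timings are quoted here because they depend on hardware and Python version.

```python
import random
import timeit

random.seed(0)  # fixed seed so reruns use the same lists
L = [random.randrange(200000) for _ in range(1000)]
M = [random.randrange(200000) for _ in range(1000)]


def with_intersection():
    return bool(set(L) & set(M))


def with_list_scan():
    return any(x in M for x in L)


def with_set_lookup():
    m_set = set(M)
    return any(x in m_set for x in L)


def with_frozenset_lookup():
    m_set = frozenset(M)
    return any(x in m_set for x in L)


candidates = [with_intersection, with_list_scan,
              with_set_lookup, with_frozenset_lookup]

for func in candidates:
    # Best of 3 repeats, 5 calls each, reported per call in microseconds.
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print("%-22s %10.1f us per call" % (func.__name__, best * 1e6))
```

All four functions should return the same answer for the same inputs; only the running time differs.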

Solution 2

To avoid the work of constructing the intersection, and produce an answer as soon as we know that they intersect:

def intersects(L, M):
    m_set = frozenset(M)
    return any(x in m_set for x in L)

Update: gnibbler tried this out and found it to run faster with set() in place of frozenset(). Whaddayaknow.
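To make the early exit visible, here is a sketch that counts how many items of L are actually examined before an answer comes back. Both `shares_element` and `counting` are hypothetical helpers written for this illustration.

```python
def shares_element(L, M):
    m_set = set(M)  # one pass over M; set() rather than frozenset()
    return any(x in m_set for x in L)  # stops at the first hit


def counting(iterable, counter):
    """Yield items from iterable, tallying how many were consumed."""
    for x in iterable:
        counter[0] += 1
        yield x


counter = [0]
found = shares_element(counting(range(10**6), counter), [3])
print(found, counter[0])  # True 4
```

Only the items 0 through 3 are pulled from the million-element range, whereas `set(L) & set(M)` would have to build a set from all of L first.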

Solution 3

First of all, if you do not need them ordered, then switch to the set type.

If you still need the list type, then do it this way. The result is the size of the intersection, and since 0 == False, it can be used directly as a truth value:

len(set.intersection(set(L), set(M)))
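Note that this expression gives the number of shared elements rather than a boolean. A brief sketch using the lists from the question:

```python
L = [1, 2, 3, 4, 5, 6]

no_overlap = len(set.intersection(set(L), set([8, 9, 10])))
overlap = len(set.intersection(set(L), set([5, 6, 7])))

print(no_overlap, overlap)              # 0 2
print(bool(no_overlap), bool(overlap))  # False True

# The count can be used directly in a condition, since 0 is falsy.
if not no_overlap:
    print("no intersection")
```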
Author: Manuel Araoz

I'm just trying to learn a bit of programming.

Updated on February 07, 2020

Comments

  • Manuel Araoz
    Manuel Araoz about 4 years

    Suppose I have two lists, L and M. Now I want to know if they share an element. Which would be the fastest way of asking (in python) if they share an element? I don't care which elements they share, or how many, just if they share or not.

    For example, in this case

    L = [1,2,3,4,5,6]
    M = [8,9,10]
    

    I should get False, and here:

    L = [1,2,3,4,5,6]
    M = [5,6,7]
    

    I should get True.

    I hope the question's clear. Thanks!

    Manuel

  • visual_learner
    visual_learner about 14 years
    @gnibbler - Is it provable that the any() version is less efficient? It seems like it would go through M only until it found an element in L, at which point any would return True and be done. This sounds more efficient than converting both L and M to sets beforehand. At least, on paper.
  • Manuel Araoz
    Manuel Araoz about 14 years
    This doesn't seem very efficient. I mean, the whole intersection is being calculated, isn't it!? Or is it lazily evaluated? Thanks!
  • jathanism
    jathanism about 14 years
    This here, this is the answer.
  • John La Rooy
    John La Rooy about 14 years
    @Chris, worst case is when there is no intersection - O(l*m). With sets I believe it is O(l+m)
  • Manuel Araoz
    Manuel Araoz about 14 years
    WOW! so bool(set(L) & set(M)) is faster than any(x in M for x in L)... Who would think? :) Thank you.
  • John La Rooy
    John La Rooy about 14 years
    @Manuel, when I tested it, the intersection took less time to calculate than the time converting the lists to sets, so less than 1/3 of the total time
  • visual_learner
    visual_learner about 14 years
    @Manuel - The best, it seems, is to convert one list to a set to allow for faster membership testing (in), then to filter based on this membership test (x in m_set for x in L). @gnibbler, can we get some tests that utilize two randomly constructed lists just for completeness? (and also +1 for a fine job)
  • Darius Bacon
    Darius Bacon about 14 years
    Do you see any difference between frozenset and set in these tests? I just picked frozenset by default because this didn't happen to need mutability.
  • John La Rooy
    John La Rooy about 14 years
    @Darius, see the final test. set was considerably faster than frozenset.