Python Sets vs Lists

195,102

Solution 1

It depends on what you are intending to do with it.

Sets are significantly faster when it comes to determining if an object is present in the set (as in x in s), but are slower than lists when it comes to iterating over their contents.

You can use the timeit module to see which is faster for your situation.

Solution 2

Lists are slightly faster than sets when you just want to iterate over the values.

Sets, however, are significantly faster than lists if you want to check if an item is contained within it. They can only contain unique items though.

It turns out tuples perform in almost exactly the same way as lists, except for their immutability.

Iterating

>>> def iter_test(iterable):
...     for i in iterable:
...         pass
...
>>> from timeit import timeit
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = set(range(10000))",
...     number=100000)
12.666952133178711
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = list(range(10000))",
...     number=100000)
9.917098999023438
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = tuple(range(10000))",
...     number=100000)
9.865639209747314

Determine if an object is present

>>> def in_test(iterable):
...     for i in range(1000):
...         if i in iterable:
...             pass
...
>>> from timeit import timeit
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = set(range(1000))",
...     number=10000)
0.5591847896575928
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = list(range(1000))",
...     number=10000)
50.18339991569519
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = tuple(range(1000))",
...     number=10000)
51.597304821014404

Solution 3

Set wins due to near instant 'contains' checks: https://en.wikipedia.org/wiki/Hash_table

List implementation: usually an array, low level close to the metal good for iteration and random access by element index.

Set implementation: https://en.wikipedia.org/wiki/Hash_table, it does not iterate on a list, but finds the element by computing a hash from the key, so it depends on the nature of the key elements and the hash function. Similar to what is used for dict. I suspect list could be faster if you have very few elements (< 5), the larger element count the better the set will perform for a contains check. It is also fast for element addition and removal. Also always keep in mind that building a set has a cost !

NOTE: If the list is already sorted, searching the list could be quite fast on small lists, but with more data a set is faster for contains checks.

Solution 4

List performance:

>>> import timeit
>>> timeit.timeit(stmt='10**6 in a', setup='a = range(10**6)', number=100000)
0.008128150348026608

Set performance:

>>> timeit.timeit(stmt='10**6 in a', setup='a = set(range(10**6))', number=100000)
0.005674857488571661

You may want to consider Tuples as they're similar to lists but can’t be modified. They take up slightly less memory and are faster to access. They aren’t as flexible but are more efficient than lists. Their normal use is to serve as dictionary keys.

Sets are also sequence structures but with two differences from lists and tuples. Although sets do have an order, that order is arbitrary and not under the programmer’s control. The second difference is that the elements in a set must be unique.

set by definition. [python | wiki].

>>> x = set([1, 1, 2, 2, 3, 3])
>>> x
{1, 2, 3}

Solution 5

tl;dr

Data structures (DS) are important because they are used to perform operations on data which basically implies: take some input, process it, and give back the output.

Some data structures are more useful than others in some particular cases. Therefore, it is quite unfair to ask which (DS) is more efficient/speedy. It is like asking which tool is more efficient between a knife and fork. I mean all depends on the situation.

Lists

A list is mutable sequence, typically used to store collections of homogeneous items.

Sets

A set object is an unordered collection of distinct hashable objects. It is commonly used to test membership, remove duplicates from a sequence, and compute mathematical operations such as intersection, union, difference, and symmetric difference.

Usage

From some of the answers, it is clear that a list is quite faster than a set when iterating over the values. On the other hand, a set is faster than a list when checking if an item is contained within it. Therefore, the only thing you can say is that a list is better than a set for some particular operations and vice-versa.

Share:
195,102

Related videos on Youtube

Mantas Vidutis
Author by

Mantas Vidutis

always on the move. operating a consultancy in SF: turbines.io

Updated on December 27, 2021

Comments

  • Mantas Vidutis
    Mantas Vidutis over 2 years

    In Python, which data structure is more efficient/speedy? Assuming that order is not important to me and I would be checking for duplicates anyway, is a Python set slower than a Python list?

  • Seaux
    Seaux over 10 years
    First off, you should update to the set built-in type link (docs.python.org/2/library/stdtypes.html#set) not the deprecated sets library. Second, "Sets are also sequence structures", read the following from the built-in type link: "Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior."
  • ThePracticalOne
    ThePracticalOne over 9 years
    I have found that (Initializing set -> 5.5300979614257812) (Initializing list -> 1.8846848011016846) (Initializing tuple -> 1.8730108737945557) Items of size 10,000 on my intel core i5 quad core with 12GB RAM. This should be take into consideration also.
  • Ellis Percival
    Ellis Percival over 9 years
    I've updated the code to remove the object creation now. The setup phase of the timeit loops is only called once (docs.python.org/2/library/timeit.html#timeit.Timer.timeit).
  • overexchange
    overexchange almost 9 years
    For your point: "Sets are significantly faster ", what is the underlying implementation that makes it faster?
  • omerfarukdogan
    omerfarukdogan over 6 years
    Set is not significantly slower than list while iterating.
  • habnabit
    habnabit about 6 years
    Sets and lists both have linear time iteration. To say that one is "slower" than the other is misguided and has confused new programmers who read this answer.
  • Ryne Wang
    Ryne Wang about 6 years
    range is not list. range is a special class with custom __contains__ magic method.
  • Manoel Vilela
    Manoel Vilela over 5 years
    @RyneWang this is true, but only for Python3. In Python2 range returns a normal list (that's why exists horrible things like xrange)
  • Mohammed Noureldin
    Mohammed Noureldin about 4 years
    @habnabit if you are saying that they both have linear time iteration. Does this mean they have the same iteration time? What is the difference then?
  • Mandera
    Mandera over 3 years
    They both have a running time complexity of O(n) when iterated, but the average-case complexity of iterating sets is ~28% greater (slower) than iterating lists
  • Hemerson Tacon
    Hemerson Tacon over 2 years
    To answer 'For your point: "Sets are significantly faster ", what is the underlying implementation that makes it faster?' Sets use hash functions to determine if an element is in it (if the hash function is good, i.e. collisions are not common, O(1) complexity), while to determine if an element is in a list, an iteration trough the list is necessary (O(n) complexity). Check this out for the time complexity for the methods on Python default data structures.