Check if all elements in a list are identical

464,533

Solution 1

Use itertools.groupby (see the itertools recipes):

from itertools import groupby

def all_equal(iterable):
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

or without groupby:

def all_equal(iterator):
    iterator = iter(iterator)
    try:
        first = next(iterator)
    except StopIteration:
        return True
    return all(first == x for x in iterator)

There are a number of alternative one-liners you might consider:

  1. Converting the input to a set and checking that it only has one or zero (in case the input is empty) items

    def all_equal2(iterator):
        return len(set(iterator)) <= 1
    
  2. Comparing against the input list without the first item

    def all_equal3(lst):
        return lst[:-1] == lst[1:]
    
  3. Counting how many times the first item appears in the list

    def all_equal_ivo(lst):
        return not lst or lst.count(lst[0]) == len(lst)
    
  4. Comparing against a list of the first element repeated

    def all_equal_6502(lst):
        return not lst or [lst[0]]*len(lst) == lst
    

But they have some downsides, namely:

  1. all_equal and all_equal2 can use any iterators, but the others must take a sequence input, typically concrete containers like a list or tuple.
  2. all_equal and all_equal3 stop as soon as a difference is found (what is called "short circuit"), whereas all the alternatives require iterating over the entire list, even if you can tell that the answer is False just by looking at the first two elements.
  3. In all_equal2 the content must be hashable. A list of lists will raise a TypeError for example.
  4. all_equal2 (in the worst case) and all_equal_6502 create a copy of the list, meaning you need to use double the memory.

On Python 3.9, using perfplot, we get these timings (lower Runtime [s] is better):

for a list with a difference in the first two elements, groupby is fastestfor a list with no differences, count(l[0]) is fastest

Solution 2

A solution faster than using set() that works on sequences (not iterables) is to simply count the first element. This assumes the list is non-empty (but that's trivial to check, and decide yourself what the outcome should be on an empty list)

x.count(x[0]) == len(x)

some simple benchmarks:

>>> timeit.timeit('len(set(s1))<=1', 's1=[1]*5000', number=10000)
1.4383411407470703
>>> timeit.timeit('len(set(s1))<=1', 's1=[1]*4999+[2]', number=10000)
1.4765670299530029
>>> timeit.timeit('s1.count(s1[0])==len(s1)', 's1=[1]*5000', number=10000)
0.26274609565734863
>>> timeit.timeit('s1.count(s1[0])==len(s1)', 's1=[1]*4999+[2]', number=10000)
0.25654196739196777

Solution 3

[edit: This answer addresses the currently top-voted itertools.groupby (which is a good answer) answer later on.]

Without rewriting the program, the most asymptotically performant and most readable way is as follows:

all(x==myList[0] for x in myList)

(Yes, this even works with the empty list! This is because this is one of the few cases where python has lazy semantics.)

This will fail at the earliest possible time, so it is asymptotically optimal (expected time is approximately O(#uniques) rather than O(N), but worst-case time still O(N)). This is assuming you have not seen the data before...

(If you care about performance but not that much about performance, you can just do the usual standard optimizations first, like hoisting the myList[0] constant out of the loop and adding clunky logic for the edge case, though this is something the python compiler might eventually learn how to do and thus one should not do it unless absolutely necessary, as it destroys readability for minimal gain.)

If you care slightly more about performance, this is twice as fast as above but a bit more verbose:

def allEqual(iterable):
    iterator = iter(iterable)
    
    try:
        firstItem = next(iterator)
    except StopIteration:
        return True
        
    for x in iterator:
        if x!=firstItem:
            return False
    return True

If you care even more about performance (but not enough to rewrite your program), use the currently top-voted itertools.groupby answer, which is twice as fast as allEqual because it is probably optimized C code. (According to the docs, it should (similar to this answer) not have any memory overhead because the lazy generator is never evaluated into a list... which one might be worried about, but the pseudocode shows that the grouped 'lists' are actually lazy generators.)

If you care even more about performance read on...


sidenotes regarding performance, because the other answers are talking about it for some unknown reason:

... if you have seen the data before and are likely using a collection data structure of some sort, and you really care about performance, you can get .isAllEqual() for free O(1) by augmenting your structure with a Counter that is updated with every insert/delete/etc. operation and just checking if it's of the form {something:someCount} i.e. len(counter.keys())==1; alternatively you can keep a Counter on the side in a separate variable. This is provably better than anything else up to constant factor. Perhaps you can also use python's FFI with ctypes with your chosen method, and perhaps with a heuristic (like if it's a sequence with getitem, then checking first element, last element, then elements in-order).

Of course, there's something to be said for readability.

Solution 4

Convert your input into a set:

len(set(the_list)) <= 1

Using set removes all duplicate elements. <= 1 is so that it correctly returns True when the input is empty.

This requires that all the elements in your input are hashable. You'll get a TypeError if you pass in a list of lists for example.

Solution 5

You can convert the list to a set. A set cannot have duplicates. So if all the elements in the original list are identical, the set will have just one element.

if len(set(input_list)) == 1:
    # input_list has all identical elements.
Share:
464,533
max
Author by

max

Updated on July 31, 2022

Comments

  • max
    max almost 2 years

    I need a function which takes in a list and outputs True if all elements in the input list evaluate as equal to each other using the standard equality operator and False otherwise.

    I feel it would be best to iterate through the list comparing adjacent elements and then AND all the resulting Boolean values. But I'm not sure what's the most Pythonic way to do that.

  • aaronasterling
    aaronasterling over 13 years
    this is nice but it doesn't short circuit and you have to calculate the length of the resulting list.
  • codaddict
    codaddict over 13 years
    @AaronMcSmooth: Still a noob in py. Don't even know what a short circut in py means :)
  • aaronasterling
    aaronasterling over 13 years
    @codaddict. It means that even if the first two elements are distinct, it will still complete the entire search. it also uses O(k) extra space where k is the number of distinct elements in the list.
  • max
    max over 13 years
    Why the hell does this work faster than the naive manual iteration through all elements?? It has to build a set after all! But when I profiled this function, it worked 13 times faster than the naive implementation for i in range(1, len(input_list)): if input_list[i-1] != input_list[i]: return False #otherwise return True I set input_list = ['x'] * 100000000
  • aaronasterling
    aaronasterling over 13 years
    @max. because building the set happens in C and you have a bad implementation. You should at least do it in a generator expression. See KennyTM's answer for how to do it correctly without using a set.
  • Glenn Maynard
    Glenn Maynard over 13 years
    Don't forget memory usage analysis for very large arrays, a native solution which optimizes away calls to obj.__eq__ when lhs is rhs, and out-of-order optimizations to allow short circuiting sorted lists more quickly.
  • max
    max over 13 years
    OMG, this is 6 times faster than the set solution! (280 million elements/sec vs 45 million elements/sec on my laptop). Why??? And is there any way to modify it so that it short circuits (I guess not...)
  • max
    max over 13 years
    It is 3 times slower than the set solution on my computer, ignoring short circuit. So if the unequal element is found on average in the first third of the list, it's faster on average.
  • Ivo van der Wijk
    Ivo van der Wijk over 13 years
    I guess list.count has a highly optimized C implementation, and the length of the list is stored internally, so len() is cheap as well. There's not a way to short-circuit count() since you will need to really check all elements to get the correct count.
  • max
    max over 13 years
    Can I change it to: x.count(next(x)) == len(x) so that it works for any container x? Ahh.. nm, just saw that .count is only available for sequences.. Why isn't it implemented for other builtin containers? Is counting inside a dictionary inherently less meaningful than inside a list?
  • Ivo van der Wijk
    Ivo van der Wijk over 13 years
    An iterator may not have a length. E.g. it can be infinite or just dynamically generated. You can only find its length by converting it to a list which takes away most of the iterators advantages
  • ninjagecko
    ninjagecko about 12 years
    You don't need the extra parentheses around the generator expression if it's the only argument.
  • ninjagecko
    ninjagecko about 12 years
    for k in j: break is equivalent to next(j). You could also have done def allTheSame(x): return len(list(itertools.groupby(x))<2) if you did not care about efficiency.
  • max
    max about 12 years
    This works, but it's a bit (1.5x) slower than @KennyTM checkEqual1. I'm not sure why.
  • max
    max almost 12 years
    I'm actually trying to see if all elements in one list are identical; not if two separate lists are identical.
  • skalee
    skalee about 11 years
    This is the nicest, most readable answer from all here. Maybe not the most efficient, but it's not always that important.
  • berdario
    berdario about 10 years
    Your first code was obviously wrong: reduce(lambda a,b:a==b, [2,2,2]) yields False... I edited it, but this way it's not pretty anymore
  • ninjagecko
    ninjagecko over 8 years
    max: Likely because I did not bother to perform the optimization first=myList[0] all(x==first for x in myList), perhaps
  • Matt Liberty
    Matt Liberty over 8 years
    I think that myList[0] is evaluated with each iteration. >>> timeit.timeit('all([y == x[0] for y in x])', 'x=[1] * 4000', number=10000) 2.707076672740641 >>> timeit.timeit('x0 = x[0]; all([y == x0 for y in x])', 'x=[1] * 4000', number=10000) 2.0908854261426484
  • ninjagecko
    ninjagecko over 8 years
    I should of course clarify that the optimization first=myList[0] will throw an IndexError on an empty list, so commenters who were talking about that optimization I mentioned will have to deal with the edge-case of an empty list. However the original is fine (x==myList[0] is fine within the all because it is never evaluated if the list is empty).
  • max
    max about 8 years
    Sorry, what I meant was why count isn't implemented for iterables, not why len isn't available for iterators. The answer is probably that it's just an oversight. But it's irrelavant for us because default .count() for sequences is very slow (pure python). The reason your solution is so fast is that it relies on the C-implemented count provided by list. So I suppose whichever iterable happens to implement count method in C will benefit from your approach.
  • Henry Gomersall
    Henry Gomersall almost 8 years
    This is clearly the right way to to it. If you want speed in every case, use something like numpy.
  • musiphil
    musiphil over 7 years
    Your for loop can be made more Pythonic into if any(item != list[0] for item in list[1:]): return False, with exactly the same semantics.
  • Brendan
    Brendan over 7 years
    You can avoid a redundant comparison here by using for elem in mylist[1:]. Doubt it improves speed much though since I guess elem[0] is elem[0] so the interpreter can probably do that comparison very quickly.
  • Chen A.
    Chen A. over 6 years
    so does all(), why not use all(x == seq[0] for x in seq) ? looks more pythonic and should perform the same
  • Aaron3468
    Aaron3468 over 6 years
    @IvovanderWijk Among the faster approaches is this one: timeit.timeit('all(i == s1[0] for i in set(s1))', 's1=[1]*5000', number=10000). It clocks in at 0.39 ms on my system while the count == len clocks in at 0.16 ms.
  • Bachsau
    Bachsau about 5 years
    Needs third party module.
  • mhwombat
    mhwombat about 5 years
    This answer is identical to an answer from U9-Forward from last year.
  • Luis B
    Luis B about 5 years
    Good eye! I used the same structure/API, but my method uses np.unique and shape. U9's function uses np.all() and np.diff() -- I don't use either of those functions.
  • Josmoor98
    Josmoor98 almost 5 years
    For null lists, do you mean this will return False? I'm not able to replicate if so. If you wish to detect a null list, how would you modify the above? I have been using if myList and all(x==myList[0] for x in myList)?
  • ninjagecko
    ninjagecko almost 5 years
    @Josmoor98: I mean the empty list [], which I assume you mean too (if the list can be None, that's another issue and must be treated separately), and that all(x==y[0] for x in myList) will be true for inputs myList=... [], [1], [1,1], ['a','a','a'], [[1],[1]], [(1,),(1,)], and even strings '"zzz"'. It will be false for inputs like [1,2], or anything where two elements a,b don't return a.__eq__(b) or equivalent. [continued]
  • ninjagecko
    ninjagecko almost 5 years
    @Josmoor98: [continued] If your var can be None, be aware bool([])==False and bool(None)==False. Stylistically if were to encapsulate this answer in a function, such "are null" checks should be made outside the function e.g. def allEqual(myList): all(.....), and later on if myVar!=None and allEqual(myVar):
  • GZ0
    GZ0 over 4 years
    I think not np.any(np.diff(l)) could be a bit faster.
  • schot
    schot about 4 years
    There is a small mistake in the code as written (try [1, 2, 2]): it doesn't take the previous boolean value into account. This can be fixed by replacing x[1] == y with x[0] and x[1] == y.
  • zyy
    zyy about 4 years
    This unfortunately does not work for numpy arrays.
  • Chris_Rands
    Chris_Rands about 4 years
    return next(g, f := next(g, g)) == f (from py3.8 of course)
  • teichert
    teichert almost 4 years
    In case the intuition on checkEqual3 wasn't immediately obvious to anyone else: all items are the same if first == second and second == third and ...
  • Boris Verkhovskiy
    Boris Verkhovskiy over 3 years
    This returns False if your list is empty, when it should return True. It also requires all your elements to be hashable.
  • Martijn Pieters
    Martijn Pieters over 3 years
    This is also enormously inefficient; for an input of length N it takes N^2 steps.**At the very least**, if the values are hashable, use a set for the containment tests.
  • Yaakov Bressler
    Yaakov Bressler over 3 years
    This is brilliantly fast. Clever approach too. Helpful suggestion for those implementing with pytest, you should return the result before asserting True/False. (Wrap this function in another.)
  • ChaimG
    ChaimG over 3 years
    @Boris: What is the code for these charts?
  • Boris Verkhovskiy
    Boris Verkhovskiy over 3 years
    @ChaimG if you click on "Edit", the code is hidden in a comment in the text of the answer.
  • Toothless204
    Toothless204 about 3 years
    You could make a further improvement to the version without groupby by using eq from the operator module. These functions will be slightly faster than writing it yourself because they are implemented in C within the interpreter.
  • Boris Verkhovskiy
    Boris Verkhovskiy about 3 years
    How is your answer different from stackoverflow.com/a/23415761
  • mykhal
    mykhal almost 3 years
    I wouldn't be afraid of empty lists… can you say that no (zero count) (non)values are different to each other?
  • Kermit
    Kermit almost 3 years
    Do you have an example?
  • greybeard
    greybeard almost 3 years
    (Almost useful, but lacks a docstring: While the name is mnemonic, I like to check the hover in my IDE, e.g.: all_eq([])?)
  • mykhal
    mykhal almost 3 years
    @greybeard sorry, it's not an official package
  • greybeard
    greybeard almost 3 years
    (You write "undocumented" code? Didn't work for me.)
  • Stanislav Volodarskiy
    Stanislav Volodarskiy over 2 years
    In all_equal you can use first = next(iterator, None) to avoid try/except construction.
  • jacktrader
    jacktrader over 2 years
    I don't think an empty list should return True at all, False for that use case is completely expected.
  • greybeard
    greybeard over 2 years
    Is comparison slower than slicing (== slower than 1:)? all(map(lambda x: x == lst[0], lst))
  • jacktrader
    jacktrader over 2 years
    Good catch, I glanced at the command and obviously hadn't thought it through all the way.. It splits the list and then reverses to compare (I had only saw the reverse). So it might work for all cases, I'm not sure? But I just removed it until I can test some use cases. But the logic is also hard to follow for large cases, so I'm not sure I'd use it anyways. Thanks!
  • Johannes Schaub - litb
    Johannes Schaub - litb over 2 years
    I prefer pairwise comparison with zip(lst, lst[1:])
  • Hans Bouwmeester
    Hans Bouwmeester about 2 years
    -1. These deliberations do not add much to, for example, @ninjagecko's answer from way back in 2012: all(x==myList[0] for x in myList). The top proposed solution here is similar but less performant and harder to comprehend.