How to remove duplicates from Python list and keep order?

144,598

Solution 1

A list can be sorted and deduplicated using built-in functions:

myList = sorted(set(myList))
  • set is a built-in function for Python >= 2.3
  • sorted is a built-in function for Python >= 2.4

Solution 2

If your input is already sorted, then there may be a simpler way to do it:

from operator import itemgetter
from itertools import groupby
unique_list = list(map(itemgetter(0), groupby(yourList)))

Solution 3

If you want to keep order of the original list, just use OrderedDict with None as values.

In Python2:

    from collections import OrderedDict
    from itertools import izip, repeat
    unique_list = list(OrderedDict(izip(my_list, repeat(None))))

In Python3 it's even simpler:

    from collections import OrderedDict
    from itertools import repeat
    unique_list = list(OrderedDict(zip(my_list, repeat(None))))

If you don't like iterators (zip and repeat) you can use a generator (works both in 2 & 3):

    from collections import OrderedDict
    unique_list = list(OrderedDict((element, None) for element in my_list))

Solution 4

If it's clarity you're after, rather than speed, I think this is very clear:

def sortAndUniq(input):
  output = []
  for x in input:
    if x not in output:
      output.append(x)
  output.sort()
  return output

It's O(n^2) though, with the repeated use of not in for each element of the input list.

Solution 5

> but I don't know how to retrieve the list members from the hash in alphabetical order.

Not really your main question, but for future reference Rod's answer using sorted can be used for traversing a dict's keys in sorted order:

for key in sorted(my_dict.keys()):
   print key, my_dict[key]
   ...

and also because tuple's are ordered by the first member of the tuple, you can do the same with items:

for key, val in sorted(my_dict.items()):
    print key, val
    ...
Share:
144,598

Related videos on Youtube

Josh Glover
Author by

Josh Glover

A Lisp fanatic hacking Clojure in Stockholm.

Updated on September 11, 2020

Comments

  • Josh Glover
    Josh Glover over 2 years

    Given a list of strings, I want to sort it alphabetically and remove duplicates. I know I can do this:

    from sets import Set
    [...]
    myHash = Set(myList)
    

    but I don't know how to retrieve the list members from the hash in alphabetical order.

    I'm not married to the hash, so any way to accomplish this will work. Also, performance is not an issue, so I'd prefer a solution that is expressed in code clearly to a fast but more opaque one.

    • Mark Amery
      Mark Amery over 3 years
      This question, after @ColonelPanic's edit, is kind of a mess; the question in the title and the question in the body are not the same. The title indicates that the original order, pre-duplicate-removal, should be preserved. But the body presents a scenario where this is not in fact necessary.
  • leo7r
    leo7r almost 14 years
    This can also be expressed as [e for e, _ in groupby(sortedList)]
  • J_Zar
    J_Zar about 10 years
    This does not work if your myList has unhashable objects.
  • Colonel Panic
    Colonel Panic about 7 years
    This is O(n) rather than O(n log n) right?
  • Cristian Ciupitu
    Cristian Ciupitu almost 7 years
    FWIW something similar was added to the list of recipes from the documentation for itertools.
  • Zuzu Corneliu
    Zuzu Corneliu almost 6 years
    wouldn't set(sorted(myList)) be faster? I mean isn't it faster to first sort a list and then remove its duplicates than first removing its duplicates and only sort it afterwards?
  • Dimali
    Dimali over 5 years
    @CorneliuZuzu Removing duplicates with set() changes order, so you have to do it this way
  • Arthur Tacca
    Arthur Tacca over 4 years
    @CorneliuZuzu set uses a hash table rather than a tree or sorted list internally, so its speed is unaffected by whether the elements are already sorted. This also means that the items end up in hash order, which isn't the same as sorted order (as Dimali said).
  • Ken Seehart
    Ken Seehart about 3 years
    Downvoted because there is a distinction between ordered and sorted. Ordered means keep the original order, e.g. f([3,1,4,1,5,9,2,6,5,3,5]) = [3,1,4,5,9,2,6]
  • Rod Daunoravicius
    Rod Daunoravicius about 3 years
    @user3667349 The "keep order" clause was not a part of the original question, it was added by the Colonel Panic edit in 2015.
  • thinkingmonster
    thinkingmonster about 1 year
    @ZuzuCorneliu set does not maintain the order so set after sorted will again make the list unordered.
  • Zuzu Corneliu
    Zuzu Corneliu about 1 year
    Ah yes yes, you are correct guys, it slipped my logic at the time ;)

Related