Sorting a list of tuples with multiple conditions

13,871

Solution 1

The key function can return a tuple.

sorted_by_length = sorted(list_,
                         key=lambda x: (x[0], len(x[1]), float(x[1])))

This works because tuples are sorted lexicographically: (the first element of the tuple is used for sorting first, then the second element is used for breaking ties, and then the third element is used for breaking any remaining ties.)

See the excellent HOWTO Sort for an explanation of this and other issues related to sorting.


In [1]: list_ = [(1, '0101'), (1, '1010'), (1, '101'), (2, '01'), (2, '010'), (2, '10')]

In [2]: sorted_by_length = sorted(list_,
                         key=lambda x: (x[0], len(x[1]), float(x[1])))
   ...: 
In [3]: sorted_by_length
Out[3]: [(1, '101'), (1, '0101'), (1, '1010'), (2, '01'), (2, '10'), (2, '010')]

If the second element of each tuple is the string representation of an int in binary, then use int(x, 2) instead of float(x) in the sort key. If they are intended to be the decimal representation of an integer, then use int(x).

Solution 2

You can sort using for key function that return collection as result

list_.sort(key=lambda x: [x[0], len(x[1]), x[1]])

key parameter to specify a function to be called on each list element prior to making comparisons.

If You use collection as key result then It will be sorted using first comparing first elements if they are equal then seconds ones will be compared and so on...

P.S. As I understand It is not necessary to cast third item to numeric type because if the equal is the same then for binary values lexicographical and numeric ordering will give same result

Solution 3

The right solution is to use a key function that returns a tuple, as shown in unutbu's answer. However there is an other way of doing it. Python's sort is guaranteed to be stable, so you can do multiple sorts by different keys and achieve the output you want. In particular:

list_.sort(key=lambda x: float(x[1]))
list_.sort(key=lambda x: len(x[1]))
list_.sort(key=lambda x: x[0])

Demo with IPython:

In [1]: list_ = [(1, '0101'), (1, '1010'), (1, '101'), (2, '01'), (2, '010'), (2, '10')]

In [2]: list_.sort(key=lambda x: float(x[1]))
   ...: list_.sort(key=lambda x: len(x[1]))
   ...: list_.sort(key=lambda x: x[0])
   ...: 

In [3]: list_
Out[3]: [(1, '101'), (1, '0101'), (1, '1010'), (2, '01'), (2, '10'), (2, '010')]

Note: this solution resembles the three steps you described in your question but the steps are reversed! Sort by the primary key last to get the correct output.

Also keep in mind that the algorithm used for sorting is adaptive. this means that when a sequence is already partially sorted it can use the partial order to sort more efficiently(often in linear time instead of nlog(n)). When you sort by multiple keys you often achieve this partial order, hence the multiple calls to sort() do not cost much. However it highly depends on the keys and the data. Sometimes it is more efficient than using tuples as keys, sometimes it's quite slower.


An example of timing. Note that the two solutions take mostly the same time.

In [9]: list_
Out[9]: [(1, '0101'), (1, '1010'), (1, '101'), (2, '01'), (2, '010'), (2, '10')]

In [10]: list_ *= 1000   # better to avoid too small benchmarks.

In [11]: %%timeit
    ...: a = sorted(list_, key=lambda x: (x[0], len(x[1]), float(x[1])))
    ...: 
100 loops, best of 3: 6.04 ms per loop

In [12]: %%timeit
    ...: a = sorted(list_, key=lambda x: float(x[1]))
    ...: a.sort(key=lambda x: len(x[1]))
    ...: a.sort(key=lambda x: x[0])
    ...: 
100 loops, best of 3: 5.72 ms per loop
In [13]: import random
    ...: data = [(random.randint(1, 1000), bin(random.randint(1, 100))[2:]) for _ in range(10000)]
    ...: 

In [14]: %%timeit
    ...: a = sorted(data, key=lambda x: (x[0], len(x[1]), float(x[1])))
    ...: 
100 loops, best of 3: 15.2 ms per loop

In [15]: %%timeit
    ...: a = sorted(data, key=lambda x: float(x[1]))
    ...: a.sort(key=lambda x: len(x[1]))
    ...: a.sort(key=lambda x: x[0])
    ...: 
100 loops, best of 3: 15.1 ms per loop
Share:
13,871

Related videos on Youtube

Marius Küpper
Author by

Marius Küpper

Updated on June 04, 2022

Comments

  • Marius Küpper
    Marius Küpper almost 2 years

    I am currently trying to sort the following list:

    list_ = [(1, '0101'), (1, '1010'), (1, '101'), (2, '01'), (2, '010'), (2, '10')]
    

    These are the steps I want to take in order to sort it:

    1. Sort the list by the value of the first element of the tuples
    2. Next, sort the list by the length of the second element of the tuples (not the value, the length!) AFTER step 1 finishes.
    3. Next, sort the list by the value of the second element of the tuples AFTER step 1 and step 2 finishes.

    My attempt:

    sorted_by_length = sorted(list_, key=len x:x[1])
    

    However, I received a syntax error concerning the x after key= len. What is the right variable I should be using in this case?

    The correct, sorted list should be:

    sorted_by_length = [(1, '101'), (1, '0101'), (1, '1010'), (2, '01'), (2, '10'), (2, '010')]
    

    Thank you for help.

    • matth
      matth over 10 years
      What do you mean value of of the second element? Do you want that step to be lexicographical? Or should that string be interpreted as a binary integer value?
  • Steven Rumbalski
    Steven Rumbalski over 10 years
    Regarding "It is not necessary to cast third item to numeric type ...", lexicographical and numeric ordering will not be the same because the the strings are of unequal length. For example '10111' would sort as less than '11'.
  • oleg
    oleg over 10 years
    yes but length has bigger priority so 11 should be before 10111 anyway. the only thing that doesn't fit this rule is negative numbers as I understand.
  • Steven Rumbalski
    Steven Rumbalski over 10 years
    Thanks, I missed the implications of the length comparison. It does seem odd to me that '010' would sort as greater than '11'. Perhaps what the OP really needs is key=lambda x:(x[0], bin(x[1], 2))).
  • Marius Küpper
    Marius Küpper over 10 years
    it works! and so simple ! i was before trying for hours to sort it just with for in range and while ,... Thank you all for the fast answers !! :)
  • Marius Küpper
    Marius Küpper over 10 years
    it works! and so simple ! i was before trying for hours to sort it just with for in range and while ,... Thank you all for the fast answers !! :)