how to efficiently get the k bigger elements of a list in python
Solution 1
Use nlargest from heapq module
from heapq import nlargest
lst = [9,1,6,4,2,8,3,7,5]
nlargest(3, lst) # Gives [9,8,7]
You can also give a key to nlargest in case you wanna change your criteria:
from heapq import nlargest
tags = [ ("python", 30), ("ruby", 25), ("c++", 50), ("lisp", 20) ]
nlargest(2, tags, key=lambda e:e[1]) # Gives [ ("c++", 50), ("python", 30) ]
Solution 2
The simple, O(n log n) way is to sort the list then get the last k elements.
The proper way is to use a selection algorithm, which runs in O(n + k log k) time.
Also, heapq.nlargest
takes O(n log k) time on average, which may or may not be good enough.
(If k = O(n), then all 3 algorithms have the same complexity (i.e. don't bother). If k = O(log n), then the selection algorithm as described in Wikipedia is O(n) and heapq.nlargest
is O(n log log n), but double logarithm is "constant enough" for most practical n that it doesn't matter.)
Solution 3
l = [9,1,6,4,2,8,3,7,5]
sorted(l)[-k:]
Solution 4
You can use the heapq
module.
>>> from heapq import heapify, nlargest
>>> l = [9,1,6,4,2,8,3,7,5]
>>> heapify(l)
>>> nlargest(3, l)
[9, 8, 7]
>>>
Solution 5
sorted(l, reverse=True)[:k]
Comments
-
Manuel Araoz almost 2 years
What´s the most efficient, elegant and pythonic way of solving this problem?
Given a list (or set or whatever) of n elements, we want to get the k biggest ones. ( You can assume
k<n/2
without loss of generality, I guess) For example, if the list were:l = [9,1,6,4,2,8,3,7,5]
n = 9, and let's say k = 3. What's the most efficient algorithm for retrieving the 3 biggest ones? In this case we should get
[9,8,7]
, in no particular order.Thanks! Manuel
-
garg10may over 8 yearswe don't need to heapify it here
-
Paul Evans over 8 yearsGreat language ranking :) )
-
Vikas Prasad about 3 yearsis it guaranteed that
nlargest()
would return items in the sorted order as mentioned in the above examples? couldn't find this in doc, so I guess, there is no such guarantee -
Vikas Prasad about 3 years^ okay, I just read the source code, it seems the output would always be sorted. Nice!
-
Vikas Prasad about 3 yearsfrom the link for nlargest in the answer, the time complexity seems to be O(n log(k)) and not O(k log(n))
-
Admin almost 2 yearssorted(heap) -> O(n logn) complexity VS heapify(heap) -> O(n) + heapq.nlargest(k, heap) -> O(k logn) It's better to use the latter case
-
Admin almost 2 yearsIt's klogn, not nlogk, you binary search n elements (O(logn)) k times => O(k logn)