Item frequency count in Python
Solution 1
The Counter class in the collections module is purpose-built to solve this type of problem:
from collections import Counter
words = "apple banana apple strawberry banana lemon"
Counter(words.split())
# Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})
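As a small follow-up sketch (the names `counts`, `top_two`, and `missing` are illustrative and not part of the original answer), a Counter can also report the most frequent items and returns 0 for items it has never seen:

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon"
counts = Counter(words.split())

# most_common(n) returns the n highest counts as (item, count) pairs
top_two = counts.most_common(2)
print(top_two)

# A missing key reads as 0 instead of raising KeyError
missing = counts["cherry"]
print(missing)
```

This keeps the counting itself to a single pass over the list, which is what the question asks for.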
Solution 2
defaultdict to the rescue!
from collections import defaultdict
words = "apple banana apple strawberry banana lemon"
d = defaultdict(int)
for word in words.split():
    d[word] += 1
This runs in O(n).
Solution 3
words = "apple banana apple strawberry banana lemon"
freqs = {}
for word in words.split():
    freqs[word] = freqs.get(word, 0) + 1  # fetch and increment, or initialize to 1
I think this gives the same result as Triptych's solution, but without importing collections. It is also a bit like Selinap's solution, but more readable in my opinion, and almost identical to Thomas Weigel's solution, but without using exceptions.
This could be slower than using defaultdict from the collections module, however, since the value is fetched, incremented, and then assigned again rather than just incremented. That said, += on a dict entry also fetches the value, adds to it, and stores the result back, so the difference may be small.
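Rather than guessing which variant is faster, one can measure it. The following is a minimal timing sketch, assuming a repeated copy of the sample list as input; the function names `count_with_get` and `count_with_defaultdict` are made up for illustration, and the timings will vary by machine and Python version:

```python
import timeit
from collections import defaultdict

# Repeat the sample list so the loop has enough work to time
words = "apple banana apple strawberry banana lemon".split() * 1000


def count_with_get(items):
    """Count items with a plain dict and dict.get()."""
    freqs = {}
    for item in items:
        freqs[item] = freqs.get(item, 0) + 1
    return freqs


def count_with_defaultdict(items):
    """Count items with collections.defaultdict(int)."""
    freqs = defaultdict(int)
    for item in items:
        freqs[item] += 1
    return freqs


# Both approaches must agree before their speed is compared
assert count_with_get(words) == count_with_defaultdict(words)

for func in (count_with_get, count_with_defaultdict):
    seconds = timeit.timeit(lambda: func(words), number=100)
    print(f"{func.__name__}: {seconds:.3f} s")
```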
Solution 4
Standard approach:
from collections import defaultdict
words = "apple banana apple strawberry banana lemon"
words = words.split()
result = defaultdict(int)
for word in words:
    result[word] += 1
print(result)
Groupby oneliner:
from itertools import groupby
words = "apple banana apple strawberry banana lemon"
words = words.split()
result = dict((key, len(list(group))) for key, group in groupby(sorted(words)))
print(result)
Solution 5
If you don't want to use the standard dictionary method (looping through the list and incrementing the proper dict key), you can try this:
>>> from itertools import groupby
>>> myList = words.split() # ['apple', 'banana', 'apple', 'strawberry', 'banana', 'lemon']
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
[('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]
It runs in O(n log n) time.
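The O(n log n) cost comes from the call to sorted(): groupby only merges consecutive equal items, so the list has to be sorted first. A short sketch of the difference (the names `my_list`, `unsorted_runs`, and `sorted_runs` are illustrative):

```python
from itertools import groupby

my_list = "apple banana apple strawberry banana lemon".split()

# Without sorting, each run of consecutive equal items becomes its own group,
# so repeated words show up several times with a count of 1
unsorted_runs = [(k, len(list(g))) for k, g in groupby(my_list)]
print(unsorted_runs)

# After sorting, equal words are adjacent and are counted together
sorted_runs = [(k, len(list(g))) for k, g in groupby(sorted(my_list))]
print(sorted_runs)
```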
Daniyar
Updated on October 16, 2021
Comments
-
Daniyar over 2 years
Assume I have a list of words, and I want to find the number of times each word appears in that list.
An obvious way to do this is:
words = "apple banana apple strawberry banana lemon"
uniques = set(words.split())
freqs = [(item, words.split().count(item)) for item in uniques]
print(freqs)
But I don't find this code very good, because the program runs through the word list more than once: once to build the set, and then again for each unique word to count its appearances.
Of course, I could write a function to run through the list and do the counting, but that wouldn't be so Pythonic. So, is there a more efficient and Pythonic way?
-
Daniyar about 15 years
Is there a difference in complexity? Does groupby use sorting? Then it seems to need O(n log n) time?
-
nosklo about 15 years
Seems slower than defaultdict in my tests
-
Kenan Banks about 15 years
Splitting by a space is redundant. Also, you should use the dict.setdefault method instead of the try/except.
-
hopla about 15 years
It's a lot slower because you are using Exceptions. Exceptions are very costly in almost any language. Avoid using them for logic branches. Look at my solution for an almost identical method, but without using Exceptions: stackoverflow.com/questions/893417/…
-
Daniyar about 13 years
The question already uses "count", and asks for better alternatives.
-
JDong over 9 years
According to stackoverflow.com/a/20308657/2534876, this is fastest on Python3 but slow on Python2.
-
Tommy almost 9 years
Do you know if there is a flag to convert this to a percentage freq_dict? E.g.,
'apple': .3333 (2/6),
-
Boris Verkhovskiy about 5 years
@Tommy
total = sum(your_counter_object.values())
then
freq_percentage = {k: v/total for k, v in your_counter_object.items()}
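Spelled out as a runnable sketch on the sample data from the question (the names `counts` and `freq_fraction` are illustrative, and the values are fractions of the total rather than percentages):

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon".split()
counts = Counter(words)

# Total number of words counted (6 for this sample)
total = sum(counts.values())

# Divide each count by the total to get a relative frequency
freq_fraction = {word: count / total for word, count in counts.items()}
print(freq_fraction)
```

Multiplying each value by 100 would give percentages instead.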
-
Boris Verkhovskiy about 5 years
This is a very old answer. Use Counter instead.