How to generate a word frequency histogram, where bars are ordered according to their height

python python-2.7 matplotlib histogram ranking

24,750

You can achieve the desired output by sorting your data first and then pass the ordered arrays to bar; below I use numpy.argsort for that. The plot then looks as follows (I also added the labels to the bar):

Here is the code that produces the plot with a few inline comments:

from collections import Counter
import numpy as np
import matplotlib.pyplot as plt

word_list = ['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']

counts = Counter(word_list)

labels, values = zip(*counts.items())

# sort your values in descending order
indSort = np.argsort(values)[::-1]

# rearrange your data
labels = np.array(labels)[indSort]
values = np.array(values)[indSort]

indexes = np.arange(len(labels))

bar_width = 0.35

plt.bar(indexes, values)

# add labels
plt.xticks(indexes + bar_width, labels)
plt.show()

In case you want to plot only the first n entries, you can replace the line

counts = Counter(word_list)

counts = dict(Counter(word_list).most_common(n))

In the case above, counts would then be

{'A': 3, 'C': 4}

for n = 2.

If you like to remove the frame of the plot and label the bars directly, you can check this post.

24,750

Author by

BKS

Love to experiment with new programming languages.

Updated on July 09, 2022

Comments

BKS almost 2 years
I have a long list of words, and I want to generate a histogram of the frequency of each word in my list. I was able to do that in the code below:
```
import csv
from collections import Counter
import numpy as np

word_list = ['A','A','B','B','A','C','C','C','C']

counts = Counter(merged)

labels, values = zip(*counts.items())

indexes = np.arange(len(labels))

plt.bar(indexes, values)
plt.show()
```
It doesn't, however, display the bins by rank (i.e. by frequency, so highest frequency is first bin on the left and so on), even though when I print counts it orders them for me Counter({'C': 4, 'A': 3, 'B': 2}). How could I achieve that?
Admin over 6 years

I have more than 4000 words to count, so how to generate word frequency histogram of only top 20 words?
Cleb over 6 years

@AAKM: You can use counts.most_common(20) i.e. counts = Counter(word_list).most_common(20).
Admin over 6 years

AttributeError Traceback (most recent call last) <ipython-input-33-704ddcc6ce26> in <module>() 5 counts = Counter(df['Text']).most_common(10) 6 ----> 7 labels, values = zip(*counts.items()) 8 9 # sort your values in descending order AttributeError: 'list' object has no attribute 'items'
Cleb over 6 years

@AAKM: True, most_common returns a list, not a dictionary, I updated the post. So, dict(Counter(word_list).most_common(20)) should work for you now.