How to generate a word frequency histogram, where bars are ordered according to their height

24,750

You can achieve the desired output by sorting your data first and then pass the ordered arrays to bar; below I use numpy.argsort for that. The plot then looks as follows (I also added the labels to the bar):

enter image description here

Here is the code that produces the plot with a few inline comments:

from collections import Counter
import numpy as np
import matplotlib.pyplot as plt

word_list = ['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']

counts = Counter(word_list)

labels, values = zip(*counts.items())

# sort your values in descending order
indSort = np.argsort(values)[::-1]

# rearrange your data
labels = np.array(labels)[indSort]
values = np.array(values)[indSort]

indexes = np.arange(len(labels))

bar_width = 0.35

plt.bar(indexes, values)

# add labels
plt.xticks(indexes + bar_width, labels)
plt.show()

In case you want to plot only the first n entries, you can replace the line

counts = Counter(word_list)

by

counts = dict(Counter(word_list).most_common(n))

In the case above, counts would then be

{'A': 3, 'C': 4}

for n = 2.

If you like to remove the frame of the plot and label the bars directly, you can check this post.

Share:
24,750
BKS
Author by

BKS

Love to experiment with new programming languages.

Updated on July 09, 2022

Comments

  • BKS
    BKS almost 2 years

    I have a long list of words, and I want to generate a histogram of the frequency of each word in my list. I was able to do that in the code below:

    import csv
    from collections import Counter
    import numpy as np
    
    word_list = ['A','A','B','B','A','C','C','C','C']
    
    counts = Counter(merged)
    
    labels, values = zip(*counts.items())
    
    indexes = np.arange(len(labels))
    
    plt.bar(indexes, values)
    plt.show()
    

    It doesn't, however, display the bins by rank (i.e. by frequency, so highest frequency is first bin on the left and so on), even though when I print counts it orders them for me Counter({'C': 4, 'A': 3, 'B': 2}). How could I achieve that?

  • Admin
    Admin over 6 years
    I have more than 4000 words to count, so how to generate word frequency histogram of only top 20 words?
  • Cleb
    Cleb over 6 years
    @AAKM: You can use counts.most_common(20) i.e. counts = Counter(word_list).most_common(20).
  • Admin
    Admin over 6 years
    AttributeError Traceback (most recent call last) <ipython-input-33-704ddcc6ce26> in <module>() 5 counts = Counter(df['Text']).most_common(10) 6 ----> 7 labels, values = zip(*counts.items()) 8 9 # sort your values in descending order AttributeError: 'list' object has no attribute 'items'
  • Cleb
    Cleb over 6 years
    @AAKM: True, most_common returns a list, not a dictionary, I updated the post. So, dict(Counter(word_list).most_common(20)) should work for you now.