How to generate a word frequency histogram, where bars are ordered according to their height
You can achieve the desired output by sorting your data first and then pass the ordered arrays to bar
; below I use numpy.argsort
for that. The plot then looks as follows (I also added the labels to the bar):
Here is the code that produces the plot with a few inline comments:
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
word_list = ['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']
counts = Counter(word_list)
labels, values = zip(*counts.items())
# sort your values in descending order
indSort = np.argsort(values)[::-1]
# rearrange your data
labels = np.array(labels)[indSort]
values = np.array(values)[indSort]
indexes = np.arange(len(labels))
bar_width = 0.35
plt.bar(indexes, values)
# add labels
plt.xticks(indexes + bar_width, labels)
plt.show()
In case you want to plot only the first n
entries, you can replace the line
counts = Counter(word_list)
by
counts = dict(Counter(word_list).most_common(n))
In the case above, counts
would then be
{'A': 3, 'C': 4}
for n = 2
.
If you like to remove the frame of the plot and label the bars directly, you can check this post.
Comments
-
BKS almost 2 years
I have a long list of words, and I want to generate a histogram of the frequency of each word in my list. I was able to do that in the code below:
import csv from collections import Counter import numpy as np word_list = ['A','A','B','B','A','C','C','C','C'] counts = Counter(merged) labels, values = zip(*counts.items()) indexes = np.arange(len(labels)) plt.bar(indexes, values) plt.show()
It doesn't, however, display the bins by rank (i.e. by frequency, so highest frequency is first bin on the left and so on), even though when I print
counts
it orders them for meCounter({'C': 4, 'A': 3, 'B': 2})
. How could I achieve that? -
Admin over 6 yearsI have more than 4000 words to count, so how to generate word frequency histogram of only top 20 words?
-
Cleb over 6 years@AAKM: You can use
counts.most_common(20)
i.e.counts = Counter(word_list).most_common(20)
. -
Admin over 6 yearsAttributeError Traceback (most recent call last) <ipython-input-33-704ddcc6ce26> in <module>() 5 counts = Counter(df['Text']).most_common(10) 6 ----> 7 labels, values = zip(*counts.items()) 8 9 # sort your values in descending order AttributeError: 'list' object has no attribute 'items'
-
Cleb over 6 years@AAKM: True,
most_common
returns a list, not a dictionary, I updated the post. So,dict(Counter(word_list).most_common(20))
should work for you now.