How to make a bar plot of non-numerical data in pandas


Solution 1

To bin your data, take a look at pandas.cut() (see the docs). For categorical plots, I've found the seaborn package quite helpful; see the tutorial on categorical plots. Below is an example that plots the Yes/No counts for the bins you mention, using a random sample:

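import pandas as pd
from numpy.random import randint
from random import choice

df = pd.DataFrame(data={"age": randint(10, 50, 1000),
                        "response": [choice(['Yes', 'No']) for i in range(1000)]})

# Bin ages into 5-year intervals; include_lowest=True keeps age 10 in the first bin
df['age_group'] = pd.cut(df.age, bins=list(range(10, 51, 5)), include_lowest=True)
df.head()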

   age response age_group
0   48      Yes  (45, 50]
1   31       No  (30, 35]
2   25      Yes  (20, 25]
3   29      Yes  (25, 30]
4   19      Yes  (15, 20]

import seaborn as sns
sns.countplot(y='response', hue='age_group', data=df, palette="Greens_d")

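pd.cut assigns each value to a half-open interval (a, b], so with the bin edges used above an age of exactly 10 would fall outside every bin unless include_lowest=True is passed. A minimal check, using a few hypothetical ages chosen to sit on or near the bin edges:

```python
import pandas as pd

# Hypothetical ages placed on or near the bin edges
ages = pd.Series([10, 11, 15, 16, 50])

# Same edges as the answer: 10, 15, 20, ..., 50 (eight 5-year bins)
edges = list(range(10, 51, 5))

# Default: the first bin is (10, 15], so the value 10 is not in any bin and becomes NaN
plain = pd.cut(ages, bins=edges)

# include_lowest=True widens the first bin so that 10 is included
closed = pd.cut(ages, bins=edges, include_lowest=True)

print(plain.isna().tolist())      # [True, False, False, False, False]
print(closed.isna().tolist())     # [False, False, False, False, False]
print(closed.cat.codes.tolist())  # [0, 0, 0, 1, 7]
```

Values that fall outside every bin become NaN (category code -1), so checking isna() after binning real data is a cheap sanity test.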

Solution 2

To generate a multiple bar plot, you would first need to group by age and response and then unstack the dataframe:

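# Count rows per (age, response) pair, then pivot 'response' into columns
df = df.groupby(['age', 'response']).size()
df = df.unstack(fill_value=0)  # missing combinations become 0 instead of NaN
df.plot(kind='bar')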

Here is the output plot:

[Figure: bar plot]
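Applied to the sample data from the question, the same groupby/unstack steps give one row per age and one column per response, and pd.crosstab builds the same table in a single call. This is a sketch: fill_value=0 is used so that combinations with no responses show as 0 rather than NaN, and the matplotlib.use("Agg") line is only there so the code runs without a display (it can be dropped in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

# Sample data from the question
df = pd.DataFrame(data={"age": [11, 12, 11, 11, 13, 11, 12, 11],
                        "response": ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"]})

# Count rows per (age, response) pair, then move 'response' into columns
counts = df.groupby(["age", "response"]).size().unstack(fill_value=0)

# pd.crosstab produces the same counts table in one call
same = pd.crosstab(df["age"], df["response"])

# One group of bars per age, one bar per response value
ax = counts.plot(kind="bar")
print(counts)
```

For age 13 there is no "No" response, which is exactly the case fill_value=0 handles.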

Author: Jean Nassar

Updated on June 14, 2022

Comments

  • Jean Nassar (almost 2 years ago)

    Suppose I had this data:

    >>> df = pd.DataFrame(data={"age": [11, 12, 11, 11, 13, 11, 12, 11],
    ...                         "response": ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"]})
    >>> df
        age response
    0   11  Yes
    1   12  No
    2   11  Yes
    3   11  Yes
    4   13  Yes
    5   11  No
    6   12  Yes
    7   11  Yes
    

    I would like to make a bar plot that shows the Yes and No responses aggregated by age. Is that possible at all? I have tried hist and kind='bar', but neither grouped the responses by age; both plotted age and response separately.

    It would look like this:

      ^
    4 |   o
    3 |   o
    2 |   o
    1 |   ox      ox      o
    0 .----------------------->
          11      12      13  
    

    where o is "Yes", and x is "No".

    Also, would it be possible to group the ages? If they ranged from 11 to 50, for instance, they could be put into 5-year bins. Finally, would it be possible to show percentages or counts on the axis or on the individual bars?

  • Jean Nassar (over 8 years ago)
    I get a TypeError: Empty 'DataFrame': no numeric data to plot. However, df itself is not empty.
  • Jean Nassar (over 8 years ago)
    It works! Thanks! You just need to add df = before the df.groupby. Also, I got a FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison return np.sum(name == np.asarray(self.names)) > 1. But that is a Pandas operation. Should I submit an issue, or is there something I can do about that myself?
  • Learner (over 8 years ago)
    It looks like you are using an older release of pandas. Nothing to worry about; it is fixed in the forthcoming release.
  • Jean Nassar (over 8 years ago)
    This is amazing. Thank you. I used sns.countplot(x='age_group', hue='response', data=df.sort_values("response"), palette="Greens_d").
  • Jean Nassar (over 8 years ago)
    Also, the name of the package was seaborn, not seaborns.
  • Stefan (over 8 years ago)
    Typo fixed. It looks like this addresses both the binning and plotting aspects of your question.
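The question also asks for percentages or counts on the bars. One way to sketch that, assuming matplotlib 3.4 or newer (which added Axes.bar_label), is to normalize a pd.crosstab per age and then label every bar container. The question's sample data is reused, and matplotlib.use("Agg") is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Sample data from the question
df = pd.DataFrame(data={"age": [11, 12, 11, 11, 13, 11, 12, 11],
                        "response": ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"]})

# normalize="index" divides each row by its total, so each age sums to 100 percent
pct = pd.crosstab(df["age"], df["response"], normalize="index") * 100

ax = pct.plot(kind="bar")
ax.set_ylabel("Percent of responses")

# pandas draws each column ("No", "Yes") as one BarContainer; label its bars
for container in ax.containers:
    ax.bar_label(container, fmt="%.0f%%")

print(pct.round(1))
```

Dropping normalize and the * 100 (and using fmt="%d") would label raw counts instead of percentages.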