How to make a bar plot of non-numerical data in pandas
Solution 1
To bin your data, take a look at pandas.cut() (see the docs). For categorical plots, I've found the seaborn package quite helpful; see the tutorial on categorical plots. Below is an example of a plot of the Yes/No counts for the bins you mention, using a random sample:
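import numpy as np
import pandas as pd
# a random sample of 1000 respondents: ages 10-49 and Yes/No answers
df = pd.DataFrame(data={"age": np.random.randint(10, 50, 1000),
                        "response": np.random.choice(['Yes', 'No'], 1000)})
df['age_group'] = pd.cut(df.age, bins=list(range(10, 51, 5)), include_lowest=True)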
df.head()
age response age_group
0 48 Yes (45, 50]
1 31 No (30, 35]
2 25 Yes (20, 25]
3 29 Yes (25, 30]
4 19 Yes (15, 20]
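To see what pd.cut does with the question's own ages, here is a minimal, self-contained sketch. The 5-year edges, the right=False choice (bins closed on the left), and the "10-14"-style labels are assumptions made for illustration; the answer above uses pandas' default right-closed intervals instead.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "age": [11, 12, 11, 11, 13, 11, 12, 11],
    "response": ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"],
})

# Assumed 5-year edges 10, 15, ..., 55. right=False makes each bin closed on
# the left and open on the right, so [10, 15) holds ages 10-14.
edges = list(range(10, 60, 5))
labels = [f"{lo}-{lo + 4}" for lo in edges[:-1]]  # hypothetical label format
df["age_group"] = pd.cut(df["age"], bins=edges, right=False, labels=labels)

# Rows per bin, in bin order; bins with no rows are reported as 0.
bin_sizes = df["age_group"].value_counts(sort=False)
print(bin_sizes)
```

Passing labels keeps the axis tick text short when age_group is later used as the x or hue variable of a plot.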
import seaborn as sns
sns.countplot(y='response', hue='age_group', data=df, palette="Greens_d")
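countplot shows raw counts. For the question's follow-up about percentages on the bars, one possible approach (a sketch, not part of the original answer) is to normalize a pd.crosstab table per age and label each bar with matplotlib's Axes.bar_label, which assumes matplotlib 3.4 or newer. The function name plot_percent is hypothetical.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "age": [11, 12, 11, 11, 13, 11, 12, 11],
    "response": ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"],
})

# Share of Yes/No within each age, in percent: every row sums to 100.
percent = pd.crosstab(df["age"], df["response"], normalize="index") * 100


def plot_percent(table):
    """Draw grouped bars and write each bar's percentage on top of it.

    Assumes matplotlib >= 3.4, which added Axes.bar_label.
    """
    import matplotlib.pyplot as plt

    ax = table.plot(kind="bar", rot=0)
    ax.set_ylabel("% of responses at this age")
    for bars in ax.containers:  # one BarContainer per response column
        ax.bar_label(bars, fmt="%.0f%%")
    plt.show()
```

Calling plot_percent(percent) draws one group per age; dropping normalize="index" and the * 100 gives counts instead, in which case fmt="%d" is a more natural label format.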
Solution 2
To generate a grouped bar plot, first group by age and response and count the rows with size(), then unstack the resulting Series so that each response becomes its own column:
df = df.groupby(['age', 'response']).size()
df = df.unstack(fill_value=0)  # absent (age, response) pairs become 0 instead of NaN
df.plot(kind='bar')
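The groupby/size/unstack chain builds a frequency table with one row per age and one column per response. As a sketch on the question's data, pd.crosstab is a one-call alternative (swapped in here, not used in the original answer) that should produce the same numbers:

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "age": [11, 12, 11, 11, 13, 11, 12, 11],
    "response": ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"],
})

# Route 1: count rows per (age, response) pair, then pivot the response
# level into columns; fill_value=0 covers age 13, which has no "No".
counts = df.groupby(["age", "response"]).size().unstack(fill_value=0)

# Route 2: pd.crosstab builds the same frequency table directly.
counts_ct = pd.crosstab(df["age"], df["response"])

print(counts)
```

Either table can be passed to .plot(kind='bar') to get one group of bars per age.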
Here is the output plot:
Jean Nassar
Updated on June 14, 2022
Comments
-
Jean Nassar almost 2 years
Suppose I had this data:
>>> df = pd.DataFrame(data={"age": [11, 12, 11, 11, 13, 11, 12, 11], "response": ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"]})
>>> df
   age response
0   11      Yes
1   12       No
2   11      Yes
3   11      Yes
4   13      Yes
5   11       No
6   12      Yes
7   11      Yes
I would like to make a bar plot that shows the Yes or No responses aggregated by age. Would that be possible at all? I have tried hist and kind=bar, but neither was able to group the responses by age; instead, both graphed age and response separately. It would look like this:
  ^
4 | o
3 | o
2 | o
1 | ox    ox    o
0 .----------------------->
    11    12    13
where o is "Yes" and x is "No".

Also, would it be possible to group the ages? If I had a range from 11 to 50, for instance, I might want to put them in 5-year bins. And would it be possible to show percentages or counts on the axis or on the individual bars?
-
Jean Nassar over 8 years
I get a TypeError: Empty 'DataFrame': no numeric data to plot. However, df itself is not empty.
-
Jean Nassar over 8 years
It works! Thanks! You just need to add df = before the df.groupby. Also, I got a FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison, raised at return np.sum(name == np.asarray(self.names)) > 1. But that comes from inside pandas. Should I submit an issue, or is there something I can do about it myself?
-
Learner over 8 years
It looks like you are using an older release of pandas. Nothing to worry about; it is fixed in the forthcoming release.
-
Jean Nassar over 8 years
This is amazing. Thank you. I used sns.countplot(x='age_group', hue='response', data=df.sort_values("response"), palette="Greens_d").
-
Jean Nassar over 8 years
Also, the name of the package is seaborn, not seaborns.
-
Stefan over 8 years
Typo fixed. Looks like this addresses both the binning and the plotting aspects of your question.