Pandas dataframe as input for matplotlib.pyplot.boxplot

15,293

It's not clear that your data are in a DataFrame. Judging by the printout, they look like a list of (name, Series) tuples.
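If your data really are a list of (name, Series) tuples like the one printed in the question, a minimal sketch of turning them into a DataFrame might look like this. `pairs` is a hypothetical name for your list, and the values are just the first few RTdiff entries from each group. Resetting each index matters because the two groups have unrelated row labels; without it pandas would align the columns on those labels and pad both with NaN.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: a list of (name, Series)
# tuples, using the first few RTdiff values from each group.
pairs = [
    ("1975801_m", pd.Series([0.203244, -0.159756, -0.172756, -0.089756],
                            index=[1, 10, 16, 19], name="RTdiff")),
    ("3878418_m", pd.Series([0.13811, -0.21489, -0.15989],
                            index=[1637, 1638, 1644], name="RTdiff")),
]

# Drop each Series' original row labels so the columns line up by position;
# the shorter column is padded with NaN at the end.
df = pd.DataFrame({name: s.reset_index(drop=True) for name, s in pairs})
print(df)
```

pandas drops missing values column by column when it draws a boxplot, so `df.boxplot(ax=ax)` should then work on the result despite the NaN padding.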

Once the data really are in a DataFrame, the trick is to create your figure and axes ahead of time, pass the axes to DataFrame.boxplot with ax=, and supply the same **kwargs you would normally use with matplotlib.axes.Axes.boxplot; pandas forwards them to matplotlib. Also make sure you are calling boxplot on a DataFrame and not on a Series.

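import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Create the figure and axes first so pandas draws into them.
fig, ax = plt.subplots()
df = pd.DataFrame(np.random.normal(size=(37, 5)), columns=list('ABCDE'))
# positions, notch and bootstrap are passed through to matplotlib's boxplot.
df.boxplot(ax=ax, positions=[2, 3, 4, 6, 8], notch=True, bootstrap=5000)
# Show every integer position 0-9 on the x axis (this replaces the A-E labels).
ax.set_xticks(range(10))
ax.set_xticklabels(range(10))
plt.show()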

Which gives me: [figure: boxplots]
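Since the question also asks about plotting the boxplots alongside something else, here is a sketch of the same idea with two axes in one figure. The histogram on the left is an arbitrary stand-in for "something else", and the seeded generator only makes the random data reproducible.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(37, 5)), columns=list("ABCDE"))

# Create the figure and both axes up front; pandas then draws into the
# axes it is given instead of opening a new figure.
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))
ax_hist.hist(df["A"], bins=10)      # arbitrary "something else" to plot next to the boxes
ax_hist.set_title("Histogram of A")
df.boxplot(ax=ax_box, notch=True)   # extra kwargs are forwarded to Axes.boxplot
ax_box.set_title("All columns")
```

Call `plt.show()` afterwards to display the figure; everything ends up in the single `fig`.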

Alternatively, you can skip the pandas wrapper and loop over the columns you want to plot, calling boxplot on your ax object directly.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(np.random.normal(size=(37, 5)), columns=list('ABCDE'))
fig, ax = plt.subplots()
# Draw one box per column, placing column n at x = n + 1.
for n, col in enumerate(df.columns):
    ax.boxplot(df[col], positions=[n + 1], notch=True)

# Widen the x axis to 0-9 and label each tick with its position.
ax.set_xticks(range(10))
ax.set_xticklabels(range(10))
plt.show()

Which gives: [figure: more boxplots]
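The loop above can also be collapsed into a single ax.boxplot call by passing one array per column. Since the question also mentions colour, the sketch below fills each box using patch_artist=True; the specific positions and colours are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(37, 5)), columns=list("ABCDE"))
positions = [2, 3, 4, 6, 8]

fig, ax = plt.subplots()
# One list entry per box; dropna() guards against NaN padding in ragged data.
data = [df[col].dropna().to_numpy() for col in df.columns]
# patch_artist=True draws the boxes as patches, so they can be filled.
bp = ax.boxplot(data, positions=positions, notch=True, patch_artist=True)

# Arbitrary colour choice: one face colour per column.
colors = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple"]
for box, color in zip(bp["boxes"], colors):
    box.set_facecolor(color)

# Put the column names back under their boxes.
ax.set_xticks(positions)
ax.set_xticklabels(list(df.columns))
```

A single call returns one dictionary of artists (bp["boxes"], bp["medians"] and so on) covering every box, which makes styling them together easier than collecting the results of five separate calls.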

Author: TheChymera

Updated on July 31, 2022

Comments

  • TheChymera
    TheChymera over 1 year

    I have a pandas dataframe which looks like this:

    [('1975801_m', 1      0.203244
    10    -0.159756
    16    -0.172756
    19    -0.089756
    20    -0.033756
    23    -0.011756
    24     0.177244
    32     0.138244
    35    -0.104756
    36     0.157244
    40     0.108244
    41     0.032244
    42     0.063244
    45     0.362244
    59    -0.093756
    62    -0.070756
    65    -0.030756
    66    -0.100756
    73    -0.140756
    77    -0.110756
    81    -0.100756
    84    -0.090756
    86    -0.180756
    87     0.119244
    88     0.709244
    102   -0.030756
    105   -0.000756
    107   -0.010756
    109    0.039244
    111    0.059244
    Name: RTdiff), ('3878418_m', 1637    0.13811
    1638   -0.21489
    1644   -0.15989
    1657   -0.11189
    1662   -0.03289
    1666   -0.09489
    1669    0.03411
    1675   -0.00489
    1676    0.03511
    1677    0.39711
    1678   -0.02289
    1679   -0.05489
    1681   -0.01989
    1691    0.14411
    1697   -0.10589
    1699    0.09411
    1705    0.01411
    1711   -0.12589
    1713    0.04411
    1715    0.04411
    1716    0.01411
    1731    0.06411
    1738   -0.25589
    1741   -0.21589
    1745    0.39411
    1746   -0.13589
    1747   -0.10589
    1748    0.08411
    Name: RTdiff)
    

    I would like to use it as input for the matplotlib.pyplot.boxplot function.

    The error I get from matplotlib.pyplot.boxplot(mydataframe) is ValueError: cannot set an array element with a sequence

    I tried to use list(mydataframe) instead of mydataframe. That fails with the same error.

    I also tried matplotlib.pyplot.boxplot(np.fromiter(mydataframe, np.float)) - that fails with ValueError: setting an array element with a sequence.

    • Paul H
      Paul H about 11 years
      pandas dataframes have their own boxplot method (i.e. mydataframe.boxplot()). Does that get you where you need to be?
    • TheChymera
      TheChymera about 11 years
      I would like to plot them alongside something else, but the pandas boxplot function creates a new figure for each set of boxplots. Also, apparently it won't let me customize color or position.
    • Paul H
      Paul H about 11 years
      It's tricky, but you can do it. See my response.
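As for the original error: passing the list of (name, Series) tuples straight to boxplot most likely fails because each element mixes a string with a Series, so matplotlib cannot turn it into a numeric array. A minimal sketch, assuming the list is called `pairs` (a hypothetical name) and using a few values from the question, unpacks the tuples and hands boxplot one array per group. No DataFrame is needed, and the groups may have different lengths.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the question's list of (name, Series) tuples.
pairs = [
    ("1975801_m", pd.Series([0.203244, -0.159756, -0.172756, -0.089756])),
    ("3878418_m", pd.Series([0.13811, -0.21489, -0.15989])),
]

# Unpack the tuples: boxplot wants numeric sequences, not (str, Series) pairs.
names = [name for name, _ in pairs]
values = [s.to_numpy() for _, s in pairs]

fig, ax = plt.subplots()
bp = ax.boxplot(values)  # one box per array; the arrays may differ in length
ax.set_xticks(range(1, len(names) + 1))
ax.set_xticklabels(names)
```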