Matplotlib - Boxplot calculated on log10 values but shown in logarithmic scale


Solution 1

I'd advise against doing the boxplot on the raw values and setting the y-axis to logarithmic, because the boxplot function is not designed to work across orders of magnitude and you may get too many outliers (depending on your data, of course).

Instead, you can plot the logarithm of the data and manually adjust the y-labels.

Here is a very crude example:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

values = 10 ** np.random.uniform(-3, 3, size=100)

fig = plt.figure(figsize=(9, 3))


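# Left: box statistics computed on the log10 values, y ticks relabelled
ax = plt.subplot(1, 3, 1)

ax.boxplot(np.log10(values))
ax.set_yticks(np.arange(-3, 4))
ax.set_yticklabels(10.0**np.arange(-3, 4))
ax.set_title('log')

# Middle: raw values on a log axis (default whiskers, 1.5 x IQR)
ax = plt.subplot(1, 3, 2)

ax.boxplot(values)
ax.set_yscale('log')
ax.set_title('raw')

# Right: raw values on a log axis, whiskers at the 5th and 95th percentiles
ax = plt.subplot(1, 3, 3)

ax.boxplot(values, whis=[5, 95])
ax.set_yscale('log')
ax.set_title('5%')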

plt.show()

[Figure: the resulting three boxplots, titled 'log', 'raw' and '5%']

The middle figure shows the box plot of the raw values on a log axis. It has many outliers, because the maximum whisker length is computed as a multiple (default: 1.5) of the interquartile range (the box height), which does not scale across orders of magnitude.
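
To see the effect numerically, here is a small sketch that applies the 1.5 × IQR rule by hand to the same synthetic data, once on the log10 values and once on the raw values. The helper count_iqr_outliers is hypothetical (not a matplotlib function); it assumes the rule described above, where points outside [Q1 - 1.5·IQR, Q3 + 1.5·IQR] are drawn as outliers. On this data the log10 version should report no outliers, while the raw version should report a sizeable number (the exact count depends on the random draw).

```python
import numpy as np


def count_iqr_outliers(x, whis=1.5):
    """Hypothetical helper: count points outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    x = np.asarray(x)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - whis * iqr, q3 + whis * iqr
    return int(np.sum((x < lo) | (x > hi)))


np.random.seed(42)
values = 10 ** np.random.uniform(-3, 3, size=100)

n_log = count_iqr_outliers(np.log10(values))  # whisker limits computed on log10 values
n_raw = count_iqr_outliers(values)            # whisker limits computed on raw values
print(f"outliers on log10 values: {n_log}, on raw values: {n_raw}")
```

The IQR of the raw values is dominated by the upper decades, so the upper limit lands far below the largest values while the lower limit becomes negative; in log10 space the box and the whiskers grow together.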

Alternatively, you can have the whiskers drawn at a given percentile range:

ax.boxplot(values, whis=[5, 95])

In this case a fixed fraction of the points (5%) lies beyond each whisker and is drawn as outliers (the right figure).
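
A quick way to check the "fixed fraction" claim is to count the points beyond the 5th and 95th percentiles directly. This is a sketch with a hypothetical helper, count_percentile_outliers, assuming the same synthetic data as above. With 100 distinct values it should find 5 points on each side, and the counts should be the same for the raw and the log10 values, because taking the logarithm does not change the order of the points.

```python
import numpy as np


def count_percentile_outliers(x, lo_pct=5, hi_pct=95):
    """Hypothetical helper: count points below lo_pct and above hi_pct percentiles,
    i.e. the points drawn as outliers when whis=[lo_pct, hi_pct]."""
    x = np.asarray(x)
    lo, hi = np.percentile(x, [lo_pct, hi_pct])
    return int(np.sum(x < lo)), int(np.sum(x > hi))


np.random.seed(42)
values = 10 ** np.random.uniform(-3, 3, size=100)

below_raw, above_raw = count_percentile_outliers(values)
below_log, above_log = count_percentile_outliers(np.log10(values))
print(below_raw, above_raw, below_log, above_log)
```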

Solution 2

You can use plt.yscale:

plt.boxplot(data)
plt.yscale('log')
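
Note that this computes the box statistics on the raw values, so it has the outlier issue described in Solution 1. If you want what the question asks for (statistics computed on log10 values, drawn on a real logarithmic axis), one possible sketch, not taken from either answer, is to compute the statistics yourself with matplotlib.cbook.boxplot_stats, map them back with 10**x, and draw them with Axes.bxp. It assumes the synthetic data from Solution 1 and a reasonably recent matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cbook

np.random.seed(42)
values = 10 ** np.random.uniform(-3, 3, size=100)

# 1. Compute the box statistics on the log10 values (list with one dict per box).
stats = cbook.boxplot_stats(np.log10(values))

# 2. Map every position-like statistic back to the original units.
#    10**(mean of the logs) is the geometric mean of the raw values.
for s in stats:
    for key in ("med", "q1", "q3", "whislo", "whishi", "cilo", "cihi", "mean"):
        s[key] = 10 ** s[key]
    s["fliers"] = 10 ** s["fliers"]
    s.pop("iqr", None)  # log-space IQR has no raw-unit meaning; bxp does not need it

# 3. Draw the precomputed boxes on a real logarithmic axis.
fig, ax = plt.subplots()
ax.bxp(stats)
ax.set_yscale("log")
ax.set_title("stats on log10, log axis")
plt.show()
```

Because the axis itself is logarithmic, matplotlib draws its usual log-scale major and minor ticks, so no tick relabelling is needed.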
Author: Frank

Always eager to learn more programming, and currently improving my Java and Python skills.

Updated on July 20, 2022

Comments

  • Frank

    I think this is a simple question, but I still can't think of a simple solution. I have a data set of molecular abundances whose values span many orders of magnitude. I want to represent these abundances with boxplots (box-and-whisker plots), and I want the boxes to be calculated on a log scale because of the wide range of values. I know I can calculate the log10 of the data and pass it to matplotlib's boxplot, but then the plot no longer has a logarithmic scale.

    So my question is basically this: once I have calculated a boxplot based on the log10 of my values, how do I display it on a logarithmic scale instead of a linear scale of log10 values? I can change the tick labels to partly fix this, but I have no idea how to get the logarithmic scale back into the plot.

    Or is there a more direct way to plot this, perhaps a different package that already includes this option?

    Many thanks for the help.

  • Frank
    Thank you for the nice example. Is there a way to also add minor ticks to the log plot, like those in the raw plot?
  • MB-F
    I don't know, sorry. Maybe it's possible with matplotlib.ticker: matplotlib.org/examples/pylab_examples/major_minor_demo1.html
  • fabiocapsouza
    I could set minor ticks following the same logic as the major ticks. For example, to set minor ticks at positions 1, 2, ..., 9, 10, 20, ..., 90, compute their log10 and set them as minor ticks:
    minor_xticks = np.log10(np.concatenate((np.arange(1, 10), np.arange(1, 10) * 10)).astype(float))
    ax.set_xticks(minor_xticks, minor=True)
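
Building on that last comment, here is a sketch for the log10 plot from Solution 1: major ticks at the integer exponents, labelled as powers of ten, and minor ticks at log10(m·10^k) for m = 2…9, which mimics the tick layout of a real log axis. The tick range -3…3 and the label format are assumptions matched to the synthetic data; FuncFormatter comes from matplotlib.ticker.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

np.random.seed(42)
values = 10 ** np.random.uniform(-3, 3, size=100)

fig, ax = plt.subplots()
ax.boxplot(np.log10(values))  # box statistics computed on the log10 values

# Major ticks at integer exponents, labelled as 10^k (assumed range -3..3).
decades = np.arange(-3, 4)
ax.set_yticks(decades)
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, pos: f"$10^{{{int(round(y))}}}$"))

# Minor ticks at log10(m * 10**k) for m = 2..9 within each decade.
minor = np.log10(np.outer(10.0 ** decades[:-1], np.arange(2, 10))).ravel()
ax.set_yticks(minor, minor=True)

ax.set_title("log10 values, log-style ticks")
plt.show()
```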