Plotting two histograms from a pandas DataFrame in one subplot using matplotlib

25,494

Solution 1

I'm not sure I understood your question correctly, but something like this combines the plots. You may want to experiment with alpha and adjust the titles.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df = pd.DataFrame({'a_wood'       : np.random.randn(100),
                   'a_grassland'  : np.random.randn(100),
                   'a_settlement' : np.random.randn(100),
                   'b_wood'       : np.random.randn(100),
                   'b_grassland'  : np.random.randn(100),
                   'b_settlement' : np.random.randn(100)})

fig, ax = plt.subplots(1, 3, sharex='col', sharey='row', figsize=(20, 18))
n = 3        # number of a/b column pairs
n_bins = 12

for i in range(n):
    # Minimum and maximum over a column pair, e.g. column 0 (a_wood) and column 3 (b_wood)
    min_value = df.iloc[:, [i, i+n]].min().min()
    max_value = df.iloc[:, [i, i+n]].max().max()
    # n_bins + 1 equally spaced edges give n_bins bins shared by both columns
    bins = np.linspace(min_value, max_value, n_bins + 1)

    df.hist(column=df.columns[i], bins=bins, ax=ax[i], alpha=0.5, color='red')
    df.hist(column=df.columns[i+n], bins=bins, ax=ax[i], alpha=0.5, color='blue')
    ax[i].set_title(df.columns[i][2:])   # drop the 'a_' prefix from the title

plt.show()

[Figure: histograms with the a and b columns overlapping]
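The point of computing min_value/max_value per pair is that both histograms then use identical bin edges, so each red bar sits exactly on top of the matching blue bar. A minimal check of that idea with NumPy alone, assuming a single wood pair and a seeded random generator (both only for illustration); note that np.linspace(start, stop, k) returns k edges, i.e. k - 1 bins:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one a/b pair, seeded so the run is reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame({'a_wood': rng.standard_normal(100),
                   'b_wood': rng.standard_normal(100)})

n_bins = 12
pair = df[['a_wood', 'b_wood']]
# Edges spanning the combined range of both columns;
# n_bins + 1 edges delimit n_bins bins
bins = np.linspace(pair.min().min(), pair.max().max(), n_bins + 1)

# Same edges for both columns, so bin i covers the same interval in each
counts_a, _ = np.histogram(df['a_wood'], bins=bins)
counts_b, _ = np.histogram(df['b_wood'], bins=bins)
print(len(bins), counts_a.shape)        # 13 edges, 12 counts
print(counts_a.sum(), counts_b.sum())   # every value falls in some bin
```

Because the edges run from the pair's overall minimum to its overall maximum, no value from either column is dropped.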

To plot them next to each other, try this:

# No explicit bin edges needed: hist() given a list of datasets computes common bins
fig, ax = plt.subplots(1, 3, sharex='col', sharey='row', figsize=(20, 18))

n = 3
colors = ['red', 'blue']

for i, axis in enumerate(ax.flatten()):
    # A list of two Series draws their bars side by side
    axis.hist([df.iloc[:, i], df.iloc[:, i+n]], bins=12, color=colors)
    axis.set_title(df.columns[i][2:])

plt.show()

[Figure: histograms with the a and b columns next to each other]
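Why no bin edges are needed in the side-by-side version: when Axes.hist receives a list of datasets, it computes one set of edges over all of them and returns one row of counts per dataset. A small sketch with two random arrays (hypothetical data) and the non-interactive Agg backend, chosen only so it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(100)
b = rng.standard_normal(100)

fig, axis = plt.subplots()
# Passing a list of datasets draws grouped (side-by-side) bars
counts, edges, patches = axis.hist([a, b], bins=12, color=['red', 'blue'],
                                   label=['a', 'b'])
axis.legend()

counts = np.asarray(counts)
print(counts.shape)  # one row of 12 counts per dataset
print(len(edges))    # 12 bins -> 13 edges shared by both datasets
```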

Solution 2

You want something that loops through each column and plots its data in a histogram, right? I suggest a few modifications that you can reuse in future code. Before the code, here are a few useful tips:

  1. DataFrames have attributes you can loop over; for instance, the .columns attribute gives you the list of column labels.
  2. When plotting, I noticed that indexing the grid directly with 2-D coordinates such as ax[i, j] makes your code hard to adapt, so you need to 'flatten' the grid of axes; ax.ravel() does this.
  3. enumerate() is always useful for looping over an object while getting both the element and its index at the same time.
  4. Subplots in Python are tricky at the beginning, so reading other people's code really helps. I strongly advise looking at the plots in the examples for scikit functions (they helped me a lot).
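Tips 2 and 3 together remove the need for the manual counter m in the question's 2x3 loop. A minimal sketch, assuming random sample data with the question's column names and the Agg backend so it runs headless:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
names = ['a_wood', 'a_grassland', 'a_settlement',
         'b_wood', 'b_grassland', 'b_settlement']
df = pd.DataFrame({name: rng.standard_normal(100) for name in names})

fig, ax = plt.subplots(2, 3)
print(ax.shape)      # (2, 3): each subplot needs two indices
flat = ax.ravel()
print(flat.shape)    # (6,): one index per subplot

# enumerate() yields (index, column label); the index picks the subplot
for idx, column in enumerate(df.columns):
    flat[idx].hist(df[column], bins=12)
    flat[idx].set_title(column)
```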

Here is my code proposal:

fig, ax = plt.subplots(1, 3, sharex='col', sharey='row', figsize=(12,7))
ax = ax.ravel()
# ravel() flattens the array of axes to 1-D (a 2x3 grid would become 6 axes),
# so each subplot can be reached with a single index, as below

for idx in range(3):
    ax[idx].hist(df.iloc[:,idx], bins=12, alpha=0.5, label=df.columns[idx])
    ax[idx].hist(df.iloc[:,idx+3], bins=12, alpha=0.5, label=df.columns[idx+3])
    ax[idx].set_title(df.columns[idx]+' with '+df.columns[idx+3])
    ax[idx].legend(loc='upper left')   # shows the labels passed to hist()

plt.show()

The result looks like this:

I hope this helps; feel free to ask if you need more details :)

NOTE: I reused Alex's answer to edit mine. Also check the matplotlib documentation for more details. In this specific case, point 3 is no longer relevant.
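The loop above pairs columns by position (idx and idx + 3), which silently mismatches pairs if the column order ever changes. A sketch of an order-independent variant, assuming every column name has the form '<prefix>_<suffix>' with prefixes 'a' and 'b' (random data and the Agg backend are only for illustration). It passes label= to hist(), since legend() only lists artists that have a label, and it gives each pair shared bin edges as in Solution 1:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
names = ['a_wood', 'a_grassland', 'a_settlement',
         'b_wood', 'b_grassland', 'b_settlement']
df = pd.DataFrame({name: rng.standard_normal(100) for name in names})

# Unique suffixes in first-seen order: 'wood', 'grassland', 'settlement'
suffixes = list(dict.fromkeys(col.split('_', 1)[1] for col in df.columns))

fig, ax = plt.subplots(1, len(suffixes), sharey='row', figsize=(12, 4))
for axis, suffix in zip(ax.ravel(), suffixes):
    pair = [f'a_{suffix}', f'b_{suffix}']
    # One set of 12 bins over both columns of the pair
    bins = np.histogram_bin_edges(df[pair].to_numpy(), bins=12)
    for column in pair:
        axis.hist(df[column], bins=bins, alpha=0.5, label=column)
    axis.set_title(suffix)
    axis.legend(loc='upper left')
```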

Author: Max2603

Updated on August 11, 2020

Comments

  • Max2603

    I have a pandas dataframe like the following:

    df = pd.DataFrame({'a_wood'       : np.random.randn(100),
                       'a_grassland'  : np.random.randn(100),
                       'a_settlement' : np.random.randn(100),
                       'b_wood'       : np.random.randn(100),
                       'b_grassland'  : np.random.randn(100),
                       'b_settlement' : np.random.randn(100)})
    

    and I want to create histograms of this data, with each DataFrame column in its own subplot.

    fig, ax = plt.subplots(2, 3, sharex='col', sharey='row', figsize=(20, 18))
    
    m=0
    for i in range(2):
        for j in range(3):
    
            df.hist(column = df.columns[m], bins = 12, ax=ax[i,j])
            m+=1
    

    The previous code works perfectly for that, but now I want to combine every a and b column (e.g. "a_wood" and "b_wood") in one subplot, so there would be just three histograms. I tried passing two columns with df.columns[[m,m+3]], but this doesn't work. I also have an index column with strings like "day_1", which I want on the x-axis. Can someone help me?

    This is how far I got: [Figure: histogram]
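On the attempt with df.columns[[m, m+3]]: DataFrame.hist draws a separate subplot for each column it receives, so it cannot stack two columns on a single axes. A sketch that keeps that same indexing but switches to DataFrame.plot.hist, which overlays all selected columns on the axes passed via ax= (with common bins and a legend). The random data and Agg backend are assumptions for a headless run:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
names = ['a_wood', 'a_grassland', 'a_settlement',
         'b_wood', 'b_grassland', 'b_settlement']
df = pd.DataFrame({name: rng.standard_normal(100) for name in names})

fig, ax = plt.subplots(1, 3, sharey='row', figsize=(12, 4))
for m, axis in enumerate(ax.ravel()):
    pair = df.columns[[m, m + 3]]   # e.g. Index(['a_wood', 'b_wood'])
    # DataFrame.plot.hist overlays every selected column on this one axes,
    # unlike DataFrame.hist, which wants one subplot per column
    df[pair].plot.hist(bins=12, alpha=0.5, ax=axis)
    axis.set_title(pair[0][2:])     # 'wood', 'grassland', 'settlement'
```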