What are ways to speed up seaborn's pairplot?

Solution 1

Rather than parallelizing, you could downsample your DataFrame to, say, 1000 rows to get a quick look, if that is indeed where the speed bottleneck is. 1000 points are usually enough to get a general idea of what's going on.

i.e. sns.pairplot(df.sample(1000)).

Solution 2

Save your pairplot to an image file and then display that image instead of rendering the whole figure in your browser.

from IPython.display import Image
import seaborn as sns
import matplotlib.pyplot as plt

# `height` replaces the `size` argument, which was renamed in seaborn 0.9
sns_plot = sns.pairplot(df, height=2.0)
sns_plot.savefig("pairplot.png")

plt.close(sns_plot.fig)  # Close the pairplot figure so it is not rendered again
Image(filename="pairplot.png")  # Show the pairplot as a static image

Solution 3

In my case, the histograms were taking a very long time because of the variance in the data. I only had 1200 rows and 4 columns, but I gave up after waiting half an hour. I think the data was so spread out and unordered that the histogram was constantly updating. One workaround might be to adjust the number of bins, but my solution was to use a KDE for the diagonal instead. With the KDE, it takes only a few seconds.

sns.pairplot(df, diag_kind='kde')

Author: Quickbeam2k1

Updated on September 14, 2022

Comments

  • Quickbeam2k1, over 1 year ago

    I have a dataframe with 250,000 rows and 140 columns, and I'm trying to construct a pair plot of the variables. I know the number of subplots is huge, as is the time it takes to draw them. (I have been waiting for more than an hour on a 3.4 GHz i5 with 32 GB RAM.)

    Remembering that scikit-learn can build random forests in parallel, I checked whether this is also possible with seaborn. However, I didn't find anything. The source code seems to call the matplotlib plot function for every single image.

    Couldn't this be parallelised? If yes, what is a good way to start from here?

  • Quickbeam2k1, over 4 years ago
    This is also an important idea. But creating the image was the bigger bottleneck at that time, I believe.