What are ways to speed up seaborns pairplot
Solution 1
Rather than parallelizing, you could downsample your DataFrame
to say, 1000 rows to get a quick peek, if the speed bottleneck is indeed occurring there. 1000 points is enough to get a general idea of what's going on, usually.
i.e. sns.pairplot(df.sample(1000))
.
Solution 2
Save your pairplot to image and then show this image instead of rendering it all in your browser.
from IPython.display import Image
import seaborn as sns
import matplotlib.pyplot as plt
sns_plot = sns.pairplot(df, size=2.0)
sns_plot.savefig("pairplot.png")
plt.clf() # Clean parirplot figure from sns
Image(filename='pairplot.png') # Show pairplot as image
Solution 3
For me, I had a situation where the histograms were taking a very long time due to the variance in the data. I only had 1200 rows and 4 columns, but it took half an hour before I gave up. I think it was so spread out and unordered that the histogram was constantly updating. One workaround might be to play with the bin parameter, but my solution was to use a KDE for the diagonal instead. With the KDE, it takes only a few seconds.
sns.pairplot(df, diag_kind='kde')
Related videos on Youtube
Quickbeam2k1
Updated on September 14, 2022Comments
-
Quickbeam2k1 over 1 year
I have a dataframe with 250.000 rows but 140 columns and I'm trying to construct a pair plot. of the variables. I know the number of subplots is huge, as well as the time it takes to do the plots. (I'm waiting for more than an hour on an i5 with 3,4 GHZ and 32 GB RAM).
Remebering that scikit learn allows to construct random forests in parallel, I was checking if this was possible also with seaborn. However, I didn't find anything. The source code seems to call the matplotlib plot function for every single image.
Couldn't this be parallelised? If yes, what is a good way to start from here?
-
Quickbeam2k1 over 4 yearsThis ist also an important idea. But creating the Image was the bigger bottleneck at that Time I believe.