Concatenate sparse matrices in Python using SciPy/NumPy
You can use scipy.sparse.hstack to concatenate sparse matrices with the same number of rows (horizontal concatenation):
from scipy.sparse import hstack
hstack((X, X2))
Similarly, you can use scipy.sparse.vstack to concatenate sparse matrices with the same number of columns (vertical concatenation).
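As a minimal sketch of both calls, here is a small self-contained example. The matrices X and X2 below are tiny hypothetical stand-ins for the ones in the question, and passing format="csr" is an optional choice made here to keep the result in Compressed Sparse Row format:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack, vstack

# Hypothetical small stand-ins for X and X2 (same shape, different dtypes).
X = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0]]))
X2 = csr_matrix(np.array([[0, 4, 0],
                          [5, 0, 6]]))

# Horizontal concatenation: row counts must match, columns are appended.
wide = hstack((X, X2), format="csr")

# Vertical concatenation: column counts must match, rows are appended.
tall = vstack((X, X2), format="csr")

print(wide.shape, wide.nnz)  # columns add up, stored elements add up
print(tall.shape, tall.nnz)  # rows add up, stored elements add up
```

The result stays sparse throughout, so nothing is ever converted to a dense array. Mixing a float matrix with an integer one yields a float result.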
Using numpy.hstack or numpy.vstack instead will create a NumPy object array that merely holds the two sparse matrix objects, not a single concatenated sparse matrix.
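The difference can be checked directly. The sketch below uses two small hypothetical matrices and compares the two functions; the behaviour shown for numpy.hstack matches the output in the question, though it may vary across NumPy and SciPy versions:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack, issparse

# Hypothetical 2x2 sparse matrices standing in for X and X2.
A = csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
B = csr_matrix(np.array([[0.0, 3.0], [4.0, 0.0]]))

# NumPy treats each sparse matrix as an opaque Python object,
# so the result is a 1-D object array with two elements.
np_result = np.hstack((A, B))

# SciPy understands the sparse structure and returns one sparse matrix.
sp_result = hstack((A, B), format="csr")

print(type(np_result), np_result.dtype, np_result.shape)
print(issparse(sp_result), sp_result.shape)
```

Checking issparse on the result is a cheap way to catch an accidental numpy.hstack call before the data reaches a regression routine.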
PascalVKooten
Updated on July 09, 2022
Comments
-
PascalVKooten almost 2 years
What would be the most efficient way to concatenate sparse matrices in Python using SciPy/Numpy?
Here I used the following:
>>> np.hstack((X, X2))
array([ <49998x70000 sparse matrix of type '<class 'numpy.float64'>'
        with 1135520 stored elements in Compressed Sparse Row format>,
       <49998x70000 sparse matrix of type '<class 'numpy.int64'>'
        with 1135520 stored elements in Compressed Sparse Row format>], dtype=object)
I would like to use both predictors in a regression, but the current format is obviously not what I'm looking for. Would it be possible to get the following:
<49998x140000 sparse matrix of type '<class 'numpy.float64'>' with 2271040 stored elements in Compressed Sparse Row format>
It is too large to be converted to a dense format.
-
simeon almost 7 years
It seems hstack is quite slow; check out this post on a similar question.
-
Saullo G. P. Castro almost 7 years
@simeon Interesting that SciPy's dev team hasn't adopted such an efficient solution.
-
mgokhanbakal over 3 years
hstack() can be used for horizontal concatenation and vstack() for vertical concatenation.