Concatenate sparse matrices in Python using SciPy/Numpy


You can use scipy.sparse.hstack to concatenate sparse matrices with the same number of rows (horizontal concatenation):

from scipy.sparse import hstack
hstack((X, X2))
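
As a minimal sketch of the call above, assuming two small made-up matrices stand in for X and X2 (the values and shapes here are illustrative, not the ones from the question):

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Illustrative stand-ins for X and X2: both have 3 rows.
X = csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]))
X2 = csr_matrix(np.array([[0, 3, 0], [0, 0, 0], [4, 0, 5]]))

# Column counts add up: (3, 2) and (3, 3) give (3, 5).
combined = hstack((X, X2))
print(combined.shape)  # (3, 5)
print(combined.nnz)    # 5 stored elements (2 from X, 3 from X2)
```

The result stays sparse, so no dense intermediate is built along the way.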

Similarly, you can use scipy.sparse.vstack to concatenate sparse matrices with the same number of columns (vertical concatenation).
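
The vertical case can be sketched the same way. The matrices A and B below are hypothetical examples chosen only so that the column counts match:

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

# Illustrative matrices: both have 3 columns.
A = csr_matrix(np.array([[1, 0, 2], [0, 0, 3]]))  # shape (2, 3)
B = csr_matrix(np.array([[0, 4, 0]]))             # shape (1, 3)

# Row counts add up: (2, 3) and (1, 3) give (3, 3).
stacked = vstack((A, B))
print(stacked.shape)  # (3, 3)
```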

Using numpy.hstack or numpy.vstack does not merge the matrices; as the output in the question shows, it creates an object array containing the two sparse matrix objects.
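
The question combines a float64 matrix with an int64 one and wants a single CSR result. A small sketch of that situation, assuming tiny made-up inputs and using the format argument of scipy.sparse.hstack to request CSR explicitly:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack, issparse

# Illustrative inputs with different dtypes, as in the question.
X = csr_matrix(np.array([[1.5, 0.0], [0.0, 2.5]]))           # float64
X2 = csr_matrix(np.array([[0, 7], [8, 0]], dtype=np.int64))  # int64

# Ask for CSR explicitly instead of relying on the default output format.
result = hstack((X, X2), format="csr")

print(issparse(result))  # True: one sparse matrix, not an object array
print(result.format)     # csr
print(result.dtype)      # float64 (the int64 data is upcast)
print(result.shape)      # (2, 4)
```

Passing format="csr" avoids a separate conversion step afterwards if the downstream regression code expects CSR input.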

Author by

PascalVKooten


Updated on July 09, 2022

Comments

  • PascalVKooten almost 2 years

    What would be the most efficient way to concatenate sparse matrices in Python using SciPy/Numpy?

    Here I used the following:

    >>> np.hstack((X, X2))
    array([ <49998x70000 sparse matrix of type '<class 'numpy.float64'>'
            with 1135520 stored elements in Compressed Sparse Row format>,
            <49998x70000 sparse matrix of type '<class 'numpy.int64'>'
            with 1135520 stored elements in Compressed Sparse Row format>], 
           dtype=object)
    

    I would like to use both predictors in a regression, but the current format is obviously not what I'm looking for. Would it be possible to get the following:

        <49998x140000 sparse matrix of type '<class 'numpy.float64'>'
         with 2271040 stored elements in Compressed Sparse Row format>
    

    It is too large to be converted to a dense format.

  • simeon almost 7 years
    It seems hstack is quite slow; check out this post on a similar question: link
  • Saullo G. P. Castro almost 7 years
    @simeon Interesting that SciPy's dev team hasn't adopted such an efficient solution.
  • mgokhanbakal over 3 years
    Use hstack() for horizontal concatenation and vstack() for vertical concatenation.