Dot product sparse matrices


A scipy sparse matrix is modeled on the numpy matrix subclass, and as such implements * as matrix multiplication. a.multiply is element-by-element multiplication, like the * operator on an np.array.
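A quick check of the two operators on a small example. This sketch assumes the csr_matrix (sparse *matrix*) interface used throughout this answer. The newer sparse *array* classes such as csr_array treat * as element-wise instead, so there @ is the unambiguous choice for matrix multiplication.

```python
import numpy as np
from scipy import sparse

a = np.arange(12).reshape(3, 4)
m = sparse.csr_matrix(a)

# For sparse *matrix* objects, * is matrix multiplication (same as a @ a.T)
prod = m * m.T             # 3x3 sparse result
# .multiply is element-by-element, like * on a plain ndarray
elem = m.multiply(m)       # 3x4 sparse result

print(prod.toarray())
print(elem.toarray())
print(a * a)               # dense element-wise product; same values as elem
```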

I'd suggest making a couple of small matrices and trying the various forms of multiplication, including what you think is the equivalent. It will be easier to tell what's going on with something small.

import numpy as np
from scipy import sparse
a = np.arange(12).reshape(3,4)
a1 = sparse.csr_matrix(a)
np.dot(a, a.T)
a1 * a.T

Just for reference, is this what you want (using dense arrays):

In [7]: a=np.arange(12).reshape(3,4)

In [8]: [np.dot(a[i],a[i]) for i in range(3)]
Out[8]: [14, 126, 366]

In [9]: np.einsum('ij,ij->i',a,a)
Out[9]: array([ 14, 126, 366])
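The einsum subscripts 'ij,ij->i' multiply matching entries and sum over j, leaving one value per row. A small sketch with a second, hypothetical array b of the same shape shows the same call handles two different dense arrays, which is the row-wise dot product the question asks for.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = np.ones((3, 4), dtype=int)   # hypothetical second array with the same shape

# 'ij,ij->i': multiply matching entries, then sum over j for each row i
rowdot = np.einsum('ij,ij->i', a, b)

# Same result from an element-wise product followed by a row sum
same = (a * b).sum(axis=1)
print(rowdot, same)              # both are the row sums of a: 6, 22, 38
```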

And the sparse version:

In [11]: a1=sparse.csr_matrix(a)

The full matrix product (dot product) is more than what you want, right? You just want the diagonal.

In [15]: (a1*a1.T).toarray()
Out[15]:
array([[ 14,  38,  62],
       [ 38, 126, 214],
       [ 62, 214, 366]], dtype=int32)

In [16]: np.dot(a,a.T)
Out[16]:
array([[ 14,  38,  62],
       [ 38, 126, 214],
       [ 62, 214, 366]])

In [21]: (a1*a1.T).diagonal()
Out[21]: array([ 14, 126, 366], dtype=int32)

For something that is quite sparse, taking the full matrix multiplication followed by .diagonal() might be as fast as any alternative. Iterating over the rows of a sparse matrix is a relatively slow operation, while the matrix multiplication has been implemented in fast compiled C code.
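The diagonal approach carries over directly to two different matrices, as in the question. A minimal sketch, assuming both inputs are scipy sparse matrices with the same shape; the helper name rowwise_dot_diag and the small input values are hypothetical. The full n x n product is built before the diagonal is taken, so its cost depends on how many off-diagonal nonzeros a @ b.T has, which is worth checking for something the size of the question's 240760-row matrices.

```python
import numpy as np
from scipy import sparse

def rowwise_dot_diag(a, b):
    """Row-wise dot products as the diagonal of the full product a @ b.T.

    The whole (n x n) sparse product is formed first, so the cost depends on
    its total number of nonzeros, not only on the n diagonal entries.
    """
    return (a @ b.T).diagonal()

# Small hypothetical inputs standing in for the question's a and b
a = sparse.csr_matrix(np.array([[1., 0., 2., 0.],
                                [0., 0., 3., 4.],
                                [5., 0., 0., 0.]]))
b = sparse.csr_matrix(np.array([[0., 1., 2., 0.],
                                [0., 0., 0., 1.],
                                [2., 0., 0., 3.]]))

d = rowwise_dot_diag(a, b)
print(d)   # row-wise dot products: 4, 4, 10
```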

Another way: element-wise multiplication followed by a row sum.

In [22]: np.sum(a*a,axis=1)
Out[22]: array([ 14, 126, 366])

In [23]: a1.multiply(a1).sum(axis=1)
Out[23]:
matrix([[ 14],
        [126],
        [366]], dtype=int32)

scipy.sparse implements sum as a matrix multiplication (by a column of ones):

In [26]: a1.multiply(a1)*np.array([1,1,1,1])[:,None]
Out[26]:
array([[ 14],
       [126],
       [366]], dtype=int32)
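Applied to the question's two different matrices, the multiply-then-sum approach can be wrapped as below. This is a sketch assuming a and b are scipy sparse matrices of the same shape; rowwise_dot is a hypothetical helper name and the inputs are made-up small values. For sparse *matrix* inputs, sum(axis=1) returns an (n, 1) np.matrix, so the result is flattened into a plain 1-D array.

```python
import numpy as np
from scipy import sparse

def rowwise_dot(a, b):
    """Row-wise dot products of two sparse matrices with the same shape."""
    # Element-wise product: only positions nonzero in BOTH a and b contribute
    prod = a.multiply(b)
    # Summing each row gives one dot product per row; for sparse *matrix*
    # inputs this comes back as an (n, 1) np.matrix
    s = prod.sum(axis=1)
    # Flatten to a plain 1-D ndarray of length n
    return np.asarray(s).ravel()

a = sparse.csr_matrix(np.array([[1., 0., 2., 0.],
                                [0., 0., 3., 4.],
                                [5., 0., 0., 0.]]))
b = sparse.csr_matrix(np.array([[0., 1., 2., 0.],
                                [0., 0., 0., 1.],
                                [0., 7., 0., 3.]]))

r = rowwise_dot(a, b)
print(r)   # row-wise dot products: 4, 4, 0 (last rows share no nonzero column)
```

If an explicit column vector is wanted, r.reshape(-1, 1) gives shape (n, 1). Only the nonzeros of the element-wise product are touched, so no n x n intermediate is built.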


Updated on June 15, 2022


  • David
    David almost 2 years

    I have two sparse matrices (a and b) in python of the following dimensions:

    a = <240760x2177930 sparse matrix of type '<class 'numpy.float64'>'
        with 1127853 stored elements in Compressed Sparse Row format>


    b = <240760x2177930 sparse matrix of type '<class 'numpy.float64'>'
        with 439309 stored elements in Compressed Sparse Row format>

    Question: I'd like to get a column vector of length 240760 that is the row-wise dot product of the two matrices. For example, dot(a[0],b[0]) would be the first element of my output vector. dot(a[1],b[1]) would be the second, and so forth.

    Is there a vectorized easy way to accomplish this?

    EDIT: One way to accomplish this would be to convert each row into a dense vector, flatten it, and use something like np.dot(a[0].toarray().flatten(), b[0].toarray().flatten()).

    But this requires iterating row-wise and converting each row into a dense vector, which is very time-consuming. I'm thinking there's probably an easier way to do this...

  • keshr3106
    keshr3106 about 6 years
    Such a great answer. Thank you so much for the different options!
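For checking the vectorized answers, the row-by-row approach described in the question's EDIT can be written as a slow but easy-to-trust reference. This is a sketch assuming CSR inputs of equal shape; rowwise_dot_loop and the sample values are hypothetical. It uses .toarray() to densify each row, since np.array() on a sparse row does not produce a dense vector.

```python
import numpy as np
from scipy import sparse

def rowwise_dot_loop(a, b):
    """Reference version: densify one row of each matrix per iteration."""
    out = np.empty(a.shape[0], dtype=np.result_type(a.dtype, b.dtype))
    for i in range(a.shape[0]):
        # a[i] is a 1 x ncols sparse matrix; toarray().ravel() makes it 1-D dense
        out[i] = np.dot(a[i].toarray().ravel(), b[i].toarray().ravel())
    return out

a = sparse.csr_matrix(np.array([[1., 0., 2., 0.],
                                [0., 0., 3., 4.],
                                [5., 0., 0., 0.]]))
b = sparse.csr_matrix(np.array([[0., 1., 2., 0.],
                                [0., 0., 0., 1.],
                                [2., 0., 0., 3.]]))

slow = rowwise_dot_loop(a, b)
fast = np.asarray(a.multiply(b).sum(axis=1)).ravel()   # vectorized version
print(np.allclose(slow, fast))
```

The Python-level loop and the dense row of length a.shape[1] per iteration are what make this slow for the question's 240760 x 2177930 matrices.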