How to elementwise-multiply a scipy.sparse matrix by a broadcasted dense 1d array?

python numpy scipy sparse-matrix

15,473

Solution 1

I replied over at scipy.org as well, but I thought I should add an answer here, in case others find this page when searching.

You can turn the vector into a sparse diagonal matrix and then use matrix multiplication (with *) to do the same thing as broadcasting, but efficiently.

>>> d = ssp.lil_matrix((3,3))
>>> d.setdiag(np.ones(3)*3)
>>> a*d
<5x3 sparse matrix of type '<type 'numpy.float64'>'
 with 2 stored elements in Compressed Sparse Row format>
>>> (a*d).todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -3.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  6.,  0.]])

Hope that helps!

Solution 2

I think A.multiply(B) should work in scipy sparse. The method multiply does "point-wise" multiplication, not matrix multiplication.

HTH

Solution 3

Well, here's a simple code that will do what you want. I don't know if it is as efficient as you would like, so take it or leave it:

import scipy.sparse as ssp
def pointmult(a,b):
    x = a.copy()
    for i in xrange(a.shape[0]):
        if x.data[i]:
            for j in xrange(len(x.data[i])):
                x.data[i] *= b[x.rows[i]]
    return x

It only works with lil matrices so you'll have to make some changes if you want it to work with other formats.

15,473

ogrisel

Python / Java / Clojure / C datageek with a taste for artificial intelligence, machine learning, cloud computing, OpenCL, NLP, the semantic web and braaaaains!

Updated on December 22, 2020

Comments

ogrisel over 3 years
Suppose I have a 2d sparse array. In my real usecase both the number of rows and columns are much bigger (say 20000 and 50000) hence it cannot fit in memory when a dense representation is used:
```
>>> import numpy as np
>>> import scipy.sparse as ssp

>>> a = ssp.lil_matrix((5, 3))
>>> a[1, 2] = -1
>>> a[4, 1] = 2
>>> a.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -1.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  2.,  0.]])
```
Now suppose I have a dense 1d array with all non-zeros components with size 3 (or 50000 in my real life case):
```
>>> d = np.ones(3) * 3
>>> d
array([ 3.,  3.,  3.])
```
I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy are of the np.matrix: the '*' operator is overloaded to have it behave like a matrix-multiply instead of the elementwise-multiply:
```
>>> a * d
array([ 0., -3.,  0.,  0.,  6.])
```
One solution would be to make 'a' switch to the array semantics for the '*' operator, that would give the expected result:
```
>>> a.toarray() * d
array([[ 0.,  0.,  0.],
       [ 0.,  0., -3.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  6.,  0.]])
```
But I cannot do that since the call to toarray() would materialize the dense version of 'a' which does not fit in memory (and the result will be dense too):
```
>>> ssp.issparse(a.toarray())
False
```
Any idea how to build this while keeping only sparse datastructures and without having to do a unefficient python loop on the columns of 'a'?
- mtrw almost 14 years
  
  If d is a sparse matrix of the same size as a you can use a.multiply(d). Perhaps you can make a d that's N rows long and loop over N rows of a at a time?
- ogrisel almost 14 years
  
  But d is dense and cannot be broadcasted explicitly in memory to satisfy the multiply shape requirements. Looping over a batch is an option but I find this a bit hackish. I would have thought there was a vanilla vectorized / scipy way to do this without a python loop.
- mtrw almost 14 years
  
  I guess the problem is you want the representation of a (sparse) matrix but the mulitply operation of an array. I think you're going to have to roll your own unfortunately.
- ogrisel almost 14 years
  
  Actually there is a.multply(d) that should do exactly that but it does not do the broadcasting as usual. Maybe it's a bug.
ogrisel almost 14 years

thanks I would have liked to avoid for loops in python however. But maybe there is no way out with the current scipy.sparse classes for this use case.
Fred Foo over 11 years

The great thing about this is that it also works when X is an ndarray or a dense matrix. +1.
ali_m over 8 years

This could be further simplified using scipy.sparse.diags(d, 0) rather than lil_matrix
markhor over 7 years

@K3---rnc the result is dense only if B is dense. If you convert B to any of the sparse formats, it will do the trick. E.g. A.multiply(csc_matrix(B))