How to elementwise-multiply a scipy.sparse matrix by a broadcasted dense 1d array?
Solution 1
I replied over at scipy.org as well, but I thought I should add an answer here, in case others find this page when searching.
You can turn the vector into a sparse diagonal matrix and then use matrix multiplication (with *) to do the same thing as broadcasting, but efficiently.
>>> d = ssp.lil_matrix((3,3))
>>> d.setdiag(np.ones(3)*3)
>>> a*d
<5x3 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
>>> (a*d).todense()
matrix([[ 0., 0., 0.],
[ 0., 0., -3.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 6., 0.]])
Hope that helps!
Solution 2
I think A.multiply(B) should work in scipy sparse. The method multiply does "point-wise" multiplication, not matrix multiplication.
HTH
Solution 3
Well, here's a simple code that will do what you want. I don't know if it is as efficient as you would like, so take it or leave it:
import scipy.sparse as ssp
def pointmult(a,b):
x = a.copy()
for i in xrange(a.shape[0]):
if x.data[i]:
for j in xrange(len(x.data[i])):
x.data[i] *= b[x.rows[i]]
return x
It only works with lil matrices so you'll have to make some changes if you want it to work with other formats.
Related videos on Youtube
ogrisel
Python / Java / Clojure / C datageek with a taste for artificial intelligence, machine learning, cloud computing, OpenCL, NLP, the semantic web and braaaaains!
Updated on December 22, 2020Comments
-
ogrisel over 3 years
Suppose I have a 2d sparse array. In my real usecase both the number of rows and columns are much bigger (say 20000 and 50000) hence it cannot fit in memory when a dense representation is used:
>>> import numpy as np >>> import scipy.sparse as ssp >>> a = ssp.lil_matrix((5, 3)) >>> a[1, 2] = -1 >>> a[4, 1] = 2 >>> a.todense() matrix([[ 0., 0., 0.], [ 0., 0., -1.], [ 0., 0., 0.], [ 0., 0., 0.], [ 0., 2., 0.]])
Now suppose I have a dense 1d array with all non-zeros components with size 3 (or 50000 in my real life case):
>>> d = np.ones(3) * 3 >>> d array([ 3., 3., 3.])
I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy are of the np.matrix: the '*' operator is overloaded to have it behave like a matrix-multiply instead of the elementwise-multiply:
>>> a * d array([ 0., -3., 0., 0., 6.])
One solution would be to make 'a' switch to the array semantics for the '*' operator, that would give the expected result:
>>> a.toarray() * d array([[ 0., 0., 0.], [ 0., 0., -3.], [ 0., 0., 0.], [ 0., 0., 0.], [ 0., 6., 0.]])
But I cannot do that since the call to toarray() would materialize the dense version of 'a' which does not fit in memory (and the result will be dense too):
>>> ssp.issparse(a.toarray()) False
Any idea how to build this while keeping only sparse datastructures and without having to do a unefficient python loop on the columns of 'a'?
-
mtrw almost 14 yearsIf
d
is a sparse matrix of the same size asa
you can usea.multiply(d)
. Perhaps you can make ad
that's N rows long and loop over N rows ofa
at a time? -
ogrisel almost 14 yearsBut d is dense and cannot be broadcasted explicitly in memory to satisfy the multiply shape requirements. Looping over a batch is an option but I find this a bit hackish. I would have thought there was a vanilla vectorized / scipy way to do this without a python loop.
-
mtrw almost 14 yearsI guess the problem is you want the representation of a (sparse) matrix but the mulitply operation of an array. I think you're going to have to roll your own unfortunately.
-
ogrisel almost 14 yearsActually there is
a.multply(d)
that should do exactly that but it does not do the broadcasting as usual. Maybe it's a bug.
-
-
ogrisel almost 14 yearsthanks I would have liked to avoid for loops in python however. But maybe there is no way out with the current scipy.sparse classes for this use case.
-
Fred Foo over 11 yearsThe great thing about this is that it also works when
X
is anndarray
or a dense matrix. +1. -
ali_m over 8 yearsThis could be further simplified using
scipy.sparse.diags(d, 0)
rather thanlil_matrix
-
markhor over 7 years@K3---rnc the result is dense only if B is dense. If you convert B to any of the sparse formats, it will do the trick. E.g. A.multiply(csc_matrix(B))