How to elementwise-multiply a scipy.sparse matrix by a broadcasted dense 1d array?

Solution 1

I replied over at scipy.org as well, but I thought I should add an answer here, in case others find this page when searching.

You can turn the vector into a sparse diagonal matrix and then use matrix multiplication (with *) to do the same thing as broadcasting, but efficiently.

>>> d = ssp.lil_matrix((3,3))
>>> d.setdiag(np.ones(3)*3)
>>> a*d
<5x3 sparse matrix of type '<type 'numpy.float64'>'
 with 2 stored elements in Compressed Sparse Row format>
>>> (a*d).todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -3.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  6.,  0.]])

Hope that helps!
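For readers on a recent SciPy, the same diagonal-matrix idea can be sketched with scipy.sparse.diags and the @ operator. This is a minimal sketch, not part of the original answer: the scale factors in d and the row weights r are made up for illustration, and they are deliberately distinct so the per-column effect is visible.

```python
import numpy as np
import scipy.sparse as ssp

# Same 5x3 example as in the question, converted to CSR for arithmetic.
a = ssp.lil_matrix((5, 3))
a[1, 2] = -1
a[4, 1] = 2
a = a.tocsr()

# Illustrative per-column scale factors (assumed values, one per column of a).
d = np.array([10.0, 20.0, 30.0])

# Right-multiplying by diag(d) scales column j of a by d[j].
scaled_cols = a @ ssp.diags(d)

# Left-multiplying by a diagonal matrix scales rows instead
# (r is a hypothetical vector of per-row weights).
r = np.arange(1.0, 6.0)
scaled_rows = ssp.diags(r) @ a

print(scaled_cols.toarray())
```

The product stays sparse throughout, so neither the 5x3 matrix nor a broadcast copy of d is ever materialized densely.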

Solution 2

I think A.multiply(B) should work in scipy sparse. The method multiply does "point-wise" multiplication, not matrix multiplication.

HTH
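A small sketch of what that looks like with the question's data. Whether multiply accepts a dense 1d array and whether the result comes back sparse or dense has varied across SciPy versions, so the snippet below normalizes the result before inspecting it rather than assuming either outcome.

```python
import numpy as np
import scipy.sparse as ssp

a = ssp.csr_matrix(np.array([[0, 0, 0],
                             [0, 0, -1],
                             [0, 0, 0],
                             [0, 0, 0],
                             [0, 2, 0]], dtype=float))
d = np.ones(3) * 3

# Element-wise (not matrix) product; d is treated as a row to broadcast.
result = a.multiply(d)

# Assumption: the return type depends on the SciPy version, so handle both.
if ssp.issparse(result):
    dense_result = result.toarray()
else:
    dense_result = np.asarray(result)

print(type(result).__name__)
print(dense_result)
```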

Solution 3

Well, here's some simple code that will do what you want. I don't know if it is as efficient as you would like, so take it or leave it:

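import scipy.sparse as ssp

def pointmult(a, b):
    # a is a lil_matrix; b is a dense 1d array with one entry per column of a
    x = a.copy()
    for i in range(x.shape[0]):
        # x.rows[i] lists the column indices of the stored values in x.data[i]
        for j, col in enumerate(x.rows[i]):
            x.data[i][j] *= b[col]
    return x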

It only works with lil matrices so you'll have to make some changes if you want it to work with other formats.
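If a Python-level loop is a concern, one possible alternative is to work on the CSR representation, where the column index of every stored value is available as a flat array. The helper name scale_columns_csr and the example scale factors are hypothetical, chosen only for this sketch.

```python
import numpy as np
import scipy.sparse as ssp

def scale_columns_csr(a, d):
    """Return a CSR copy of a with column j multiplied by d[j]."""
    # Converts from other formats (e.g. LIL) and never mutates the input.
    x = ssp.csr_matrix(a, copy=True)
    d = np.asarray(d)
    # x.indices[k] is the column of the k-th stored value in x.data,
    # so fancy indexing picks the right factor for every stored entry.
    x.data = x.data * d[x.indices]
    return x

a = ssp.lil_matrix((5, 3))
a[1, 2] = -1
a[4, 1] = 2
d = np.array([10.0, 20.0, 30.0])  # illustrative factors, one per column

out = scale_columns_csr(a, d)
print(out.toarray())
```

Only the stored values are touched, so the cost scales with the number of non-zeros rather than with the full matrix shape.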


Author: ogrisel

Python / Java / Clojure / C datageek with a taste for artificial intelligence, machine learning, cloud computing, OpenCL, NLP, the semantic web and braaaaains!

Updated on December 22, 2020

Comments

  • ogrisel
    ogrisel over 3 years

    Suppose I have a 2d sparse array. In my real use case both the number of rows and the number of columns are much bigger (say 20000 and 50000), so it cannot fit in memory when a dense representation is used:

    >>> import numpy as np
    >>> import scipy.sparse as ssp
    
    >>> a = ssp.lil_matrix((5, 3))
    >>> a[1, 2] = -1
    >>> a[4, 1] = 2
    >>> a.todense()
    matrix([[ 0.,  0.,  0.],
            [ 0.,  0., -1.],
            [ 0.,  0.,  0.],
            [ 0.,  0.,  0.],
            [ 0.,  2.,  0.]])
    

    Now suppose I have a dense 1d array of size 3 (or 50000 in my real-life case) whose components are all non-zero:

    >>> d = np.ones(3) * 3
    >>> d
    array([ 3.,  3.,  3.])
    

    I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy follow np.matrix semantics: the '*' operator is overloaded to behave like a matrix multiply instead of an elementwise multiply:

    >>> a * d
    array([ 0., -3.,  0.,  0.,  6.])
    

    One solution would be to make 'a' switch to the array semantics for the '*' operator, that would give the expected result:

    >>> a.toarray() * d
    array([[ 0.,  0.,  0.],
           [ 0.,  0., -3.],
           [ 0.,  0.,  0.],
           [ 0.,  0.,  0.],
           [ 0.,  6.,  0.]])
    

    But I cannot do that since the call to toarray() would materialize the dense version of 'a' which does not fit in memory (and the result will be dense too):

    >>> ssp.issparse(a.toarray())
    False
    

    Any idea how to build this while keeping only sparse data structures and without having to write an inefficient Python loop over the columns of 'a'?

    • mtrw
      mtrw almost 14 years
      If d is a sparse matrix of the same size as a you can use a.multiply(d). Perhaps you can make a d that's N rows long and loop over N rows of a at a time?
    • ogrisel
      ogrisel almost 14 years
      But d is dense and cannot be broadcast explicitly in memory to satisfy the multiply shape requirements. Looping over a batch is an option but I find this a bit hackish. I would have thought there was a vanilla vectorized / scipy way to do this without a Python loop.
    • mtrw
      mtrw almost 14 years
      I guess the problem is that you want the representation of a (sparse) matrix but the multiply operation of an array. I think you're going to have to roll your own, unfortunately.
    • ogrisel
      ogrisel almost 14 years
      Actually there is a.multiply(d), which should do exactly that, but it does not do the usual broadcasting. Maybe it's a bug.
  • ogrisel
    ogrisel almost 14 years
    Thanks. I would have liked to avoid for loops in Python, however. But maybe there is no way out with the current scipy.sparse classes for this use case.
  • Fred Foo
    Fred Foo over 11 years
    The great thing about this is that it also works when X is an ndarray or a dense matrix. +1.
  • ali_m
    ali_m over 8 years
    This could be further simplified using scipy.sparse.diags(d, 0) rather than lil_matrix
  • markhor
    markhor over 7 years
    @K3---rnc the result is dense only if B is dense. If you convert B to any of the sparse formats, it will do the trick. E.g. A.multiply(csc_matrix(B))
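The suggestion in the last comment, wrapping the dense operand in a sparse container before calling multiply, can be sketched as follows. This assumes a SciPy version in which multiply broadcasts a 1xN sparse row against an MxN sparse matrix; the variable names are illustrative, and the result is normalized before inspection rather than assuming its type.

```python
import numpy as np
import scipy.sparse as ssp

a = ssp.lil_matrix((5, 3))
a[1, 2] = -1
a[4, 1] = 2
a = a.tocsr()

d = np.ones(3) * 3

# Wrap the dense vector as a 1x3 sparse row so both operands are sparse.
d_row = ssp.csr_matrix(d.reshape(1, -1))

product = a.multiply(d_row)

# Normalize to a dense array only for display on this tiny example.
if ssp.issparse(product):
    product_dense = product.toarray()
else:
    product_dense = np.asarray(product)
print(product_dense)
```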