How to convert sparse matrix to dense form using python


Solution 1

List comprehension is the easiest way:

new_list = [[b for _,b in sub] for sub in mx]

Result:

>>> new_list
[[2, 1, 1, 1, 1, 3, 4, 2, 5, 1], [1, 5, 2, 1, 1, 1, 1, 1, 1, 2], [2, 1, 1, 1, 2, 1, 1, 1, 1, 1]]
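Since the asker mentions working in NumPy, one possible next step is to turn this list of lists into a 2-D array. This is a sketch, not part of the original answer. It assumes every inner list has the same length, as in mx. Recent NumPy versions reject ragged nested lists unless dtype=object is passed.

```python
import numpy as np

mx = [[(0, 2), (1, 1), (2, 1), (3, 1), (4, 1), (5, 3), (6, 4), (7, 2), (8, 5), (9, 1)],
      [(10, 1), (11, 5), (12, 2), (13, 1), (21, 1), (22, 1), (23, 1), (24, 1), (25, 1), (26, 2)],
      [(27, 2), (28, 1), (29, 1), (30, 1), (31, 2), (32, 1), (33, 1), (34, 1), (35, 1), (36, 1)]]

# Solution 1: keep only the value from each (index, value) pair
new_list = [[b for _, b in sub] for sub in mx]

# Every row holds 10 values, so NumPy builds a regular 3 x 10 integer array
dense = np.array(new_list)
print(dense.shape)  # (3, 10)
```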

Solution 2

Here's a pretty hacky way to do what you're asking for:

dense = [[int(''.join(str(val) for _, val in doc))] for doc in mx]

Basically it converts each value from the nested tuples into a string and concatenates all of those strings together, then converts that back to an integer. Repeat for each element of mx.
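Check one caveat before relying on this trick. Once the digits are joined, the boundaries between values are lost, so the result is only unambiguous when every value is a single digit. The sketch below uses two hypothetical rows, a and b, that collapse to the same integer. concat_values is also a hypothetical helper name wrapping the expression from the answer.

```python
def concat_values(doc):
    # Solution 2's trick for one row: join the values as strings, parse one int
    return int(''.join(str(val) for _, val in doc))

# Hypothetical rows: values 12, 3 versus values 1, 23
a = [(0, 12), (1, 3)]
b = [(0, 1), (1, 23)]

print(concat_values(a))  # 123
print(concat_values(b))  # 123, indistinguishable from a
```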

Solution 3

Your source data do not really match any of the built-in formats supported by sparse matrices in SciPy (see http://docs.scipy.org/doc/scipy/reference/sparse.html and http://en.wikipedia.org/wiki/Sparse_matrix), so using .todense() will not really be productive here. In particular, if you have something like:

import numpy as np

my_sparseish_matrix = np.array([[(1, 2), (3, 4)]])

then my_sparseish_matrix will already be a dense numpy array! Calling .todense() on it at that point will raise an AttributeError, since NumPy arrays have no todense() method, and it doesn't make sense anyway.
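Here is a quick way to see this, assuming only NumPy. The tuples simply become an extra axis of an ordinary ndarray. Because ndarray has no todense() method, the call fails with an AttributeError, much like the 'list' error reported in the comments.

```python
import numpy as np

my_sparseish_matrix = np.array([[(1, 2), (3, 4)]])

# The (index, value) tuples become a third axis: 1 row x 2 pairs x 2 fields
print(my_sparseish_matrix.shape)  # (1, 2, 2)

# A plain ndarray is already dense and has no todense() method
print(hasattr(my_sparseish_matrix, "todense"))  # False
```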

So my recommendation is to construct your dense array explicitly using a couple of for loops. To do this you'll need to know how many items are possible in your resulting vector -- call it N.

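import numpy as np

# N = length of the dense vector; here it is inferred as (largest index + 1)
N = 1 + max(index for inner in mx for index, _ in inner)
dense_vector = np.zeros((N, ), int)
for inner in mx:
    for index, value in inner:
        dense_vector[index] = value  # place each value at its index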
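The loop above fills a single vector, following the reading in the comments that mx is really one vector of word frequencies. If each inner list is instead a separate document, a variant can give each document its own row. This is a sketch, and to_dense_rows is a hypothetical name. It infers N from the largest index, so N does not have to be known in advance. With very many documents and indices, a dense array of this shape may become too large for memory.

```python
import numpy as np

def to_dense_rows(mx, n_cols=None):
    """Build a (documents x N) array from lists of (index, value) pairs."""
    if n_cols is None:
        # Assumption: N is one more than the largest index seen anywhere
        n_cols = 1 + max(index for inner in mx for index, _ in inner)
    dense = np.zeros((len(mx), n_cols), dtype=int)
    for row, inner in enumerate(mx):
        for index, value in inner:
            dense[row, index] = value  # indices not listed stay 0
    return dense

mx = [[(0, 2), (1, 1), (2, 1), (3, 1), (4, 1), (5, 3), (6, 4), (7, 2), (8, 5), (9, 1)],
      [(10, 1), (11, 5), (12, 2), (13, 1), (21, 1), (22, 1), (23, 1), (24, 1), (25, 1), (26, 2)],
      [(27, 2), (28, 1), (29, 1), (30, 1), (31, 2), (32, 1), (33, 1), (34, 1), (35, 1), (36, 1)]]

dense = to_dense_rows(mx)
print(dense.shape)  # (3, 37)
```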
Author: Tiger1

Updated on June 04, 2022

Comments

  • Tiger1
    Tiger1 almost 2 years

    I have the following matrix, which I believe is sparse. I tried converting it to dense using the x.dense format, but it never worked. Any suggestions as to how to do this? Thanks.

    mx=[[(0, 2), (1, 1), (2, 1), (3, 1), (4, 1), (5, 3), (6, 4), (7, 2), (8, 5), (9, 1)], 
    [(10, 1), (11, 5), (12, 2), (13, 1), (21, 1), (22, 1), (23, 1), (24, 1), (25, 1), (26, 2)], 
    [(27, 2), (28, 1), (29, 1), (30, 1), (31, 2), (32, 1), (33, 1), (34, 1), (35, 1), (36, 1)]]
    

    Someone put forward the solution below, but is there a better way?

    def assign_coo_to_dense(sparse, dense):
        dense[sparse.row, sparse.col] = sparse.data
    

    I tried mx.todense(). The intended output should appear in this form: [[2,1,1,1,1,3,4], [1,5,2,1,1,1,1], [2,1,1,1,2,1,1,1]]

    • Floris
      Floris over 10 years
      Are you using numpy or scipy?
    • Tiger1
      Tiger1 over 10 years
      Hi Floris, I'm using numpy, but it seems most people have addressed similar problems using scipy.
    • Saullo G. P. Castro
      Saullo G. P. Castro over 10 years
      @Tiger1 is mx a matrix containing indices or values? In SciPy you will need a maximum dimension of 2 for the sparse matrix, which does not seem to be your case...
    • Tiger1
      Tiger1 over 10 years
      Hi Saullo, indices followed by values.
    • Akavall
      Akavall over 10 years
      You need to use x.todense(), not x.dense().
    • Tiger1
      Tiger1 over 10 years
      Hi Akavall, I actually made use of x.todense() and got the following error message: AttributeError: 'list' object has no attribute 'todense'
    • Floris
      Floris over 10 years
      Did you declare mx to be a numpy array?
    • Tiger1
      Tiger1 over 10 years
      @Floris, I actually forgot to declare it a numpy array. I will try it now. Thanks.
    • lmjohns3
      lmjohns3 over 10 years
      It sounds like the data structure you listed is of the form [[(index, value), ...], ...] -- that is, a list of lists, each containing a series of index, value pairs. But since there is only one index associated with each value, this makes me think your data is really a vector. Does the ordering of the lists indicate anything, perhaps the row structure of the matrix? Or can we ignore the list-of-lists part of the structure?
    • Tiger1
      Tiger1 over 10 years
      @LeifJohnson, smart observation. The data is a vector, to be more specific, it represents word frequencies, and in general mx is a list of lists.
    • Tiger1
      Tiger1 over 10 years
      @Floris, I got the same error message after declaring mx as a numpy array: AttributeError: 'list' object has no attribute 'todense'. My goal is for the output to appear in this dense form: [[2111134], [1521111], [21112111]]
    • Akavall
      Akavall over 10 years
      Are you sure the output you want is [[2111134], [1521111], [21112111]], not [[2,1,1,1,1,3,4], [1,5,2,1,1,1,1], [2,1,1,1,2,1,1,1]]? The latter seems much more useful.
    • Tiger1
      Tiger1 over 10 years
      Thanks Akavall, I forgot to put commas, and that explains why my code isn't working.
  • Tiger1
    Tiger1 over 10 years
    Thanks @lmjohns3, how can I know the value of N when the actual data set contains thousands of documents (up to a million items)? Here is code that does this and also maintains the order of items in the list:

        q = []
        for doc in corpus_tfidf:
            j = [i[1] for i in doc]
            q.append(j)
  • lmjohns3
    lmjohns3 over 10 years
    Oh wow, that's totally different from what I thought you were asking! It would be helpful to specify this in your question.
  • Tiger1
    Tiger1 over 10 years
    Hi lmjohns3, thanks for the solution. It worked, but each item is supposed to be a list, with values separated by commas. See the question for the update. Thanks.
  • Floris
    Floris over 10 years
    Finally an answer that ignores the whole "what kind of data is this" red herring and gets to the "here is how you get from the input you have to the output you want".
  • Tiger1
    Tiger1 over 10 years
    @Akavall, thanks for the solution. It's exactly what I was looking for.
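The assign_coo_to_dense helper quoted in the question works through NumPy fancy indexing: one assignment writes every data value at its (row, col) position. It expects an object with row, col and data arrays, which scipy.sparse.coo_matrix provides. The sketch below assumes each inner list of mx is one row. To avoid a SciPy dependency, it uses a types.SimpleNamespace as a stand-in for the COO object.

```python
from types import SimpleNamespace

import numpy as np

def assign_coo_to_dense(sparse, dense):
    # One fancy-indexing assignment: dense[row[k], col[k]] = data[k] for every k
    dense[sparse.row, sparse.col] = sparse.data

mx = [[(0, 2), (1, 1), (2, 1), (3, 1), (4, 1), (5, 3), (6, 4), (7, 2), (8, 5), (9, 1)],
      [(10, 1), (11, 5), (12, 2), (13, 1), (21, 1), (22, 1), (23, 1), (24, 1), (25, 1), (26, 2)],
      [(27, 2), (28, 1), (29, 1), (30, 1), (31, 2), (32, 1), (33, 1), (34, 1), (35, 1), (36, 1)]]

# Flatten mx into COO triplets: row = position of the inner list,
# col = stored index, data = stored value
rows, cols, vals = [], [], []
for r, inner in enumerate(mx):
    for index, value in inner:
        rows.append(r)
        cols.append(index)
        vals.append(value)

# Stand-in for a scipy.sparse.coo_matrix, which exposes .row, .col and .data
coo = SimpleNamespace(row=np.array(rows), col=np.array(cols), data=np.array(vals))

dense = np.zeros((len(mx), max(cols) + 1), dtype=int)
assign_coo_to_dense(coo, dense)
print(dense.shape)  # (3, 37)
```

If SciPy is installed, scipy.sparse.coo_matrix((vals, (rows, cols))).toarray() should build the same array directly.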