How to convert sparse matrix to dense form using python
Solution 1
List comprehension is the easiest way:
new_list = [[b for _,b in sub] for sub in mx]
Result:
>>> new_list
[[2, 1, 1, 1, 1, 3, 4, 2, 5, 1], [1, 5, 2, 1, 1, 1, 1, 1, 1, 2], [2, 1, 1, 1, 2, 1, 1, 1, 1, 1]]
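A self-contained version of the list comprehension, run against the question's mx, is below. The second comprehension is an addition, not part of the original answer. It assumes the (index, value) pairs might not always be stored in index order and sorts each inner list by index before keeping only the values. For the question's data the pairs are already in order, so both results are identical.

```python
# The question's data: each inner list holds (index, value) pairs
mx = [[(0, 2), (1, 1), (2, 1), (3, 1), (4, 1), (5, 3), (6, 4), (7, 2), (8, 5), (9, 1)],
      [(10, 1), (11, 5), (12, 2), (13, 1), (21, 1), (22, 1), (23, 1), (24, 1), (25, 1), (26, 2)],
      [(27, 2), (28, 1), (29, 1), (30, 1), (31, 2), (32, 1), (33, 1), (34, 1), (35, 1), (36, 1)]]

# Keep only the value from each pair, preserving the stored order
new_list = [[b for _, b in sub] for sub in mx]

# Variant (assumption: pairs may be unordered): sort by index first, then keep the values
new_list_sorted = [[b for _, b in sorted(sub)] for sub in mx]

print(new_list)
```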
Solution 2
Here's a pretty hacky way to do what you're asking for:
dense = [[int(''.join(str(val) for _, val in doc))] for doc in mx]
Basically, it converts each value from the nested tuples into a string, concatenates all of those strings, and then converts the result back to an integer. This is repeated for each element of mx.
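One caveat with concatenating digits: once any value has more than one digit, different value sequences can collapse into the same integer. The sketch below wraps the answer's expression in a small helper (concat_values is a hypothetical name introduced here) and feeds it two made-up documents to show the collision.

```python
def concat_values(doc):
    # Join the values of one document's (index, value) pairs into a single integer
    return int(''.join(str(val) for _, val in doc))

# Hypothetical documents with different values that concatenate to the same digits
a = [(0, 12), (1, 3)]
b = [(0, 1), (1, 23)]

print(concat_values(a), concat_values(b))  # both become 123
```

This is one reason the list-of-values form is usually the safer representation.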
Solution 3
Your source data do not really match any of the built-in formats supported by sparse matrices in SciPy (see http://docs.scipy.org/doc/scipy/reference/sparse.html and http://en.wikipedia.org/wiki/Sparse_matrix), so using .todense() will not really be productive here. In particular, if you have something like:
import numpy as np
my_sparseish_matrix = np.array([[(1, 2), (3, 4)]])
then my_sparseish_matrix will already be a dense numpy array! Calling .todense() on it at that point will produce an error (a numpy ndarray has no todense method), and doesn't make sense anyway.
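The point about my_sparseish_matrix can be checked directly. The sketch below builds the same array and inspects it: the tuples simply become a third axis of an ordinary dense ndarray, and the array has no todense attribute at all, which is why calling it fails.

```python
import numpy as np

my_sparseish_matrix = np.array([[(1, 2), (3, 4)]])

# The (index, value) tuples become the last axis of a dense 3-D array
print(type(my_sparseish_matrix).__name__, my_sparseish_matrix.shape)

# ndarray does not provide todense(), so this prints False
print(hasattr(my_sparseish_matrix, "todense"))
```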
So my recommendation is to construct your dense array explicitly using a couple of for loops. To do this, you'll need to know how many items are possible in your resulting vector; call it N.
import numpy as np
# N must already hold the number of possible indices
dense_vector = np.zeros((N,), int)
for inner in mx:
    for index, value in inner:
        dense_vector[index] = value
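Below is a runnable version of the loop approach, on a shortened sample of the question's data. Two parts are assumptions rather than part of the answer. N is taken as one past the largest index that actually appears, whereas in a real corpus it would normally be the vocabulary size. The 2-D variant treats each inner list as one row, which is the reading raised in the comments about whether the list-of-lists structure means anything.

```python
import numpy as np

# Shortened sample in the question's [[(index, value), ...], ...] shape
mx = [[(0, 2), (1, 1), (2, 1)], [(10, 1), (11, 5)], [(27, 2)]]

# Assumption: N is one past the largest index present in the data
N = 1 + max(index for inner in mx for index, _ in inner)

# 1-D: every pair is written into one long vector
dense_vector = np.zeros((N,), int)
for inner in mx:
    for index, value in inner:
        dense_vector[index] = value

# 2-D variant (assumption: each inner list is one row, e.g. one document)
dense_matrix = np.zeros((len(mx), N), int)
for row, inner in enumerate(mx):
    for index, value in inner:
        dense_matrix[row, index] = value

print(dense_vector.shape, dense_matrix.shape)
```

Positions that never appear in mx stay 0, which is what distinguishes this dense layout from the list-of-values output in the first answer.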
Tiger1
Updated on June 04, 2022
Comments
-
Tiger1 almost 2 years
I have the following matrix, which I believe is sparse. I tried converting it to dense using the x.dense format, but it never worked. Any suggestions as to how to do this? Thanks.
mx=[[(0, 2), (1, 1), (2, 1), (3, 1), (4, 1), (5, 3), (6, 4), (7, 2), (8, 5), (9, 1)], [(10, 1), (11, 5), (12, 2), (13, 1), (21, 1), (22, 1), (23, 1), (24, 1), (25, 1), (26, 2)], [(27, 2), (28, 1), (29, 1), (30, 1), (31, 2), (32, 1), (33, 1), (34, 1), (35, 1), (36, 1)]]
Someone put forward the solution below, but is there a better way?
def assign_coo_to_dense(sparse, dense):
    dense[sparse.row, sparse.col] = sparse.data
mx.todense()
The intended output should appear in this form: [[2,1,1,1,1,3,4], [1,5,2,1,1,1,1], [2,1,1,1,2,1,1,1]]
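The suggested assign_coo_to_dense only works on an object that has row, col, and data arrays, such as a SciPy COO matrix (scipy.sparse.coo_matrix exposes those attributes). It cannot work on a plain list like mx, which is why list-based calls fail. The sketch below avoids SciPy and uses a SimpleNamespace as a hypothetical stand-in for a COO object. It assumes the position of each inner list is the row and each pair's index is the column.

```python
import numpy as np
from types import SimpleNamespace


def assign_coo_to_dense(sparse, dense):
    # Scatter each stored value into its (row, col) position via fancy indexing
    dense[sparse.row, sparse.col] = sparse.data


# Shortened sample in the question's format
mx = [[(0, 2), (1, 1)], [(10, 1), (11, 5)]]

# Hypothetical COO-like object: inner-list position -> row, pair index -> column
rows = [r for r, inner in enumerate(mx) for _ in inner]
cols = [i for inner in mx for i, _ in inner]
vals = [v for inner in mx for _, v in inner]
coo_like = SimpleNamespace(row=np.array(rows), col=np.array(cols), data=np.array(vals))

# Allocate the dense target, then fill it
dense = np.zeros((len(mx), max(cols) + 1), int)
assign_coo_to_dense(coo_like, dense)
print(dense)
```

With SciPy installed, the same row, col, and data arrays could instead be passed to scipy.sparse.coo_matrix((data, (row, col)), shape=...) and converted with .toarray().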
-
Floris over 10 years: Are you using numpy or scipy?
-
Tiger1 over 10 years: Hi Floris, I'm using numpy, but it seems most people have addressed similar problems using scipy.
-
Saullo G. P. Castro over 10 years: @Tiger1 is mx a matrix containing indices or values? In SciPy you will need a maximum dimension of 2 for the sparse matrix, which does not seem to be your case...
-
Tiger1 over 10 years: Hi Saullo, indices followed by values.
-
Akavall over 10 years: You need to use x.todense(), not x.dense().
-
Tiger1 over 10 years: Hi Akavall, I actually made use of x.todense() and got the following error message: AttributeError: 'list' object has no attribute 'todense'
-
Floris over 10 years: Did you declare mx to be a numpy array?
-
Tiger1 over 10 years: @Floris, I actually forgot to declare it as a numpy array. I will try it now. Thanks.
-
lmjohns3 over 10 years: It sounds like the data structure you listed is of the form [[(index, value), ...], ...], that is, a list of lists, each containing a series of (index, value) pairs. But since there is only one index associated with each value, this makes me think your data is really a vector. Does the ordering of the lists indicate anything, perhaps the row structure of the matrix? Or can we ignore the list-of-lists part of the structure?
-
Tiger1 over 10 years: @LeifJohnson, smart observation. The data is a vector; to be more specific, it represents word frequencies, and in general mx is a list of lists.
-
Tiger1 over 10 years: @Floris, I got the same error message after declaring mx as a numpy array: AttributeError: 'list' object has no attribute 'todense'. My goal is for the output to appear in this dense form: [[2111134], [1521111], [21112111]]
-
Akavall over 10 years: Are you sure the output you want is [[2111134], [1521111], [21112111]], not [[2,1,1,1,1,3,4], [1,5,2,1,1,1,1], [2,1,1,1,2,1,1,1]]? The latter seems much more useful.
-
Tiger1 over 10 years: Thanks Akavall, I forgot to put in the commas, and that explains why my code isn't working.
-
Tiger1 over 10 years: Thanks @lmjohns3, how can I know the value of N when the actual data set contains thousands of documents (up to a million items)? Here is code that does that, and also maintains the order of items in the list:
q = []
for doc in corpus_tfidf:
    j = [i[1] for i in doc]
    q.append(j)
-
lmjohns3 over 10 years: Oh wow, that's totally different from what I thought you were asking! It would be helpful to specify this in your question.
-
Tiger1 over 10 years: Hi lmjohns3, thanks for the solution. It worked, but each item is supposed to be a list of values separated by commas. See the question for the update. Thanks.
-
Floris over 10 years: Finally an answer that ignores the whole "what kind of data is this" red herring and gets to the "here is how you get from the input you have to the output you want".
-
Tiger1 over 10 years: @Akavall, thanks for the solution. It's exactly what I was looking for.