How to calculate percentage of sparsity for a numpy array/matrix?

python arrays numpy matrix sparse-matrix

13,418

Solution 1

np.isnan(a).sum()

gives the number of NaN values, in this example 8.

np.prod(a.shape)

is the total number of elements, here 50. Their ratio is the fraction of missing values.

In [1081]: np.isnan(a).sum()/np.prod(a.shape)
Out[1081]: 0.16
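
The same calculation can be wrapped in a small helper. This is only a sketch: the function name nan_fraction is made up for illustration, and the array is the one from the question.

```python
import numpy as np

nan = np.nan
# The 10 by 5 array from the question
a = np.array([
    [0, 0, 0, 0, 1],
    [1, 1, 0, nan, nan],
    [0, nan, 1, nan, nan],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, nan],
    [nan, nan, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
], dtype=float)


def nan_fraction(arr):
    """Return the fraction of entries in arr that are NaN (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    # isnan gives a boolean array; summing it counts the True entries
    return np.isnan(arr).sum() / arr.size


print(nan_fraction(a))
```

Multiply the result by 100 if a percentage is wanted. Since the mean of a boolean array is the fraction of True entries, np.isnan(a).mean() should give the same number.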

You might also find it useful to make a masked array from this:

In [1085]: a_ma=np.ma.masked_invalid(a)
In [1086]: print(a_ma)
[[0.0 0.0 0.0 0.0 1.0]
 [1.0 1.0 0.0 -- --]
 [0.0 -- 1.0 -- --]
 [1.0 1.0 1.0 1.0 0.0]
 [0.0 0.0 0.0 1.0 0.0]
 [0.0 0.0 0.0 0.0 --]
 [-- -- 1.0 1.0 1.0]
 [0.0 1.0 0.0 1.0 0.0]
 [1.0 0.0 1.0 0.0 0.0]
 [0.0 1.0 0.0 0.0 0.0]]

The number of valid values then is:

In [1089]: a_ma.compressed().shape
Out[1089]: (42,)
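
The masked array can also report these counts directly, without compressing it first. A minimal sketch, assuming a small made-up array rather than the one from the question:

```python
import numpy as np

# Small made-up array; NaN marks a missing value
a = np.array([[0.0, 1.0, np.nan],
              [np.nan, 0.0, 1.0]])

a_ma = np.ma.masked_invalid(a)

n_valid = a_ma.count()                 # number of unmasked (valid) entries
n_missing = np.ma.count_masked(a_ma)   # number of masked (NaN or inf) entries
missing_fraction = n_missing / a_ma.size

print(n_valid, n_missing, missing_fraction)
```

Note that masked_invalid masks infinities as well as NaN, so the two approaches can differ if the data contains inf.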

Solution 2

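Definition: sparsity is the fraction of elements that are zero, i.e. 1 minus the number of non-zero elements divided by the total number of elements.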

Code for a general case:

from numpy import array
from numpy import count_nonzero
import numpy as np

# create dense matrix
A = array([[1, 1, 0, 1, 0, 0], [1, 0, 2, 0, 0, 1], [99, 0, 0, 2, 0, 0]])

# If the array may contain NaN, replace them with 0 first
A = np.nan_to_num(A, nan=0.0)

print(A)
#[[ 1  1  0  1  0  0]
# [ 1  0  2  0  0  1]
# [99  0  0  2  0  0]]

# calculate sparsity
sparsity = 1.0 - ( count_nonzero(A) / float(A.size) )
print(sparsity)

Results:

0.555555555556

Solution 3

Measuring the percentage of missing values has already been explained by 'hpaulj'.

I will address the first part of your question, assuming the array contains zeros and non-zeros.

Sparsity refers to the zero values and density to the non-zero values in an array. Suppose your array is X; get the count of non-zero values:

non_zero = np.count_nonzero(X)

Total number of values in X:

total_val = np.prod(X.shape)

Sparsity will be -

sparsity = (total_val - non_zero) / total_val

And Density will be -

density = non_zero / total_val

The sum of sparsity and density must equal 1 (i.e. 100%).
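
One thing to watch for: np.count_nonzero counts NaN as non-zero, so an array with missing values will look denser than it is. The sketch below assumes you want NaN treated as empty, just like zero. The function name sparsity_and_density and the nan_is_empty flag are made up for illustration.

```python
import numpy as np


def sparsity_and_density(X, nan_is_empty=True):
    """Return (sparsity, density) of X as fractions (hypothetical helper).

    Assumption: when nan_is_empty is True, NaN entries count as empty,
    the same as zeros. Otherwise NaN counts as a non-zero value, which
    is what np.count_nonzero does on its own.
    """
    X = np.asarray(X, dtype=float)
    if nan_is_empty:
        # An entry is "filled" only if it is neither NaN nor zero
        filled = np.count_nonzero(~np.isnan(X) & (X != 0))
    else:
        filled = np.count_nonzero(X)
    density = filled / X.size
    return 1.0 - density, density


# Made-up example: 6 entries, two real non-zeros and one NaN
X = np.array([[0.0, 1.0, np.nan],
              [2.0, 0.0, 0.0]])

sparsity, density = sparsity_and_density(X)
print(sparsity, density)
```

With nan_is_empty=False the NaN is counted as a value, so the density of this X rises from 2/6 to 3/6.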


Author by

ShanZhengYang

Updated on June 14, 2022

Comments

ShanZhengYang almost 2 years

I have the following 10 by 5 numpy array/matrix, which has a number of NaN values:

array([[  0.,   0.,   0.,   0.,   1.],
       [  1.,   1.,   0.,  nan,  nan],
       [  0.,  nan,   1.,  nan,  nan],
       [  1.,   1.,   1.,   1.,   0.],
       [  0.,   0.,   0.,   1.,   0.],
       [  0.,   0.,   0.,   0.,  nan],
       [ nan,  nan,   1.,   1.,   1.],
       [  0.,   1.,   0.,   1.,   0.],
       [  1.,   0.,   1.,   0.,   0.],
       [  0.,   1.,   0.,   0.,   0.]])

How does one measure exactly how sparse this array is? Is there a simple function in numpy for measuring the percentage of missing values?

ShanZhengYang over 5 years

Thanks for this! This is helpful
Mohit Pandey over 4 years

If A_sparse is a sparse matrix, then the correct expression is sparsity = 1.0 - ( A_sparse.count_nonzero() / float(A_sparse.toarray().size) ). Using float(A_sparse.size) would give an incorrect sparsity of 0 for all sparse matrices.
Mohit Pandey over 4 years

Actually, float(A.toarray().size) and float(A.size) are not the same if A is a sparse matrix, because size for a sparse matrix gives the number of stored entries, which correspond to the non-zero elements. Also, np.prod(A_sparse.shape) is better than A_sparse.toarray().size because the latter involves the computationally expensive step of converting a sparse matrix to a dense matrix.
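
A minimal sketch of the sparse-matrix case described in these comments, assuming SciPy is installed and using a made-up 3 by 4 matrix stored in CSR format:

```python
import numpy as np
from scipy import sparse

# Made-up dense data with 2 non-zero entries out of 12
dense = np.array([[1, 0, 0, 0],
                  [0, 0, 2, 0],
                  [0, 0, 0, 0]])
A_sparse = sparse.csr_matrix(dense)

# Total number of cells, taken from the shape without densifying
total = np.prod(A_sparse.shape)

# count_nonzero() counts the entries whose value is actually non-zero
sparsity = 1.0 - A_sparse.count_nonzero() / total

print(A_sparse.nnz, total, sparsity)
```

Here nnz is the number of stored entries. It can exceed count_nonzero() if explicit zeros have been stored in the matrix.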