Is there a numpy.delete() equivalent for sparse matrices?

18,082

Solution 1

For CSR, this is probably the most efficient way to do it in-place:

def delete_row_csr(mat, i):
    if not isinstance(mat, scipy.sparse.csr_matrix):
        raise ValueError("works only for CSR format -- use .tocsr() first")
    n = mat.indptr[i+1] - mat.indptr[i]
    if n > 0:
        mat.data[mat.indptr[i]:-n] = mat.data[mat.indptr[i+1]:]
        mat.data = mat.data[:-n]
        mat.indices[mat.indptr[i]:-n] = mat.indices[mat.indptr[i+1]:]
        mat.indices = mat.indices[:-n]
    mat.indptr[i:-1] = mat.indptr[i+1:]
    mat.indptr[i:] -= n
    mat.indptr = mat.indptr[:-1]
    mat._shape = (mat._shape[0]-1, mat._shape[1])

In LIL format it's even simpler:

def delete_row_lil(mat, i):
    if not isinstance(mat, scipy.sparse.lil_matrix):
        raise ValueError("works only for LIL format -- use .tolil() first")
    mat.rows = np.delete(mat.rows, i)
    mat.data = np.delete(mat.data, i)
    mat._shape = (mat._shape[0] - 1, mat._shape[1])

Solution 2

Pv.s answer is a good and solid in-place solution that takes

a = scipy.sparse.csr_matrix((100,100), dtype=numpy.int8)
%timeit delete_row_csr(a.copy(), 0)
10000 loops, best of 3: 80.3 us per loop

for any array size. Since boolean indexing works for sparse matrices, at least in scipy >= 0.14.0, I would suggest to use it whenever multiple rows are to be removed:

def delete_rows_csr(mat, indices):
    """
    Remove the rows denoted by ``indices`` form the CSR sparse matrix ``mat``.
    """
    if not isinstance(mat, scipy.sparse.csr_matrix):
        raise ValueError("works only for CSR format -- use .tocsr() first")
    indices = list(indices)
    mask = numpy.ones(mat.shape[0], dtype=bool)
    mask[indices] = False
    return mat[mask]

This solution takes significantly longer for a single row removal

%timeit delete_rows_csr(a.copy(), [50])
1000 loops, best of 3: 509 us per loop

But is more efficient for the removal of multiple rows, as the execution time barely increases with the number of rows

%timeit delete_rows_csr(a.copy(), numpy.random.randint(0, 100, 30))
1000 loops, best of 3: 523 us per loop

Solution 3

In addition to @loli's version of @pv's answer, I expanded their function to allow for row and/or column deletion by index on CSR matrices.

import numpy as np
from scipy.sparse import csr_matrix

def delete_from_csr(mat, row_indices=[], col_indices=[]):
    """
    Remove the rows (denoted by ``row_indices``) and columns (denoted by ``col_indices``) from the CSR sparse matrix ``mat``.
    WARNING: Indices of altered axes are reset in the returned matrix
    """
    if not isinstance(mat, csr_matrix):
        raise ValueError("works only for CSR format -- use .tocsr() first")

    rows = []
    cols = []
    if row_indices:
        rows = list(row_indices)
    if col_indices:
        cols = list(col_indices)

    if len(rows) > 0 and len(cols) > 0:
        row_mask = np.ones(mat.shape[0], dtype=bool)
        row_mask[rows] = False
        col_mask = np.ones(mat.shape[1], dtype=bool)
        col_mask[cols] = False
        return mat[row_mask][:,col_mask]
    elif len(rows) > 0:
        mask = np.ones(mat.shape[0], dtype=bool)
        mask[rows] = False
        return mat[mask]
    elif len(cols) > 0:
        mask = np.ones(mat.shape[1], dtype=bool)
        mask[cols] = False
        return mat[:,mask]
    else:
        return mat

Solution 4

You can delete row 0 < i < X.shape[0] - 1 from a CSR matrix X with

scipy.sparse.vstack([X[:i, :], X[i:, :]])

You can delete the first or the last row with X[1:, :] or X[:-1, :], respectively. Deleting multiple rows in one gone will probably require rolling your own function.

For other formats than CSR, this might not necessarily work as not all formats support row slicing.

Solution 5

To remove the i'th row from A simply use left matrix multiplication:

B = J*A

where J is a sparse identity matrix with i'th row removed.
Left multiplication by the transpose of J will insert a zero-vector back to the i'th row of B, which makes this solution a bit more general.

A0 = J.T * B

To construct J itself, I used pv.'s solution on a sparse diagonal matrix as follows (maybe there's a simpler solution for this special case?)

def identity_minus_rows(N, rows):
    if np.isscalar(rows):
        rows = [rows]
    J = sps.diags(np.ones(N), 0).tocsr()  # make a diag matrix
    for r in sorted(rows):
        J = delete_row_csr(J, r)
    return J

You may also remove columns by right-multiplying by J.T of the appropriate size.
Finally, multiplication is efficient in this case because J is so sparse.

Share:
18,082

Related videos on Youtube

pemistahl
Author by

pemistahl

Computational linguist and software engineer, currently interested in Kotlin and Rust programming

Updated on September 15, 2022

Comments

  • pemistahl
    pemistahl over 1 year

    Let's say I have a 2-dimensional matrix as a numpy array. If I want to delete rows with specific indices in this matrix, I use numpy.delete(). Here is an example of what I mean:

    In [1]: my_matrix = numpy.array([
       ...:     [10, 20, 30, 40, 50],
       ...:     [15, 25, 35, 45, 55],
       ...:     [95, 96, 97, 98, 99]
       ...: ])
    In [2]: numpy.delete(my_matrix, [0, 2], axis=0)
    Out[2]: array([[15, 25, 35, 45, 55]])
    

    I'm looking for a way to do the above with matrices from the scipy.sparse package. I know it's possible to do this by converting the entire matrix into a numpy array but I don't want to do that. Is there any other way of doing that?

    Thanks a lot!

  • pemistahl
    pemistahl over 11 years
    Oh dear, this is laborious and seems to be rather complicated for deleting multiple rows. If this is the only way of doing it, it's probably better for my purposes to convert the matrix into a numpy array, even though it's inefficient.
  • Alejandro Sazo
    Alejandro Sazo over 7 years
    Probably he wanted to delete instead of logical erase by multiplying (i.e. making a row/column zero).
  • hpaulj
    hpaulj almost 6 years
    Row (or column) indexing, A[index,:], uses this kind of multiplication. It creates an extractor matrix like this, and multiplies. Sparse matrix multiplication is a (relative) strong point. Sparse matrix slicing using list of int
  • AlexConfused
    AlexConfused over 5 years
    Shouldn't it be scipy.sparse.vstack([X[:i, :], X[i+1:, :]])
  • CamilB
    CamilB about 5 years
    Excuse the ignorance, but what to you mean by "WARNING: Indices of altered axes are reset in the returned matrix"?
  • Philip
    Philip about 5 years
    @CamilB The original indices will not match the indices of the returned/new matrix, because this function creates a new one and fills it with the values that should be kept. Example: if you have 4 rows and want the 3rd row out, the row indices would not be 0, 1 and 3, but 0, 1 and 2 (the 4th row index is reset to follow the previous number).
  • CamilB
    CamilB about 5 years
    I thought it meant something else. Thank you.
  • Brent Baccala
    Brent Baccala over 4 years
    Based on my experience with a dozen matrices of approximate size 100,000 x 250,000 with less than 1% density and deleting about 20,000 rows from each, this is by far the fastest algorithm.
  • MarcoMag
    MarcoMag over 3 years
    I use this approach in my finite element code to remove the DoFs (degrees of freedom) in my global sparse stiffness matrix K. The neat advantage of this method for this usage is IMO the fact that at the beginning of a time increment, one computes the J matrix (see also this) and can then use it for the multiple iterations by writing Ksplit=J.dot(K).dot(J.transpose())