R convert matrix or data frame to sparseMatrix

Solution 1

Here are two options:

library(Matrix)

A <- as(regMat, "sparseMatrix")       # see also `vignette("Intro2Matrix")`
B <- Matrix(regMat, sparse = TRUE)    # Thanks to Aaron for pointing this out

identical(A, B)
# [1] TRUE
A
# 10 x 10 sparse Matrix of class "dgCMatrix"
#                              
#  [1,] . . .  .  . 45 .  . . .
#  [2,] . . .  .  .  . . 59 . .
#  [3,] . . .  . 95  . .  . . .
#  [4,] . . .  .  .  . .  . . .
#  [5,] . . .  .  .  . .  . . .
#  [6,] . . .  .  .  . .  . . .
#  [7,] . . . 23  .  . .  . . .
#  [8,] . . . 63  .  . .  . . .
#  [9,] . . .  .  .  . .  . . .
# [10,] . . .  .  .  . .  . . .

Solution 2

Josh's answer is fine, but here are more options and explanation.

A nitpick on "I have a regular matrix (non-sparse)...": you actually do have a sparse matrix (a matrix with mostly 0s); it's just stored in an uncompressed format. Your goal is to put it in a compressed storage format.

Sparse matrices can be compressed into multiple storage formats. Compressed Sparse Column (CSC) and Compressed Sparse Row (CSR) are the two dominant formats. as(regMat, "sparseMatrix") converts your matrix to class dgCMatrix, which is the compressed sparse column format. This is usually what you want, but I prefer to be explicit about it.

library(Matrix)

matCSC <- as(regMat, "dgCMatrix")  # compressed sparse column CSC
matCSC
# 10 x 10 sparse Matrix of class "dgCMatrix"
#
#  [1,] . . .  .  . 57 .  . . .
#  [2,] . . .  .  .  . . 27 . .
#  [3,] . . .  . 90  . .  . . .
#  [4,] . . .  .  .  . .  . . .
#  [5,] . . .  .  .  . .  . . .
#  [6,] . . .  .  .  . .  . . .
#  [7,] . . . 91  .  . .  . . .
#  [8,] . . . 37  .  . .  . . .
#  [9,] . . .  .  .  . .  . . .
# [10,] . . .  .  .  . .  . . .

matCSR <- as(regMat, "dgRMatrix")  # compressed sparse row CSR
matCSR
# 10 x 10 sparse Matrix of class "dgRMatrix"
#
#  [1,] . . .  .  . 57 .  . . .
#  [2,] . . .  .  .  . . 27 . .
#  [3,] . . .  . 90  . .  . . .
#  [4,] . . .  .  .  . .  . . .
#  [5,] . . .  .  .  . .  . . .
#  [6,] . . .  .  .  . .  . . .
#  [7,] . . . 91  .  . .  . . .
#  [8,] . . . 37  .  . .  . . .
#  [9,] . . .  .  .  . .  . . .
# [10,] . . .  .  .  . .  . . .

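While these look and behave the same on the surface, internally they store data differently. CSC is faster for retrieving columns of data while CSR is faster for retrieving rows. They can also take up different amounts of space depending on the shape of your data: CSC keeps one pointer per column (plus one) and CSR one pointer per row (plus one), so a tall, narrow matrix is smaller in CSC and a short, wide one is smaller in CSR.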
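To make the storage difference concrete, here is a minimal sketch that inspects the slots behind each format. The 3 x 4 matrix m and its values are made up for illustration and are not the regMat from the question. Note that the index and pointer slots are 0-based.

```r
library(Matrix)

# Small dense matrix with three non-zero entries (illustrative values only)
m <- matrix(0, nrow = 3, ncol = 4)
m[1, 2] <- 5
m[3, 2] <- 7
m[2, 4] <- 9

csc <- as(m, "CsparseMatrix")    # compressed sparse column
csr <- as(csc, "RsparseMatrix")  # same data, compressed sparse row

# CSC: values are stored column by column
csc@i  # 0-based row index of each stored value:  0 2 1
csc@p  # column pointers, length ncol + 1:        0 0 2 2 3
csc@x  # stored values in column order:           5 7 9

# CSR: values are stored row by row
csr@j  # 0-based column index of each value:      1 3 1
csr@p  # row pointers, length nrow + 1:           0 1 2 3
csr@x  # stored values in row order:              5 9 7
```

Reading column k of the CSC matrix only needs the slice of i and x between p[k] and p[k + 1], which is why column access is cheap there; the same holds for rows in CSR.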

Furthermore, in this example you're converting an uncompressed sparse matrix to a compressed one. Usually you do this to save memory, so building an uncompressed matrix just to convert it to compressed form defeats the purpose. In practice it's more common to construct a compressed sparse matrix from a table of (row, column, value) triplets. You can do this with Matrix's sparseMatrix() function.

# Make data.frame of (row, column, value) triplets
df <- data.frame(
  rowIdx = c(3,2,8,1,7),
  colIdx = c(5,8,4,6,4),
  val = round(runif(n = 5), 2) * 100  # random values, so yours will differ from the output shown
)

df
#   rowIdx colIdx val
# 1      3      5  90
# 2      2      8  27
# 3      8      4  37
# 4      1      6  57
# 5      7      4  91

# Build CSC matrix
matSparse <- sparseMatrix(
  i = df$rowIdx,
  j = df$colIdx, 
  x = df$val, 
  dims = c(10, 10)
)

matSparse
# 10 x 10 sparse Matrix of class "dgCMatrix"
#
#  [1,] . . .  .  . 57 .  . . .
#  [2,] . . .  .  .  . . 27 . .
#  [3,] . . .  . 90  . .  . . .
#  [4,] . . .  .  .  . .  . . .
#  [5,] . . .  .  .  . .  . . .
#  [6,] . . .  .  .  . .  . . .
#  [7,] . . . 91  .  . .  . . .
#  [8,] . . . 37  .  . .  . . .
#  [9,] . . .  .  .  . .  . . .
# [10,] . . .  .  .  . .  . . .
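The memory argument is easier to see at a larger size. The following is a rough sketch, assuming a hypothetical 1000 x 1000 matrix built from the same five triplets with fixed values (the names n, trip, sp and dense are introduced here for illustration). The dense copy is created only to compare sizes.

```r
library(Matrix)

n <- 1000  # hypothetical dimension, chosen so the size gap is visible

# Same (row, column, value) layout as above, with fixed values
trip <- data.frame(
  rowIdx = c(3, 2, 8, 1, 7),
  colIdx = c(5, 8, 4, 6, 4),
  val    = c(90, 27, 37, 57, 91)
)

# Build the compressed matrix directly from the triplets
sp <- sparseMatrix(i = trip$rowIdx, j = trip$colIdx, x = trip$val, dims = c(n, n))

# Materialise a dense copy purely for the comparison
dense <- as.matrix(sp)

object.size(dense)  # roughly 8 MB: n * n doubles at 8 bytes each
object.size(sp)     # a few kilobytes: 5 values, 5 row indices, n + 1 column pointers

# Recover the triplets from the compressed matrix
summary(sp)
```

In real use you would skip the dense copy entirely and go straight from the triplets to sparseMatrix().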

Shameless plug: I have a blog article covering this stuff and more if you're interested.

Author by

screechOwl

https://financenerd.blog/blog/

Updated on June 26, 2020

Comments

  • screechOwl
    screechOwl almost 4 years

    I have a regular matrix (non-sparse) that I would like to convert to a sparseMatrix (using the Matrix package). Is there a function to do this or do I need to do a bunch of loops?

    ex.

    > regMat <- matrix(0, nrow=10, ncol=10)
    > regMat[3,5] <- round(runif(1),2)*100
    > regMat[2,8] <- round(runif(1),2)*100
    > regMat[8,4] <- round(runif(1),2)*100
    > regMat[1,6] <- round(runif(1),2)*100
    > regMat[7,4] <- round(runif(1),2)*100
    > regMat 
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     [1,]    0    0    0    0    0   49    0    0    0     0
     [2,]    0    0    0    0    0    0    0   93    0     0
     [3,]    0    0    0    0   20    0    0    0    0     0
     [4,]    0    0    0    0    0    0    0    0    0     0
     [5,]    0    0    0    0    0    0    0    0    0     0
     [6,]    0    0    0    0    0    0    0    0    0     0
     [7,]    0    0    0    8    0    0    0    0    0     0
     [8,]    0    0    0   14    0    0    0    0    0     0
     [9,]    0    0    0    0    0    0    0    0    0     0
    [10,]    0    0    0    0    0    0    0    0    0     0
    

    Any suggestions?

  • Aaron left Stack Overflow
    Aaron left Stack Overflow about 12 years
    also Matrix(regMat, sparse=TRUE)
  • Josh O'Brien
    Josh O'Brien about 12 years
    Thanks @Aaron. I didn't know about that idiom, but have added it to the answer as a second option.
  • Tapper
    Tapper over 4 years
    @JoshO'Brien would you know if there is an option to convert directly from a file? A dense matrix might be too big to read completely into memory. Thanks!
  • Tapper
    Tapper over 4 years
    As a partial answer to my question: There are faster ways to read a file.