Matrix multiplication in R: requires numeric/complex matrix/vector arguments

Organizing our long-winded discussion in comments to an answer.

Matrix-multiplication operators and functions such as "%*%", crossprod and tcrossprod expect matrices with "numeric", "complex" or "logical" mode. However, your matrix has "character" mode.
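As a quick illustration of the mode requirement, here is a minimal sketch on toy matrices (the names A, L and C are made up for this example): a logical matrix is accepted, while a character matrix triggers the error from the question.

```r
A <- matrix(1:4, 2, 2)                  # numeric (integer) matrix
L <- A > 2                              # logical matrix
C <- matrix(as.character(1:4), 2, 2)    # character matrix

ok <- L %*% c(1, 1)                     # logicals are treated as 0/1
mode(ok)

## the character matrix fails; capture the message instead of stopping
msg <- tryCatch(C %*% c(1, 1), error = function(e) conditionMessage(e))
msg
```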

library(mlbench)
data(BreastCancer)
X <- as.matrix(BreastCancer[, 1:10])
mode(X)
#[1] "character"

You might be surprised as the dataset seems to hold numeric data:

head(BreastCancer[, 1:10])
#       Id Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size
#1 1000025            5         1          1             1            2
#2 1002945            5         4          4             5            7
#3 1015425            3         1          1             1            2
#4 1016277            6         8          8             1            3
#5 1017023            4         1          1             3            2
#6 1017122            8        10         10             8            7
#  Bare.nuclei Bl.cromatin Normal.nucleoli Mitoses
#1           1           3               1       1
#2          10           3               2       1
#3           2           3               1       1
#4           4           3               7       1
#5           1           3               1       1
#6          10           9               7       1

But the printing style is misleading. These columns are in fact characters or factors:

lapply(BreastCancer[, 1:10], class)
#$Id
#[1] "character"
#
#$Cl.thickness
#[1] "ordered" "factor" 
#
#$Cell.size
#[1] "ordered" "factor" 
#
#$Cell.shape
#[1] "ordered" "factor" 
#
#$Marg.adhesion
#[1] "ordered" "factor" 
#
#$Epith.c.size
#[1] "ordered" "factor" 
#
#$Bare.nuclei
#[1] "factor"
#
#$Bl.cromatin
#[1] "factor"
#
#$Normal.nucleoli
#[1] "factor"
#
#$Mitoses
#[1] "factor"

When you do as.matrix, these columns are all coerced to "character" (see R: Why am I not getting type or class "factor" after converting columns to factor? for a thorough explanation).
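The coercion can be seen on a small scale. The sketch below uses a made-up data frame df (not part of the original data) with one character and one factor column. It also shows why calling data.matrix directly is not enough: it returns a numeric matrix, but filled with the internal integer codes rather than the printed values.

```r
df <- data.frame(id = c("101", "102", "103"),
                 score = factor(c("10", "5", "10")),
                 stringsAsFactors = FALSE)

M <- as.matrix(df)
mode(M)            # "character": every column is turned into strings

D <- data.matrix(df)
mode(D)            # "numeric", but these are integer codes, not the values
D[, "score"]       # codes of the levels "10" and "5", not 10 and 5
```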

So to do the matrix-multiplication, we need to correctly coerce these columns to "numeric".


dat <- BreastCancer[, 1:10]

## character to numeric
dat[[1]] <- as.numeric(dat[[1]])

## factor to numeric
dat[2:10] <- lapply( dat[2:10], function (x) as.numeric(levels(x))[x] )

## get the matrix
X <- data.matrix(dat)
mode(X)
#[1] "numeric"
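If the column types are mixed, the two coercion steps can be wrapped into one function. This is only a sketch: to_numeric is a hypothetical helper name and toy is made-up data, and it assumes every level or string really represents a number (anything else becomes NA with a warning).

```r
## hypothetical helper: convert one column to numeric by its printed values
to_numeric <- function(x) {
  if (is.factor(x)) as.numeric(levels(x))[x] else as.numeric(x)
}

toy <- data.frame(id = c("7", "8", "9"),
                  size = factor(c("10", "2", "10")),
                  stringsAsFactors = FALSE)

toy[] <- lapply(toy, to_numeric)   # toy[] keeps the data frame structure
X_toy <- data.matrix(toy)
X_toy
```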

Now you can do, for example, a matrix-vector multiplication.

## some possible matrix-vector multiplications
beta <- runif(10)
yhat <- X %*% beta

## add prediction back to data frame
dat$prediction <- drop(yhat)  ## drop() stores the n x 1 matrix as a plain vector

However, I doubt this is the correct way to obtain predicted values for your logistic regression model: when you build the model with factors, the model matrix is not the above X but a dummy matrix. I highly recommend using predict.
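To make the point about the dummy matrix concrete, here is a minimal sketch on simulated data (the data frame toy and its columns x, g and y are invented for illustration, and this is not your actual model). It shows that the matrix used by glm contains dummy columns for the factor, and that multiplying it by the coefficients reproduces what predict returns on the link scale.

```r
set.seed(1)                           # simulated data, for illustration only
n <- 200
toy <- data.frame(x = rnorm(n),
                  g = factor(sample(c("a", "b", "c"), n, replace = TRUE)))
toy$y <- rbinom(n, 1, plogis(0.5 * toy$x + (toy$g == "b")))

fit <- glm(y ~ x + g, data = toy, family = binomial)

colnames(model.matrix(fit))           # intercept, x, and dummy columns for g

## linear predictor by hand, using the dummy matrix the model actually used
eta_manual <- drop(model.matrix(fit) %*% coef(fit))

## the same quantity from predict(); type = "response" gives probabilities
eta_pred <- predict(fit, type = "link")
p_hat <- predict(fit, type = "response")

all.equal(unname(eta_manual), unname(eta_pred))
```

In practice predict is the safer route, since it rebuilds the dummy matrix for new data for you.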


This line also worked for me: as.matrix(sapply(dat, as.numeric))

Looks like you were lucky. In this dataset the factor levels happen to coincide with their integer codes. In general, converting a factor to numeric should use the method I used above. Compare:

f <- gl(4, 2, labels = c(12.3, 0.5, 2.9, -11.1))
f
#[1] 12.3  12.3  0.5   0.5   2.9   2.9   -11.1 -11.1
#Levels: 12.3 0.5 2.9 -11.1

as.numeric(f)
#[1] 1 1 2 2 3 3 4 4

as.numeric(levels(f))[f]
#[1]  12.3  12.3   0.5   0.5   2.9   2.9 -11.1 -11.1

This is covered in the documentation page ?factor.

Author: TonyGW

Updated on June 06, 2022

Comments

  • TonyGW almost 2 years ago

    I'm using the dataset BreastCancer in the mlbench package, and I am trying to do the following matrix multiplication as a part of logistic regression.

    I took the features from the first 10 columns and created a vector of parameters called theta:

    X <- BreastCancer[, 1:10]
    theta <- data.frame(rep(1, 10))
    

    Then I did the following matrix multiplication:

    constant <- as.matrix(X) %*% as.vector(theta[, 1])
    

    However, I got the following error:

    Error in as.matrix(X) %*% as.vector(theta[, 1]) : 
      requires numeric/complex matrix/vector arguments
    

    Do I need to cast the matrix to double using as.numeric(X) first? Values in X look like strings as they have double quotes.