R package caret confusionMatrix with missing categories
Solution 1
You can use union to ensure that both vectors share the same levels:
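library(caret)
# Sample data
predicted <- c(1,2,1,2,1,2,1,2,3,4,3,4,6,5) # Levels 1,2,3,4,5,6
reference <- c(1,2,1,2,1,2,1,2,1,2,1,3,3,4) # Levels 1,2,3,4
# Use the (sorted) union of both sets of values as the common levels
u <- sort(union(predicted, reference))
# Named `tab` rather than `t` to avoid confusion with base::t()
tab <- table(factor(predicted, u), factor(reference, u))
confusionMatrix(tab)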
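Because table() keeps every level of a factor, including levels with zero counts, the same union trick also prepares the inputs for the factor interface confusionMatrix(data, reference). A minimal base-R sketch reusing the sample vectors above; lvls, pred_f and ref_f are illustrative names, and the caret call is left as a comment so the snippet runs without caret installed:

```r
# Sample data from above
predicted <- c(1,2,1,2,1,2,1,2,3,4,3,4,6,5)
reference <- c(1,2,1,2,1,2,1,2,1,2,1,3,3,4)

# Common, sorted set of levels covering both vectors
lvls <- sort(union(predicted, reference))
pred_f <- factor(predicted, levels = lvls)
ref_f  <- factor(reference, levels = lvls)

# table() keeps unused levels, so the result is square (6 x 6);
# the columns for classes 5 and 6 are all zero because reference never uses them
tab <- table(Predicted = pred_f, Reference = ref_f)
print(tab)

# With caret loaded, both factors now have identical levels:
# confusionMatrix(pred_f, ref_f)
```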
Solution 2
First note that confusionMatrix can be called as confusionMatrix(predicted, actual) in addition to being called with table objects. However, the function throws an error if predicted and actual (both regarded as factors) do not have the same number of levels.
This (and the fact that the caret package threw an error for me because its dependencies were not resolved correctly in the first place) is why I'd suggest creating your own function:
# Create a confusion matrix from the given outcomes, whose rows correspond
# to the predicted and the columns to the actual classes.
createConfusionMatrix <- function(act, pred) {
  # You've mentioned that neither actual nor predicted may give a complete
  # picture of the available classes, hence (assuming classes are 1..K):
  numClasses <- max(act, pred)
  # Sort predicted and actual as it simplifies what's next; `order(act)` is
  # stored in a temporary variable so it is only computed once.
  ord <- order(act)
  pred <- pred[ord]
  act <- act[ord]
  # One column per actual class; tabulate() counts predicted classes 1..numClasses
  sapply(split(pred, act), tabulate, nbins=numClasses)
}
# Generate random data since you've not provided an actual example.
actual <- sample(1:4, 1000, replace=TRUE)
predicted <- sample(c(1L,2L,4L), 1000, replace=TRUE)
print( createConfusionMatrix(actual, predicted) )
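which will give you something like the following (the counts vary between runs because the data are random; rows are predicted classes, columns are actual classes):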
      1  2  3  4
[1,] 85 87 90 77
[2,] 78 78 79 95
[3,]  0  0  0  0
[4,] 89 77 82 83
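The function above assumes the classes are the integers 1..K (because of max()), and a class that never occurs in act gets no column. A hedged sketch of a variant that handles both cases: it converts both vectors to factors over the sorted union of observed labels and relies on split() returning an empty group for each unused factor level. The name createConfusionMatrixLabeled is hypothetical.

```r
# Hypothetical variant of createConfusionMatrix(): works with any labels
# (numeric or character) and keeps classes missing from either vector.
# Rows are predicted classes, columns are actual classes.
createConfusionMatrixLabeled <- function(act, pred) {
  lvls <- sort(union(act, pred))
  act  <- factor(act, levels = lvls)
  pred <- factor(pred, levels = lvls)
  # split() on a factor returns one (possibly empty) group per level,
  # so every actual class gets a column even if it never occurs
  m <- sapply(split(as.integer(pred), act), tabulate, nbins = length(lvls))
  rownames(m) <- lvls
  m
}

# Non-consecutive labels: "c" is never predicted and "d" never occurs in act
act  <- c("b", "a", "b", "c")
pred <- c("a", "a", "b", "d")
m <- createConfusionMatrixLabeled(act, pred)
print(m)

# The result is a plain matrix; with caret loaded, convert it first:
# confusionMatrix(as.table(m))
```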
Barker
Data scientist with PhD from Stanford University with experience in machine learning for biomedical informatics, image processing and analysis, and time series forecasting.
Updated on May 24, 2020

Comments
-
Barker almost 4 years
I am using the function confusionMatrix in the R package caret to calculate some statistics for some data I have. I have been putting my predictions as well as my actual values into the table function to get the table to be used in the confusionMatrix function, like so: table(predicted,actual)
However, there are multiple possible outcomes (e.g. A, B, C, D), and my predictions do not always represent all the possibilities (e.g. only A, B, D). The resulting output of the table function does not include the missing outcome and looks like this:
   A  B   C   D
A  n1 n2  n3  n4
B  n5 n6  n7  n8
D  n9 n10 n11 n12
# Note how there is no corresponding row for `C`.
The confusionMatrix function can't handle the missing outcome and gives the error:
Error in !all.equal(nrow(data), ncol(data)) : invalid argument type
Is there a way I can use the table function differently to get the missing rows with zeros, or use the confusionMatrix function differently so it will view missing outcomes as zero?
As a note: since I am randomly selecting my data to test with, there are times when a category is also not represented in the actual result, as opposed to just the predicted. I don't believe this will change the solution.
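A minimal reproduction of the problem, with made-up labels (not the asker's data): table() builds its rows and columns from the values it actually sees in each argument, so a class that is never predicted gets no row and the table is not square.

```r
# Made-up example: class "C" occurs in actual but is never predicted
predicted <- c("A", "B", "D", "A", "B", "D")
actual    <- c("A", "C", "D", "B", "B", "C")

tab <- table(predicted, actual)
print(tab)
dim(tab)  # 3 rows (A, B, D) but 4 columns (A, B, C, D)
```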
-
topepo over 10 years
This is correct. You should have the same levels in the observed and predicted values so that the full table is shown. How else would table know that other factor levels are possible? Most of the functions in caret go to great lengths to ensure predictions always have the same levels as the original classes. - Max
-
fotNelton over 10 years
table could tell from the union of the actual and predicted levels. I can, however, accept (what else could I do anyway :-)) that table works this way; I just thought I should mention that for this particular problem it won't work as the OP wishes.
-
Barker over 10 years
Thank you so much, this was a huge help! I added some code to create the matrix if 'act' does not have all the possible values represented, as well as to label the rows and columns, and it worked perfectly. Also, as a note to others, you need to use the 'as.table' function to make it work in the 'confusionMatrix' function.
-
Joonhwan over 8 years
For me, because my factor levels are not consecutive, fotNelton's method was not applicable. But this works, thanks.
-
Mohammad Mazraeh over 7 years
Good and simple! I would edit these two lines in the function:
numClasses <- length(unique(c(act,pred)))
sapply(split(as.factor(pred), as.factor(act)), tabulate, nbins=numClasses)