confusionMatrix for logistic regression in R
I think there is a problem with the use of predict, since you forgot to provide the new data. Also, you can use the function confusionMatrix from the caret package to compute and display confusion matrices, but you don't need to table your results before that call.
Here, I created a toy dataset that includes a representative binary target variable and then I trained a model similar to what you did.
train <- data.frame(LoanStatus_B = as.numeric(rnorm(100)>0.5), b= rnorm(100), c = rnorm(100), d = rnorm(100))
logitMod <- glm(LoanStatus_B ~ ., data=train, family=binomial(link="logit"))
Now, you can predict the data (for example, your training set) and then use confusionMatrix(), which takes two arguments:
- your predictions
- the observed classes
library(caret)
# Use your model to make predictions, in this example newdata = training set, but replace with your test set
pdata <- predict(logitMod, newdata = train, type = "response")
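# use caret and compute a confusion matrix
# (recent caret versions require both arguments to be factors with the same levels)
confusionMatrix(data = factor(as.numeric(pdata > 0.5), levels = c(0, 1)), reference = factor(train$LoanStatus_B, levels = c(0, 1)))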
Here are the results:
Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 66 33
         1  0  1

               Accuracy : 0.67
                 95% CI : (0.5688, 0.7608)
    No Information Rate : 0.66
    P-Value [Acc > NIR] : 0.4625
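The same steps carry over to a held-out test set, as long as newdata is passed to predict. The sketch below uses base R only and is built on assumptions: the 100/50 split, the seed, and the simulated columns are made up for illustration, and a plain table stands in for confusionMatrix so the snippet runs without caret.

```r
set.seed(42)  # arbitrary seed, only to make this sketch reproducible

# Simulated data (an assumption, not the asker's data)
toy <- data.frame(LoanStatus_B = rbinom(150, 1, 0.4),
                  b = rnorm(150), c = rnorm(150), d = rnorm(150))
train <- toy[1:100, ]
test  <- toy[101:150, ]

logitMod <- glm(LoanStatus_B ~ ., data = train, family = binomial(link = "logit"))

# Without newdata, predict() returns fitted values for the training rows
p_default <- predict(logitMod, type = "response")
# With newdata, it returns one prediction per row of the test set
p_test <- predict(logitMod, newdata = test, type = "response")

length(p_default) == nrow(train)  # TRUE: this is why table() failed against test
length(p_test) == nrow(test)      # TRUE

# Fix the levels so the table is 2 x 2 even if one class is never predicted
pred_class <- factor(as.numeric(p_test >= 0.5), levels = c(0, 1))
obs_class  <- factor(test$LoanStatus_B, levels = c(0, 1))
cm_test <- table(Prediction = pred_class, Reference = obs_class)
cm_test
```

With caret loaded, pred_class and obs_class can be passed to confusionMatrix(data = pred_class, reference = obs_class) instead of building the table by hand.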
Pumpkin C
Updated on September 04, 2020
Comments
-
Pumpkin C almost 4 years
I want to calculate two confusion matrices for my logistic regression, one using my training data and one using my testing data:
logitMod <- glm(LoanStatus_B ~ ., data=train, family=binomial(link="logit"))
I set the threshold of the predicted probability at 0.5:
confusionMatrix(table(predict(logitMod, type="response") >= 0.5, train$LoanStatus_B == 1))
This code works well for my training set. However, when I use the test set:
confusionMatrix(table(predict(logitMod, type="response") >= 0.5, test$LoanStatus_B == 1))
it gave me this error:
Error in table(predict(logitMod, type = "response") >= 0.5, test$LoanStatus_B == : all arguments must have the same length
Why is this? How can I fix this? Thank you!
-
user20650 almost 7 years
You need to pass the test dataset to the predict function, otherwise it will make predictions on the training dataset, i.e.
predict(logitMod, newdata=test, type="response")
-
Pumpkin C almost 7 years
Thanks, it works!
-
Pumpkin C almost 7 years
What is this line doing: data = as.numeric(pdata>0.5)?
-
Damiano Fantini almost 7 years
Your target variable is either 0 or 1, but the prediction returns a probability in the range 0 to 1. Therefore you need to convert it to a binary value (discretization). For example, you test whether a value is bigger than 0.5; TRUE is then converted to 1 (and FALSE to 0) using as.numeric.
-
Pumpkin C almost 7 years
So it is the threshold, right? I can change it to any number between 0 and 1 that I want?
-
Pumpkin C almost 7 years
The last line in the result is "'Positive' Class : 0", but in my case I want the positive class to be 1. Can I change that?
-
Damiano Fantini almost 7 years
Yes, 0.5 is the threshold. You are supposed to use the value that best fits your data; 0.5 is a reasonable number to start from.
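One way to see how the threshold matters is to sweep a few candidate values and compare the resulting accuracy. This is only a sketch in base R: the observed classes and predicted probabilities below are simulated (an assumption, not output from the model in the answer), and accuracy is just one possible criterion.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible

# Simulated observed classes and predicted probabilities (hypothetical data)
obs  <- rbinom(200, 1, 0.5)
prob <- plogis(rnorm(200, mean = ifelse(obs == 1, 1, -1)))

# Candidate thresholds to compare
thresholds <- seq(0.1, 0.9, by = 0.1)

# Accuracy at each threshold: share of predictions matching the observed class
accuracy <- sapply(thresholds, function(t) mean(as.numeric(prob > t) == obs))

data.frame(threshold = thresholds, accuracy = accuracy)

# Threshold with the highest accuracy on this sample
best <- thresholds[which.max(accuracy)]
best
```

In practice the threshold would be chosen on validation data rather than on the data used to fit the model, and a different criterion (for example sensitivity at a fixed specificity) may suit the problem better than accuracy.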
-
Damiano Fantini almost 7 years
Sure you can. The function has an argument for that; please check ?confusionMatrix. For example:
confusionMatrix(data = factor(as.numeric(pdata > 0.5), levels = c(0, 1)), reference = factor(train$LoanStatus_B, levels = c(0, 1)), positive = "1")
-
Pumpkin C almost 7 years
Okay, but here the 1 is a numeric instead of a string, right?
-
Damiano Fantini almost 7 years
In this case, "1" corresponds to your numeric 1s. However, the positive argument must be provided as a character string. If you only care about accuracy, it doesn't matter, but it is important for computing sensitivity/specificity, because you need to know which are the true/false positives. For example, try:
confusionMatrix(data = as.factor(c("A","B", "B", "B", "A", "A", "A", "A", "B", "B")), reference = as.factor(c("A","A", "A", "B", "A", "A", "A", "A", "B", "A")), positive = "A")
and the same line with positive = "B". I hope this was useful. If so, please accept my answer. Thanks.
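To make the effect of the positive class concrete, the sensitivity for the A/B vectors from the comment above can be worked out by hand in base R. This is a sketch of the arithmetic, not caret's implementation; sensitivity_for is a hypothetical helper introduced only for this example.

```r
# Same predictions and reference classes as in the comment above
pred <- factor(c("A", "B", "B", "B", "A", "A", "A", "A", "B", "B"))
ref  <- factor(c("A", "A", "A", "B", "A", "A", "A", "A", "B", "A"))

# Rows are predictions, columns are observed classes
cm <- table(Prediction = pred, Reference = ref)

# Hypothetical helper: true positives divided by all observed positives
sensitivity_for <- function(cm, positive) {
  cm[positive, positive] / sum(cm[, positive])
}

sens_A <- sensitivity_for(cm, "A")  # 5 / 8 = 0.625
sens_B <- sensitivity_for(cm, "B")  # 2 / 2 = 1

# Accuracy does not depend on which class is called positive
acc <- sum(diag(cm)) / sum(cm)      # 7 / 10 = 0.7
```

Switching the positive class swaps the roles of sensitivity and specificity, so the 0.625 reported as sensitivity with positive = "A" becomes the specificity with positive = "B".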