confusionMatrix for logistic regression in R

r validation logistic-regression confusion-matrix

39,076

The problem is in your call to predict(): you did not supply newdata, so it returns predictions for the training set, and their length does not match the test set. Also, you can use the confusionMatrix() function from the caret package to compute and display confusion matrices, and you don't need to call table() on your results first.

Here I create a toy dataset with a binary target variable and then fit a model similar to yours.

train <- data.frame(LoanStatus_B = as.numeric(rnorm(100) > 0.5), b = rnorm(100), c = rnorm(100), d = rnorm(100))
logitMod <- glm(LoanStatus_B ~ ., data = train, family = binomial(link = "logit"))

Now you can generate predictions (for example, on your training set) and pass them to confusionMatrix(), which takes two main arguments:

your predictions
the observed classes

library(caret)
# Use your model to make predictions. In this example newdata is the training set; replace it with your test set.
pdata <- predict(logitMod, newdata = train, type = "response")

# Compute a confusion matrix with caret. Recent caret versions require
# data and reference to be factors with the same levels.
confusionMatrix(data = factor(as.numeric(pdata > 0.5), levels = c(0, 1)),
                reference = factor(train$LoanStatus_B, levels = c(0, 1)))

Here are the results:

Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 66 33
         1  0  1

               Accuracy : 0.67            
                 95% CI : (0.5688, 0.7608)
    No Information Rate : 0.66            
    P-Value [Acc > NIR] : 0.4625
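
Since the question asks for one confusion matrix on the training set and one on the test set, here is a minimal base-R sketch of that workflow. The simulated data, the 140/60 split, and the helper name confusion_at are assumptions made for illustration; only the model formula and the 0.5 threshold come from the question.

```r
set.seed(42)

# Hypothetical data: a binary target that depends on predictor b
n <- 200
dat <- data.frame(b = rnorm(n), c = rnorm(n))
dat$LoanStatus_B <- rbinom(n, size = 1, prob = plogis(1.5 * dat$b))

# Hypothetical 140/60 train/test split
idx <- sample(n, size = 140)
train <- dat[idx, ]
test <- dat[-idx, ]

logitMod <- glm(LoanStatus_B ~ ., data = train, family = binomial(link = "logit"))

# Illustrative helper: predict on the given data set, apply the threshold,
# and cross-tabulate predictions against the observed classes
confusion_at <- function(model, newdata, threshold = 0.5) {
  prob <- predict(model, newdata = newdata, type = "response")
  pred <- factor(as.numeric(prob > threshold), levels = c(0, 1))
  obs <- factor(newdata$LoanStatus_B, levels = c(0, 1))
  table(Prediction = pred, Reference = obs)
}

cm_train <- confusion_at(logitMod, train)
cm_test <- confusion_at(logitMod, test)
print(cm_train)
print(cm_test)
```

Fixing the factor levels to c(0, 1) keeps both tables 2 x 2 even if the model predicts only one class. If caret is installed, a table like cm_test can also be passed to confusionMatrix() to get the additional statistics.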


Author by

Pumpkin C

Updated on September 04, 2020

Comments

Pumpkin C almost 4 years
I want to calculate two confusion matrices for my logistic regression, one using my training data and one using my testing data:
```
logitMod <- glm(LoanStatus_B ~ ., data=train, family=binomial(link="logit"))
```
I set the threshold for the predicted probability at 0.5:
```
confusionMatrix(table(predict(logitMod, type="response") >= 0.5,
                      train$LoanStatus_B == 1))
```
The code above works well for my training set. However, when I use the test set:
```
confusionMatrix(table(predict(logitMod, type="response") >= 0.5,
                      test$LoanStatus_B == 1))
```
it gives me this error:
```
Error in table(predict(logitMod, type = "response") >= 0.5, test$LoanStatus_B == : all arguments must have the same length
```
Why is this? How can I fix this? Thank you!
- user20650 almost 7 years
  
  You need to pass the test dataset to the predict function, otherwise it will make predictions on the training dataset, i.e. predict(logitMod, newdata=test, type="response")
- Pumpkin C almost 7 years
  
  Thx it works!..
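
The length mismatch behind the error can be reproduced in a few lines. This is a sketch on made-up data: the row counts (100 and 40) and the helper make_data are assumptions, not the asker's actual data.

```r
set.seed(1)

# Hypothetical helper that simulates n rows with a binary target
make_data <- function(n) {
  data.frame(LoanStatus_B = as.numeric(rnorm(n) > 0.5),
             b = rnorm(n),
             c = rnorm(n))
}
train <- make_data(100)
test <- make_data(40)

logitMod <- glm(LoanStatus_B ~ ., data = train, family = binomial(link = "logit"))

# Without newdata, predict() returns fitted values for the training rows
p_default <- predict(logitMod, type = "response")
# With newdata, it returns one prediction per row of the test set
p_test <- predict(logitMod, newdata = test, type = "response")

length(p_default)  # same as nrow(train)
length(p_test)     # same as nrow(test)

# Both arguments now have the same length, so table() works
tab_test <- table(p_test >= 0.5, test$LoanStatus_B == 1)
```

Passing p_default together with test$LoanStatus_B to table() is what triggers "all arguments must have the same length".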
Pumpkin C almost 7 years

What is this line doing: data = as.numeric(pdata>0.5)?
Damiano Fantini almost 7 years

Your target variable is either 0 or 1, but the prediction returns a probability between 0 and 1. Therefore you need to convert it to a binary value (discretization). For example, you test whether a value is greater than 0.5; TRUE is then converted to 1 (and FALSE to 0) using as.numeric().
Pumpkin C almost 7 years

So it is the threshold, right? I can change it to any number between 0 and 1 that I want?
Pumpkin C almost 7 years

The last line in the result is "'Positive' Class : 0", but in my case I want the positive class to be 1, which is default. Can I do that?
Damiano Fantini almost 7 years

Yes, 0.5 is the threshold. You should use the value that best fits your data; 0.5 is a reasonable starting point.
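
One rough way to explore the threshold is to compute a metric over a grid of candidate values. The sketch below uses simulated data and plain accuracy as the criterion; both are assumptions, and a different metric may suit your problem better.

```r
set.seed(123)

# Hypothetical data with one predictor
n <- 300
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 2 * x))
d <- data.frame(y = y, x = x)

fit <- glm(y ~ x, data = d, family = binomial(link = "logit"))
prob <- predict(fit, newdata = d, type = "response")

# Accuracy at each candidate threshold
thresholds <- seq(0.1, 0.9, by = 0.1)
accuracy <- sapply(thresholds, function(t) mean(as.numeric(prob > t) == d$y))

results <- data.frame(threshold = thresholds, accuracy = accuracy)
best <- results$threshold[which.max(results$accuracy)]
print(results)
```

These accuracies are computed on the same data used to fit the model, so they are optimistic; in practice the threshold would be chosen on a validation set.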
Damiano Fantini almost 7 years

Sure, you can. The function has an argument for that; please check ?confusionMatrix. For example: confusionMatrix(data = factor(as.numeric(pdata > 0.5), levels = c(0, 1)), reference = factor(train$LoanStatus_B, levels = c(0, 1)), positive = "1")
Pumpkin C almost 7 years

Okay, but here the 1 is numeric rather than a string, right?
Damiano Fantini almost 7 years

In this case, "1" corresponds to your numeric 1s. However, the positive argument must be provided as a character string! If you only care about accuracy, it doesn't matter. But it is important for computing sensitivity/specificity, because you need to know which cases are true/false positives. For example, try: confusionMatrix(data = as.factor(c("A","B", "B", "B", "A", "A", "A", "A", "B", "B")), reference = as.factor(c("A","A", "A", "B", "A", "A", "A", "A", "B", "A")), positive = "A") and the same line with positive = "B". I hope this was useful. If so, please accept my answer. Thanks
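
The effect of the positive class on the A/B example can also be worked out by hand in base R. This is only a sketch: sens_spec is a hypothetical helper written for illustration, not a caret function.

```r
pred <- factor(c("A", "B", "B", "B", "A", "A", "A", "A", "B", "B"), levels = c("A", "B"))
ref  <- factor(c("A", "A", "A", "B", "A", "A", "A", "A", "B", "A"), levels = c("A", "B"))
tab <- table(Prediction = pred, Reference = ref)

# Hypothetical helper: sensitivity and specificity for a chosen positive class
# (rows of tab are predictions, columns are observed classes)
sens_spec <- function(tab, positive) {
  negative <- setdiff(rownames(tab), positive)
  tp <- tab[positive, positive]  # predicted positive, truly positive
  fn <- tab[negative, positive]  # predicted negative, truly positive
  tn <- tab[negative, negative]  # predicted negative, truly negative
  fp <- tab[positive, negative]  # predicted positive, truly negative
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

accuracy <- sum(diag(tab)) / sum(tab)
res_A <- sens_spec(tab, "A")
res_B <- sens_spec(tab, "B")
print(accuracy)
print(res_A)
print(res_B)
```

Accuracy is 0.7 regardless of the positive class, while sensitivity and specificity swap: 0.625 and 1 with positive = "A", and 1 and 0.625 with positive = "B".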