confusion matrix of bstTree predictions, Error: 'The data must contain some levels that overlap the reference.'
Solution 1
max(pred_bstTree) [1] 1.03385
min(pred_bstTree) [1] 1.011738
These values and the error tell it all. Plotting a ROC curve simply checks the effect of different threshold points. Predictions are rounded according to the threshold: with a threshold of 0.5, for example, 0.7 is converted to 1 (TRUE class) and 0.3 to 0 (FALSE class). Threshold values lie in the range (0,1).
In your case, regardless of the threshold, every observation falls into the TRUE class, because even the minimum prediction is greater than 1. (That's why @phiver was wondering whether you are doing regression instead of classification.) With no zero among the predictions, no level in the predictions coincides with the zero level in adverse_effects, hence this error.
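To check this kind of mismatch directly, the levels on both sides can be compared. The following is a minimal base-R sketch with made-up values; `pred` and `truth` are hypothetical stand-ins for pred_bstTree and testSplit$adverse_effects, not the asker's data.

```r
# Hypothetical stand-ins: continuous scores in the range reported in the
# question, and a 0/1 factor as the reference.
pred  <- c(1.011738, 1.020000, 1.033850)
truth <- factor(c(0, 1, 0), levels = c(0, 1))

# Treated as a factor, the raw scores have levels that never match "0" or "1".
raw_overlap <- intersect(levels(factor(pred)), levels(truth))
length(raw_overlap)

# Thresholding at 0.5 puts every observation in class 1, because min(pred) > 1.
pred_class <- factor(ifelse(pred > 0.5, 1, 0), levels = levels(truth))
tab <- table(predicted = pred_class, actual = truth)
tab
```

Giving the thresholded predictions the same levels as the reference keeps the table square even when one class is never predicted.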
PS: It is difficult to tell the root cause of the error without seeing your data.
Solution 2
I had a similar problem that produced this error. I used the function confusionMatrix:
confusionMatrix(actual, predicted, cutoff = 0.5)
And I got the following error: Error in confusionMatrix.default(actual, predicted, cutoff = 0.5) : The data must contain some levels that overlap the reference.
I checked a couple of things:
class(actual) -> numeric
class(predicted) -> integer
unique(actual) -> many values, since it is a probability
unique(predicted) -> 2 levels: 0 and 1
I concluded that the problem lay in the cutoff part of the function, so I applied the cutoff beforehand:
predicted <- ifelse(predicted > 0.5, 1, 0)
and ran the confusionMatrix function, which now works just fine:
cm <- confusionMatrix(actual, predicted)
cm$table
which generated the correct outcome.
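The same pre-thresholding step can be sketched in base R alone, assuming a vector of predicted probabilities and a vector of 0/1 labels. The names `scores` and `labels` are hypothetical, and `table()` stands in for confusionMatrix so the example does not depend on any package.

```r
# Hypothetical data: predicted probabilities and the true 0/1 labels.
scores <- c(0.10, 0.80, 0.35, 0.65, 0.90, 0.20)
labels <- c(0, 1, 0, 0, 1, 1)

# Apply the cutoff first, then give both vectors the same factor levels.
cutoff     <- 0.5
pred_class <- factor(ifelse(scores > cutoff, 1, 0), levels = c(0, 1))
actual     <- factor(labels, levels = c(0, 1))

# Cross-tabulate predictions against the truth.
cm <- table(predicted = pred_class, actual = actual)
cm

# Overall accuracy is the share of observations on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
accuracy
```

Once both inputs are factors with identical levels, they can be passed to a confusion-matrix function without the level-overlap error.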
One takeaway for your case, which might help interpretation once your code works: you may have swapped the inputs to the confusion matrix. In the confusionMatrix function I used, the actual values come first, so instead of:
conf_bstTree= confusionMatrix(pred_bstTree,testSplit$adverse_effects)
you would write:
conf_bstTree= confusionMatrix(testSplit$adverse_effects,pred_bstTree)
Note that caret's confusionMatrix(data, reference) expects the predicted classes first and the true classes second, so check the documentation of the package you are calling.
As said, this will mostly help you interpret the confusion matrix once you get it to work.
Hope it helps.
SaikiHanee
Updated on July 18, 2022
Comments
-
SaikiHanee almost 2 years
I am trying to train a model using bstTree method and print out the confusion matrix. adverse_effects is my class attribute.
set.seed(1234)
splitIndex <- createDataPartition(attended_num_new_bstTree$adverse_effects, p = .80, list = FALSE, times = 1)
trainSplit <- attended_num_new_bstTree[ splitIndex,]
testSplit <- attended_num_new_bstTree[-splitIndex,]
ctrl <- trainControl(method = "cv", number = 5)
model_bstTree <- train(adverse_effects ~ ., data = trainSplit, method = "bstTree", trControl = ctrl)
predictors <- names(trainSplit)[names(trainSplit) != 'adverse_effects']
pred_bstTree <- predict(model_bstTree$finalModel, testSplit[,predictors])
plot.roc(auc_bstTree)
conf_bstTree= confusionMatrix(pred_bstTree,testSplit$adverse_effects)
But I get the error 'Error in confusionMatrix.default(pred_bstTree, testSplit$adverse_effects) : The data must contain some levels that overlap the reference.'
> max(pred_bstTree)
[1] 1.03385
> min(pred_bstTree)
[1] 1.011738
> unique(trainSplit$adverse_effects)
[1] 0 1
Levels: 0 1
How can I fix this issue?
> head(trainSplit)
   type New_missed Therapytypename New_Diesease gender adverse_effects change_in_exposure other_reasons other_medication
5     2          1              14           13      2               0                  0             0                0
7     2          0              14           13      2               0                  0             0                0
8     2          0              14           13      2               0                  0             0                0
9     2          0              14           13      2               1                  0             0                0
11    2          1              14           13      2               0                  0             0                0
12    2          0              14           13      2               0                  0             0                0
   uvb_puva_type missed_prev_dose skintypeA skintypeB Age DoseB DoseA
5              5                1         1         1  22 3.000     0
7              5                0         1         1  22 4.320     0
8              5                0         1         1  22 4.752     0
9              5                0         1         1  22 5.000     0
11             5                1         1         1  22 5.000     0
12             5                0         1         1  22 5.000     0
-
phiver over 7 years
Looks like you are predicting regression, not classification. Check whether adverse_effects is set as a factor in your data.
-
SaikiHanee over 7 years
Yes @phiver, it is a factor containing 0 and 1. Even when I predict after converting to numeric, I get the same error.
-
phiver over 7 years
Try adding a sample of your data. It is difficult to see where the problem is.
-
SaikiHanee over 7 years
@abhiieor, the data set contains nearly 40000 records, but 88% of the data belongs to class 0 and the rest belongs to class 1.
-
abhiieor over 7 years
The data you have given is too little to replicate the problem. I hope that, when making adverse_effects a factor, you have done either
model_bstTree <- train(as.factor(adverse_effects) ~ ., data = trainSplit, method = "bstTree", trControl = ctrl)
or else
attended_num_new_bstTree$adverse_effects <- as.factor(attended_num_new_bstTree$adverse_effects)
If yes, then I would suggest trying another classification method, such as logistic regression, random forest, or GBM, to see whether you get the same behavior. Ideally you will not.
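The factor conversion suggested in this comment, together with a quick look at the class balance mentioned earlier in the thread, can be sketched in base R as follows. The data frame `df` and its values are hypothetical and only illustrate the steps.

```r
# Hypothetical outcome column coded as numeric 0/1.
df <- data.frame(adverse_effects = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0))

# Convert once, before splitting and training, so every subset shares the levels.
df$adverse_effects <- factor(df$adverse_effects, levels = c(0, 1))
is.factor(df$adverse_effects)

# Class shares; a strong imbalance is worth knowing about before modelling.
class_share <- prop.table(table(df$adverse_effects))
class_share
```

Converting the column in the data frame, instead of inside the formula, keeps the outcome name unchanged for later steps such as selecting predictor columns.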