How to solve "The data cannot have more levels than the reference" error when using confusionMatrix?

Solution 1

I had the same issue in a classification problem. It turned out that one specific group had zero observations, which is why I got the error "the data cannot have more levels than the reference".

Make sure that every group in your test set also appears in your training set.
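
One way to check this before calling confusionMatrix is to compare the factor levels directly. The sketch below uses base R only; train_y and test_y are made-up stand-ins for the outcome column of your own training and test sets, not part of the original question.

```r
# Hypothetical outcome columns; replace with your own train/test data.
train_y <- factor(c("a", "b", "a", "b"))
test_y  <- factor(c("a", "b", "c"))

# Levels that occur in the test set but not in the training set.
missing_in_train <- setdiff(levels(test_y), levels(train_y))
print(missing_in_train)

# Levels that are declared in the training factor but have no observations.
train_counts <- table(train_y)
empty_groups <- names(train_counts)[train_counts == 0]
print(empty_groups)

# Give both factors the same level set so they can be compared safely.
all_levels <- union(levels(train_y), levels(test_y))
train_y <- factor(train_y, levels = all_levels)
test_y  <- factor(test_y, levels = all_levels)
```

If missing_in_train is not empty, aligning the levels only makes the comparison possible; the model still cannot predict a group it never saw, so re-splitting the data (for example with a stratified split) is probably the better fix.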

Solution 2

If you look carefully at your plots, you will see that you are training a regression tree and not a classification tree.

If you run credit$Creditability <- as.factor(credit$Creditability) after reading in the data and use type = "class" in the predict function, your code should work.
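
To see why a regression tree can trigger this particular error, it helps to look at what happens when numeric predictions are treated as a factor. This is a minimal base R sketch with made-up numbers (not the output of the tree in the question), and it assumes that the numeric predictions end up being coerced to a factor before the level counts are compared:

```r
# Made-up numeric predictions, like the node means a regression tree returns.
pred_numeric <- c(0.81, 0.35, 0.62, 0.35, 0.81)

# Made-up 0/1 reference values.
reference <- c(1, 0, 1, 0, 1)

# Every distinct numeric prediction becomes its own factor level.
n_pred_levels <- nlevels(factor(pred_numeric))
n_ref_levels  <- nlevels(factor(reference))

print(n_pred_levels)
print(n_ref_levels)

# More levels in the predictions than in the reference is the
# situation the error message describes.
print(n_pred_levels > n_ref_levels)
```

With a factor response and type = "class", the predictions can only take the levels of the response, so this mismatch should not occur.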

code:

credit <- read.csv("http://freakonometrics.free.fr/german_credit.csv" )

credit$Creditability <- as.factor(credit$Creditability)

library(caret)
library(tree)
library(e1071)

set.seed(1000)
intrain <- createDataPartition(y = credit$Creditability, p = 0.7, list = FALSE)
train <- credit[intrain, ]
test <- credit[-intrain, ]

treemod <- tree(Creditability ~ ., data = train)

cv.trees <- cv.tree(treemod, FUN = prune.tree)
plot(cv.trees)

prune.trees <- prune.tree(treemod, best = 3)
plot(prune.trees)
text(prune.trees, pretty = 0)

treepred <- predict(prune.trees, newdata = test, type = "class")
confusionMatrix(treepred, test$Creditability)
Author by

Admin

Updated on June 09, 2022

Comments

  • Admin
    Admin almost 2 years

    I'm using R. I split the data into training and test sets in order to estimate prediction accuracy.

    This is my code:

    library("tree")
    credit<-read.csv("C:/Users/Administrator/Desktop/german_credit (2).csv")
    
    library("caret")
    set.seed(1000)
    
    intrain<-createDataPartition(y=credit$Creditability,p=0.7,list=FALSE)
    train<-credit[intrain, ]
    test<-credit[-intrain, ]
    
    treemod<-tree(Creditability~. , data=train)
    plot(treemod)
    text(treemod)
    
    cv.trees<-cv.tree(treemod,FUN=prune.tree)
    plot(cv.trees)
    
    prune.trees<-prune.tree(treemod,best=3)
    plot(prune.trees)
    text(prune.trees,pretty=0)
    
    install.packages("e1071")
    library("e1071")
    treepred<-predict(prune.trees, newdata=test)
    
    confusionMatrix(treepred, test$Creditability)
    

    The call to confusionMatrix fails with the following error message:

    Error in confusionMatrix.default(rpartpred, test$Creditability) : the data cannot have more levels than the reference

    The credit data can be downloaded from this site:
    http://freakonometrics.free.fr/german_credit.csv

  • StatMan
    StatMan over 7 years
    More or less, the code predicts, for each entry in test, the probability of belonging to class '0' or '1', so the OP has to convert these predicted probabilities into predicted classes.
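
The conversion described in the last comment could look roughly like the base R sketch below. The probabilities, the actual values and the 0.5 cut-off are all assumptions made for illustration; they are not taken from the German credit data, and a different threshold may suit your problem better.

```r
# Made-up predicted probabilities of class 1 and made-up true classes.
pred_prob <- c(0.90, 0.20, 0.60, 0.40)
actual    <- c(1, 0, 0, 1)

# Assumed cut-off: probabilities above 0.5 are labelled as class 1.
threshold <- 0.5

# Build both factors with the same explicit levels so they are comparable.
pred_class   <- factor(ifelse(pred_prob > threshold, 1, 0), levels = c(0, 1))
actual_class <- factor(actual, levels = c(0, 1))

# Cross-tabulate predictions against the true classes.
conf_tab <- table(Predicted = pred_class, Actual = actual_class)
print(conf_tab)

# Share of correctly classified observations.
accuracy <- sum(diag(conf_tab)) / sum(conf_tab)
print(accuracy)
```

Two factors built this way share the same levels, so they can also be passed to confusionMatrix in place of the raw numeric predictions.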