R - predict command error "undefined columns selected"

predict.boosting() expects to be given the actual labels for the test data, so it can calculate how well it did (as in the confusion matrix shown below).

library(adabag) 

data(iris)

iris.adaboost <- boosting(Species~Sepal.Length+Sepal.Width+Petal.Length+
      Petal.Width, data=iris, boos=TRUE, mfinal=10)

# make a 'test' dataframe without the classes, as in the question
iris2 <- iris
iris2$Species <- NULL

# replicates the error
irispred=predict.boosting(iris.adaboost, newdata=iris2)
#Error in `[.data.frame`(newdata, , as.character(object$formula[[2]])) : 
#  undefined columns selected
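
The error message itself points at the cause: the response column named on the left-hand side of the formula is looked up in newdata. A minimal base-R sketch of that lookup (it does not call adabag at all; the names f, response, msg and missing_vars are made up for illustration) reproduces the failure and shows a quick way to check which formula variables are absent from a test set:

```r
# Base R only: mimic the column lookup named in the error message.
f <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
response <- as.character(f[[2]])   # left-hand side of the formula

iris2 <- iris
iris2$Species <- NULL              # test data without the class labels

# Selecting a column that does not exist raises an error; capture its message
msg <- tryCatch(iris2[, response], error = function(e) conditionMessage(e))

# Quick diagnostic: which variables used in the formula are missing from the test data?
missing_vars <- setdiff(all.vars(f), names(iris2))

print(response)
print(msg)
print(missing_vars)
```

Running the same setdiff() check against the formula and test data from the question would list Y as the missing variable.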

Here's a working example, drawn largely from the help file, to show a call that succeeds and to demonstrate the confusion matrix.

# first create subsets of iris data for training and testing  
sub <- c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))
iris3 <- iris[sub,]
iris4 <- iris[-sub,]

iris.adaboost <- boosting(Species ~ ., data=iris3, mfinal=10)

# works
iris.predboosting<- predict.boosting(iris.adaboost, newdata=iris4)

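iris.predboosting$confusion
# Note: the counts shown below sum to 150; with the 75-row iris4 split above,
# each column sums to 25 and the exact values depend on the random sample.
#               Observed Class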
#Predicted Class setosa versicolor virginica
#     setosa         50          0         0
#     versicolor      0         50         0
#     virginica       0          0        50
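
If the test data genuinely has no labels, one possible workaround is to add a placeholder response column so the lookup succeeds. The sketch below assumes the adabag package is installed and that only the predicted classes are wanted; the seed, the object names (train, test_unlabeled, fit, pred) and the choice of the first factor level as the dummy label are arbitrary assumptions, not something taken from the help file.

```r
library(adabag)

set.seed(1)  # arbitrary seed, only so the split is repeatable
sub <- c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))
train <- iris[sub, ]
test_unlabeled <- iris[-sub, names(iris) != "Species"]  # predictors only

fit <- boosting(Species ~ ., data = train, mfinal = 10)

# Placeholder labels: same factor levels as the training response,
# values are arbitrary and carry no information
test_unlabeled$Species <- factor(
  rep(levels(train$Species)[1], nrow(test_unlabeled)),
  levels = levels(train$Species)
)

pred <- predict(fit, newdata = test_unlabeled)  # dispatches to predict.boosting
head(pred$class)  # predicted labels
```

With dummy labels, pred$confusion and pred$error compare the predictions against the placeholder values, so they are meaningless and should be ignored.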
Author by

Admin

Updated on June 11, 2022

Comments

  • Admin
    Admin almost 2 years

    I’m new to R, and I’m having trouble with the predict command. I receive this error:

     Error in `[.data.frame`(newdata, , as.character(object$formula[[2]])) : 
      undefined columns selected
    

    when I execute this command:

    model.predict <- predict.boosting(model,newdata=test)
    

    Here is my model:

    model <- boosting(Y~x1+x2+x3+x4+x5+x6+x7, data=train)
    

    And here is the structure of my test data: str(test)

    'data.frame':   343 obs. of  7 variables:
     $ x1: Factor w/ 4 levels "Americas","Asia_Pac",..: 4 2 4 2 4 3 3 3 4 1 ...
     $ x2: Factor w/ 5 levels "Fifth","First",..: 3 3 2 2 4 2 4 4 1 1 ...
     $ x3: Factor w/ 3 levels "Best","Better",..: 2 3 1 1 3 2 2 1 3 3 ...
     $ x4: Factor w/ 2 levels "Female","Male": 1 1 2 1 1 2 1 2 2 2 ...
     $ x5: int  82 55 47 31 6 53 77 68 76 86 ...
     $ x6: num  22.8 14.6 25.5 38.3 7.9 32.8 4.6 34.2 36.7 21.7 ...
     $ x7: num  0.679 0.925 0.897 0.684 0.195 ...
    

    And the structure of my training data:

    $ RecordID: int  1 2 3 4 5 6 7 8 9 10 ...
     $ x1      : Factor w/ 4 levels "Americas","Asia_Pac",..: 1 2 2 3 1 1 1 2 2 4 ...
     $ x2      : Factor w/ 5 levels "Fifth","First",..: 5 5 3 2 5 5 5 4 3 2 ...
     $ x3      : Factor w/ 3 levels "Best","Better",..: 2 3 2 2 3 1 2 3 1 1 ...
     $ x4      : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 2 2 1 1 ...
     $ x5      : int  1 67 75 51 84 33 21 80 48 5 ...
     $ x6      : num  21 13.8 30.3 11.9 1.7 13.2 33.9 17 3.4 19.5 ...
     $ x7      : num  0.35 0.85 0.73 0.39 0.47 0.13 0.2 0.12 0.64 0.11 ...
     $ Y       : Factor w/ 2 levels "Green","Yellow": 2 2 1 2 2 2 1 2 2 2 ..
    

    I think there’s a problem with the structure of the test data, but I can’t find it, or else I misunderstand how the “predict” command expects its input to be structured. Note that if I run the predict command on the training data, it works. Any suggestions as to where to look?

    Thanks!