Variable importance using the caret package (error); RandomForest algorithm


Solution 1

The importance scores can take a while to compute, and train() does not ask randomForest() to create them by default. Add importance = TRUE to the train() call (train() passes extra arguments through to randomForest()), and varImp() should then work.
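Here is a minimal sketch of Solution 1. It assumes the caret and randomForest packages are installed. The built-in mtcars data and the formula mpg ~ . stand in for the asker's trainData and increaseInAssessedLevel. The sketch also switches to method = "repeatedcv", because the repeats argument of trainControl() only applies to repeated cross-validation and is ignored with method = "cv".

```r
# Sketch: mtcars / mpg stand in for the asker's trainData /
# increaseInAssessedLevel (assumption, not the original data).
library(caret)         # train(), trainControl(), varImp()
library(randomForest)  # model engine used by method = "rf"

set.seed(42)

# 'repeats' only applies to method = "repeatedcv"; with "cv" it is ignored.
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 2,
                     returnResamp = "none")

# importance = TRUE is not a train() argument: train() forwards it through
# '...' to randomForest(), which then also computes permutation importance.
fit_rf <- train(mpg ~ ., data = mtcars, method = "rf",
                trControl = ctrl, importance = TRUE)

# The final forest now has the "%IncMSE" column that varImp() looks for.
print(colnames(fit_rf$finalModel$importance))

vi <- varImp(fit_rf)   # importance scaled to 0-100 by default
print(vi)
```

Passing importance = TRUE costs extra fitting time, which is why it is off by default.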


Solution 2

That is because the object returned by train() is not a plain random forest model. It is a list of several components, including the final model itself and the cross-validation results. You can list them with ls(model2) or names(model2). To work with the final model directly, call varImp(model2$finalModel).
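The sketch below shows how to look inside the train object. It makes the same assumptions as before: caret and randomForest are installed, and mtcars stands in for the asker's data. Even without importance = TRUE, randomForest() records node-purity importance (IncNodePurity for regression), and randomForest::importance() returns it from the final model. The permutation measure %IncMSE is only present when importance = TRUE was used.

```r
library(caret)
library(randomForest)

set.seed(1)
# No importance = TRUE here, mirroring the asker's call.
fit_rf <- train(mpg ~ ., data = mtcars, method = "rf",
                trControl = trainControl(method = "cv", number = 5))

print(names(fit_rf))             # components such as "results" and "finalModel"
print(class(fit_rf$finalModel))  # the underlying randomForest object

# Node-purity importance is always recorded by randomForest().
imp <- randomForest::importance(fit_rf$finalModel)
print(imp)                       # a single column: IncNodePurity
```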

    I am trying to obtain the variable importance of a random forest (rf) model in any way I can. This is the approach I have tried so far, but alternative suggestions are very welcome.

    I have trained a model in R:

    myControl = trainControl(method='cv',number=5,repeats=2,returnResamp='none')
    model2 = train(increaseInAssessedLevel~., data=trainData, method = 'rf', trControl=myControl)

    The dataset is fairly large, but the model runs fine. I can access its parts and run commands such as:

    > model2[3]
      mtry      RMSE  Rsquared      RMSESD RsquaredSD
    1    2 0.1901304 0.3342449 0.004586902 0.05089500
    2   61 0.1080164 0.6984240 0.006195397 0.04428158
    3  120 0.1084201 0.6954841 0.007119253 0.04362755

    But I get the following error:

    > varImp(model2)
    Error in varImp[, "%IncMSE"] : subscript out of bounds

    Apparently there is supposed to be a wrapper, but that does not seem to be the case:

    Error: could not find function "varImp.randomForest"

    But this is particularly odd:

    > traceback()
    No traceback available 
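The error message in the question points to the cause. The following sketch assumes only the randomForest package and the built-in mtcars data. A regression forest grown with the default importance = FALSE stores only the IncNodePurity column. Any code that indexes the "%IncMSE" column by name therefore fails with the same "subscript out of bounds" error shown in the question.

```r
library(randomForest)

set.seed(7)
rf_default <- randomForest(mpg ~ ., data = mtcars)                     # importance = FALSE
rf_perm    <- randomForest(mpg ~ ., data = mtcars, importance = TRUE)  # adds %IncMSE

print(colnames(rf_default$importance))  # "IncNodePurity"
print(colnames(rf_perm$importance))     # "%IncMSE" "IncNodePurity"

# Indexing the missing column by name reproduces the error from the question.
msg <- tryCatch(rf_default$importance[, "%IncMSE"],
                error = function(e) conditionMessage(e))
print(msg)                              # "subscript out of bounds"
```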
