Variable importance using the caret package (error); RandomForest algorithm


Solution 1

The importance scores can take a while to compute and train won't automatically get randomForest to create them. Add importance = TRUE to the train call and it should work.

Max
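The fix described in Solution 1 can be sketched as follows. This is a minimal illustration, not the asker's actual setup: the built-in mtcars data set and the names ctrl, fit and imp are assumptions standing in for trainData and model2. Extra arguments to train() are passed through to randomForest(), so importance = TRUE asks randomForest to compute the permutation-based scores that varImp() looks for.

```r
library(caret)
library(randomForest)

set.seed(42)

# mtcars is a stand-in for the asker's trainData (assumption for illustration)
ctrl <- trainControl(method = "cv", number = 5)

fit <- train(mpg ~ ., data = mtcars,
             method = "rf",
             trControl = ctrl,
             importance = TRUE)  # forwarded to randomForest()

# With importance = TRUE the final model stores the %IncMSE column,
# so varImp() no longer fails with "subscript out of bounds"
imp <- varImp(fit)
print(imp)
```

Computing the permutation importance adds to the training time, which is presumably why it is not switched on by default.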

Solution 2

That is because the object returned by train() is not a pure random forest model but a list of different objects (containing the final model itself as well as the cross-validation results etc.). You can see them with ls(model2). So to use the final model, just call varImp(model2$finalModel).
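The structure described in Solution 2 can be inspected like this. It is a hedged sketch using mtcars and the name fit as stand-ins for the asker's data and model2. Note that the example still passes importance = TRUE: according to a comment on this page, extracting the final model alone was not enough without it.

```r
library(caret)
library(randomForest)

set.seed(42)

# mtcars stands in for the asker's trainData (assumption for illustration)
fit <- train(mpg ~ ., data = mtcars,
             method = "rf",
             trControl = trainControl(method = "cv", number = 5),
             importance = TRUE)

# The train object is a list; finalModel is one of its components
print(names(fit))
print(class(fit$finalModel))

# randomForest's own accessor, applied to the underlying model
rf_imp <- importance(fit$finalModel)
print(rf_imp)
```

Calling importance() on fit$finalModel bypasses caret's scaling, so the numbers will differ from what varImp(fit) reports even though they come from the same model.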

Author by Jakub Langr

Updated on June 10, 2022

Comments

  • Jakub Langr about 2 years

    I am trying to obtain the variable importance of a rf model in any way. This is the approach I have tried so far, but alternate suggestions are very welcome.

    I have trained a model in R:

    require(caret)
    require(randomForest)
    myControl = trainControl(method='cv',number=5,repeats=2,returnResamp='none')
    model2 = train(increaseInAssessedLevel~., data=trainData, method = 'rf', trControl=myControl)
    

    The dataset is fairly large, but the model runs fine. I can access its parts and run commands such as:

    > model2[3]
    $results
      mtry      RMSE  Rsquared      RMSESD RsquaredSD
    1    2 0.1901304 0.3342449 0.004586902 0.05089500
    2   61 0.1080164 0.6984240 0.006195397 0.04428158
    3  120 0.1084201 0.6954841 0.007119253 0.04362755
    

    But I get the following error:

    > varImp(model2)
    Error in varImp[, "%IncMSE"] : subscript out of bounds
    

    Apparently there is supposed to be a wrapper, but that does not seem to be the case: (cf:http://www.inside-r.org/packages/cran/caret/docs/varImp)

    varImp.randomForest(model2)
    Error: could not find function "varImp.randomForest"
    

    But this is particularly odd:

    > traceback()
    No traceback available 
    
    > sessionInfo()
    R version 3.0.1 (2013-05-16)
    Platform: x86_64-redhat-linux-gnu (64-bit)
    
    locale:
     [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
     [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
     [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
     [7] LC_PAPER=C                 LC_NAME=C                 
     [9] LC_ADDRESS=C               LC_TELEPHONE=C            
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
    
    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods  
    [8] base     
    
    other attached packages:
     [1] elasticnet_1.1     lars_1.2           klaR_0.6-9         MASS_7.3-26       
     [5] kernlab_0.9-18     nnet_7.3-6         randomForest_4.6-7 doMC_1.3.0        
     [9] iterators_1.0.6    caret_5.17-7       reshape2_1.2.2     plyr_1.8          
    [13] lattice_0.20-15    foreach_1.4.1      cluster_1.14.4    
    
    loaded via a namespace (and not attached):
    [1] codetools_0.2-8 compiler_3.0.1  grid_3.0.1      stringr_0.6.2  
    [5] tools_3.0.1  
    
  • Paul Lo almost 10 years
    This doesn't work for me; I made it work by adding importance = TRUE.