How to compute ROC and AUC under ROC after training using caret in R?


Solution 1

An example of computing the AUC for a random forest with ROCR:

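library(randomForest)
library(ROCR)

rf_output=randomForest(x=predictor_data, y=target, importance = TRUE, ntree = 10001, proximity=TRUE, sampsize=sampsizes)

# Use the out-of-bag vote fraction for the second class level as the score
predictions=as.vector(rf_output$votes[,2])
pred=prediction(predictions,target)

perf_AUC=performance(pred,"auc") #Calculate the AUC value
AUC=perf_AUC@y.values[[1]]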

perf_ROC=performance(pred,"tpr","fpr") #plot the actual ROC curve
plot(perf_ROC, main="ROC plot")
text(0.5,0.5,paste("AUC = ",format(AUC, digits=5, scientific=FALSE)))
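
To make it clearer what the "auc" measure reports, here is a minimal base-R sketch of the equivalent rank-based (Mann-Whitney) calculation. The helper name `auc_by_rank` and the toy scores are made up for illustration and are not part of ROCR.

```r
# Rank-based AUC: the probability that a randomly chosen positive case
# receives a higher score than a randomly chosen negative case
# (tied scores count as one half).
auc_by_rank <- function(scores, labels, positive) {
  is_pos <- labels == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(scores)  # average ranks, so ties are shared equally
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy scores, invented purely for illustration
toy_scores <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
toy_labels <- c("yes", "yes", "yes", "no", "no", "no")
toy_auc <- auc_by_rank(toy_scores, toy_labels, positive = "yes")
print(toy_auc)
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so the value is 8/9.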

Or, using pROC and caret:

library(caret)
library(pROC)

# Keep only two classes so that the problem is binary
iris <- iris[iris$Species == "virginica" | iris$Species == "versicolor", ]
iris$Species <- factor(iris$Species)  # drop the unused setosa level from the factor

samples <- sample(NROW(iris), NROW(iris) * .5)
data.train <- iris[samples, ]
data.test <- iris[-samples, ]
forest.model <- train(Species ~., data.train)

result.predicted.prob <- predict(forest.model, data.test, type="prob") # Prediction

result.roc <- roc(data.test$Species, result.predicted.prob$versicolor) # Build the ROC curve
plot(result.roc, print.thres="best", print.thres.best.method="closest.topleft")

result.coords <- coords(result.roc, "best", best.method="closest.topleft", ret=c("threshold", "accuracy"))
print(result.coords) # threshold and accuracy at the chosen cut-off
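
For intuition about the "closest.topleft" criterion, the following base-R sketch picks the cut-off whose ROC point lies nearest the top-left corner, i.e. the one minimising (1 - sensitivity)^2 + (1 - specificity)^2. It is only a sketch under stated assumptions: the helper `closest_topleft` and the toy data are hypothetical, and it uses the observed scores themselves as candidate cut-offs, whereas pROC chooses its candidate thresholds differently, so the reported threshold value need not match pROC's exactly.

```r
# Choose the cut-off whose (1 - specificity, sensitivity) point is nearest (0, 1).
closest_topleft <- function(scores, labels, positive) {
  is_pos <- labels == positive
  cutoffs <- sort(unique(scores))
  # A case is called positive when its score is >= the cut-off
  sens <- sapply(cutoffs, function(t) mean(scores[is_pos] >= t))
  spec <- sapply(cutoffs, function(t) mean(scores[!is_pos] < t))
  dist2 <- (1 - sens)^2 + (1 - spec)^2
  best <- which.min(dist2)  # first minimum if several cut-offs tie
  c(threshold = cutoffs[best], sensitivity = sens[best], specificity = spec[best])
}

# Toy data, invented purely for illustration
toy_scores <- c(0.9, 0.8, 0.55, 0.4, 0.6, 0.3, 0.2, 0.1)
toy_labels <- c(rep("pos", 4), rep("neg", 4))
toy_best <- closest_topleft(toy_scores, toy_labels, positive = "pos")
print(toy_best)
```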

Solution 2

Update 2019: this is what MLeval was written for. It works with the caret train output object to produce ROC curves, precision-recall curves, and calibration curves, and to calculate metrics such as ROC-AUC, sensitivity, and specificity. All of this takes a single line of code, which I find helpful for my own analyses and which may be of interest here.


library(caret)
library(MLeval)
library(mlbench)  # provides the Sonar data set
data(Sonar)

myTrainingControl <- trainControl(method = "cv", 
                                  number = 10, 
                                  savePredictions = TRUE, 
                                  classProbs = TRUE, 
                                  verboseIter = TRUE)

randomForestFit = train(x = Sonar[,1:60], 
                        y = as.factor(Sonar$Class), 
                        method = "rf", 
                        trControl = myTrainingControl, 
                        preProcess = c("center","scale"), 
                        ntree = 50)


x <- evalm(randomForestFit)

## get roc curve plotted in ggplot2
x$roc

## get AUC and other metrics
x$stdres

    I have used the caret package's train function with 10-fold cross-validation. I also obtained class probabilities for the predicted classes by setting classProbs = TRUE in trControl, as follows:

    myTrainingControl <- trainControl(method = "cv", 
                                      number = 10, 
                                      savePredictions = TRUE, 
                                      classProbs = TRUE, 
                                      verboseIter = TRUE)
    randomForestFit = train(x = input[3:154], 
                            y = as.factor(input$Target), 
                            method = "rf", 
                            trControl = myTrainingControl, 
                            preProcess = c("center","scale"), 
                            ntree = 50)

    The output predictions I am getting are as follows.

      pred obs    0    1 rowIndex mtry Resample
    1    0   1 0.52 0.48       28   12   Fold01
    2    0   0 0.58 0.42       43   12   Fold01
    3    0   1 0.58 0.42       51   12   Fold01
    4    0   0 0.68 0.32       55   12   Fold01
    5    0   0 0.62 0.38       59   12   Fold01
    6    0   1 0.92 0.08       71   12   Fold01

    Now I want to compute the ROC curve and the area under it (AUC) from this data. How can I achieve this?
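
One possible way to do this from predictions in the layout shown in the question, sketched with pROC: pass the observed class as the response and the probability of class 1 as the predictor. The data frame below simply re-enters the six rows printed in the question. The column name `p1` (standing in for the column printed as "1") and the object name `cv_pred` are assumptions made for this example.

```r
library(pROC)

# The six held-out rows printed in the question. `p1` is a stand-in name
# (an assumption) for the class-1 probability column shown as "1".
cv_pred <- data.frame(
  obs      = factor(c(1, 0, 1, 0, 0, 1), levels = c(0, 1)),
  p1       = c(0.48, 0.42, 0.42, 0.32, 0.38, 0.08),
  Resample = "Fold01"
)

# With a real caret fit one would presumably start from the saved held-out
# predictions instead, keeping only the rows for the selected mtry, e.g.
#   cv_pred <- subset(randomForestFit$pred, mtry == randomForestFit$bestTune$mtry)

# levels = c(control, case); direction "<" means cases are expected to score higher
cv_roc <- roc(response = cv_pred$obs, predictor = cv_pred$p1,
              levels = c("0", "1"), direction = "<")
cv_auc <- auc(cv_roc)
print(cv_auc)
plot(cv_roc)
```

Pooling all folds into one curve, as here, is a simplification. Computing one AUC per level of `Resample` and averaging them is a common alternative.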