How to get a classifier's confidence score for a prediction in sklearn?

40,020

Solution 1

Per the SVC documentation, it looks like you need to change how you construct the SVC:

model = SVC(probability=True)

and then use the predict_proba method:

class_probabilities = model.predict_proba(sub_main)
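A minimal, self-contained sketch of the two steps above. The iris dataset, the split, and the variable names `X_train`/`X_test` are illustrative assumptions standing in for the asker's own `main`, `target` and `sub_main`. Note that `probability=True` makes `fit()` noticeably slower, because scikit-learn calibrates the probabilities with an internal cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative data: iris stands in for the asker's own features and labels.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# probability=True has to be set before fit(); it enables predict_proba().
model = SVC(probability=True, random_state=0)
model.fit(X_train, y_train)

# One row per sample, one column per class, in the order of model.classes_.
class_probabilities = model.predict_proba(X_test)

# Per-class confidence for the first test sample, printed as percentages.
for label, p in zip(model.classes_, class_probabilities[0]):
    print(f"Class {label}: {p:.0%}")
```

Each row of `class_probabilities` sums to 1, so a row can be read directly as the kind of "Class 1: 81%, Class 2: 10%, ..." breakdown asked for in the question.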

Solution 2

For estimators that implement the predict_proba() method, as Justin Peel suggested, you can simply call predict_proba() to get the probability of each class for your prediction.
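As a rough illustration of the first case, the sketch below checks whether an estimator offers predict_proba() and, if so, returns the probability of each class for one sample. `confidence_report` is a hypothetical helper written for this example, not part of scikit-learn, and GaussianNB on the iris data is only an assumed stand-in for the asker's model and data.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Illustrative data and model; GaussianNB implements predict_proba().
X, y = load_iris(return_X_y=True)
model = GaussianNB().fit(X, y)


def confidence_report(estimator, sample):
    """Return {class_label: probability} for one sample (hypothetical helper)."""
    if not hasattr(estimator, "predict_proba"):
        raise TypeError("estimator does not implement predict_proba()")
    # predict_proba() expects a 2-D array, so wrap the single sample in a list.
    probs = estimator.predict_proba([sample])[0]
    return dict(zip(estimator.classes_, probs))


report = confidence_report(model, X[0])

# Print the classes from most to least likely.
for label, p in sorted(report.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Class {label}: {p:.1%}")
```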

For estimators that do not implement predict_proba(), you can construct a confidence interval yourself using the bootstrap (repeatedly compute your point estimates on many sub-samples).
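One possible reading of the bootstrap idea, sketched under stated assumptions: refit the classifier on many resampled copies of the training set and record, for each test sample, the share of refits that voted for each class. LinearSVC (which has no predict_proba()), the iris data, and the names `n_rounds` and `vote_share` are choices made for this example only. The resulting vote share is a rough stability measure, not a calibrated probability.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Illustrative data: iris stands in for the asker's own features and labels.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)
classes = np.unique(y_train)

n_rounds = 100  # number of bootstrap refits (arbitrary choice)
rng = np.random.default_rng(0)
votes = np.zeros((len(X_test), len(classes)))

for _ in range(n_rounds):
    # Bootstrap sample: draw training rows with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    clf = LinearSVC(max_iter=10000).fit(X_train[idx], y_train[idx])
    pred = clf.predict(X_test)
    # Add one vote per test sample in the column of its predicted class.
    votes[np.arange(len(X_test)), np.searchsorted(classes, pred)] += 1

# Fraction of bootstrap models that voted for each class, per test sample.
vote_share = votes / n_rounds

for label, share in zip(classes, vote_share[0]):
    print(f"Class {label}: {share:.0%} of bootstrap models")
```

A sample whose votes are split across classes is one the classifier is unsure about; a sample with all votes in one column is predicted consistently regardless of the resampling.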

Let me know if you need any detailed examples to demonstrate either of these two cases.

Author: user3377126

Updated on April 23, 2020

Comments

  • user3377126
    user3377126 about 4 years

    I would like to get a confidence score for each prediction the classifier makes, showing how sure it is that the prediction is correct.

    I want something like this:

    How sure is the classifier of its prediction?

    Class 1: 81% that this is class 1
    Class 2: 10%
    Class 3: 6%
    Class 4: 3%

    Samples of my code:

    from time import time
    
    from sklearn.metrics import accuracy_score, confusion_matrix
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    
    # main, target and sub_main are defined elsewhere (not shown)
    features_train, features_test, labels_train, labels_test = train_test_split(main, target, test_size=0.4)
    
    # Determine amount of time to train
    t0 = time()
    model = SVC()
    #model = SVC(kernel='poly')
    #model = GaussianNB()
    
    model.fit(features_train, labels_train)
    
    print('training time: ', round(time() - t0, 3), 's')
    
    # Determine amount of time to predict on the held-out test set
    t1 = time()
    pred = model.predict(features_test)
    
    print('predicting time: ', round(time() - t1, 3), 's')
    
    accuracy = accuracy_score(labels_test, pred)
    
    print('Confusion Matrix: ')
    print(confusion_matrix(labels_test, pred))
    
    # Accuracy in the 0.9333, 0.9667, 1.0 range
    print(accuracy)
    
    # Determine amount of time to predict on the new samples
    t1 = time()
    pred = model.predict(sub_main)
    
    print('predicting time: ', round(time() - t1, 3), 's')
    
    print('')
    print('Prediction: ')
    print(pred)
    

    I suspect that I should use the score() function, but I keep implementing it incorrectly. I don't know whether that is the right function, so how would one get the confidence percentage of a classifier's prediction?