How to compute precision, recall and F1 score of an imbalanced dataset with K-fold cross validation?


When you use the cross_validate method, you can specify which scores to compute on each fold:

from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

scoring = {'accuracy': make_scorer(accuracy_score),
           'precision': make_scorer(precision_score),
           'recall': make_scorer(recall_score),
           'f1_score': make_scorer(f1_score)}

kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=50)

# cross_validate (not cross_val_score) accepts a dict of scorers
results = model_selection.cross_validate(estimator=model,
                                         X=features,
                                         y=labels,
                                         cv=kfold,
                                         scoring=scoring)

After cross validation, results will be a dictionary whose keys 'test_accuracy', 'test_precision', 'test_recall' and 'test_f1_score' each hold an array with that metric's value on every fold. For each metric you can compute the mean and standard deviation with np.mean(results[key]) and np.std(results[key]), where key is one of those metric names.
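For instance, a minimal sketch of averaging the per-fold scores, assuming results is the dictionary returned by the cross_validate call above:

import numpy as np

# cross_validate prefixes each metric name from the scoring dict with 'test_'
for name in ['accuracy', 'precision', 'recall', 'f1_score']:
    scores = results['test_' + name]
    print('%s: %.3f (+/- %.3f)' % (name, np.mean(scores), np.std(scores)))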

Author: Jayashree
Updated on November 15, 2021

Comments

  • Jayashree
    Jayashree over 2 years

I have an imbalanced dataset with a binary classification problem. I have built a Random Forest classifier and used k-fold cross validation with 10 folds.

    kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=42)
    model = RandomForestClassifier(n_estimators=50)
    

    I got the results for the 10 folds:

    results = model_selection.cross_val_score(model, features, labels, cv=kfold)
    print(results)
    [ 0.60666667  0.60333333  0.52333333  0.73        0.75333333  0.72        0.7
      0.73        0.83666667  0.88666667]
    

    I have calculated the overall accuracy by taking the mean and standard deviation of the results:

    print("Accuracy: %.3f%% (%.3f%%)") % (results.mean()*100.0, results.std()*100.0)
    Accuracy: 70.900% (10.345%)
    

    I have computed my predictions as follows:

    predictions = cross_val_predict(model, features, labels, cv=10)
    

    Since this is an imbalanced dataset, I would like to calculate the precision, recall and F1 score of each fold and average the results. How do I calculate these values in Python?

  • Jayashree
    Jayashree over 6 years
    How do I calculate the training and testing error for each fold?
  • Eduard Ilyasov
    Eduard Ilyasov over 6 years
    cross_val_score calculates metric values on the validation data only. But you can build two custom CV iterators. The first iterator yields the train objects' positional indices and, in place of the validation indices, yields those same train indices of your features DataFrame. The second iterator yields the same train indices as the first, but in place of the validation indices it yields the remaining objects' positional indices of your features DataFrame.
  • Eduard Ilyasov
    Eduard Ilyasov over 6 years
    After running cross_val_score with the first custom cv you'll get metric values on the train set, and after running it with the second custom cv you'll get metric values on the validation set.
  • ankurrc
    ankurrc over 6 years
    For version 0.19, it should be model_selection.cross_validate and not model_selection.cross_val_score (see the sketch after these comments).
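A minimal sketch along the lines of the last two comments, using cross_validate with return_train_score=True so you get both train and test scores per fold for each metric; features and labels are assumed to be the data arrays from the question:

import numpy as np
from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, precision_score, recall_score, f1_score

scoring = {'precision': make_scorer(precision_score),
           'recall': make_scorer(recall_score),
           'f1_score': make_scorer(f1_score)}

kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=50)

# return_train_score=True adds 'train_<metric>' arrays next to 'test_<metric>'
results = model_selection.cross_validate(model, features, labels, cv=kfold,
                                         scoring=scoring, return_train_score=True)

for name in scoring:
    print('%s: train %.3f, test %.3f' % (name,
                                         np.mean(results['train_' + name]),
                                         np.mean(results['test_' + name])))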