How to run SVC classifier after running 10-fold cross validation in sklearn?

Solution 1

You're almost there:

from sklearn import svm

# Build your classifier
classifier = svm.SVC()

# Train it on the entire training data set
classifier.fit(X_train, y_train)

# Get predictions on the test set
y_pred = classifier.predict(X_test)

At this point, you can use any metric from the sklearn.metrics module to determine how well you did. For example:

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))
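
To tie this back to the 10-fold cross-validation in the question, here is a minimal end-to-end sketch. The synthetic data from make_classification and the fixed random_state values are assumptions for illustration only; substitute your own features and target. Cross-validation on the training split gives an estimate of performance, but cross_val_score fits clones of the estimator and leaves classifier itself unfitted, so the explicit fit call is still needed before predicting on the test set.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary data stands in for the real features/target (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 30% of the data as a final test set, as in the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

classifier = svm.SVC()

# 10-fold CV on the training portion only: a performance estimate, not a trained model.
cv_scores = cross_val_score(
    classifier, X_train, y_train, cv=10, scoring='accuracy')
print("CV accuracy: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))

# Fit on the entire training set, then evaluate once on the held-out test set.
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy: %.3f" % test_accuracy)
```

If the cross-validated accuracy and the test accuracy are close, that suggests the CV estimate was a reasonable guide to performance on unseen data.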

Solution 2

You only need to separate your data into features X and target y. Do not split it into train and test sets.

Then you can pass your classifier (in your case, an SVM) to the cross_val_score function to get the accuracy for each of the 10 folds.

In just 3 lines of code:

clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, X, y, cv=10)
print(scores)
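
As a rough, self-contained sketch of the same idea, the snippet below also shows cross_val_predict, which the question asks about. The synthetic data and random_state are assumptions for illustration. cross_val_score returns one accuracy value per fold, while cross_val_predict returns one out-of-fold prediction per sample; neither leaves you with a fitted clf.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, cross_val_score

# Synthetic binary data stands in for the real X and y (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

clf = svm.SVC(kernel='linear', C=1)

# One accuracy score per fold; summarize with the mean and standard deviation.
scores = cross_val_score(clf, X, y, cv=10)
print("Mean accuracy: %.3f (std %.3f)" % (scores.mean(), scores.std()))

# One prediction per sample, each made by a model that did not train on that sample.
oof_pred = cross_val_predict(clf, X, y, cv=10)
oof_accuracy = accuracy_score(y, oof_pred)
print("Out-of-fold accuracy: %.3f" % oof_accuracy)
```

The out-of-fold predictions are handy for a confusion matrix or a classification report over the whole data set, but they are not a replacement for a final model fitted with clf.fit.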
Author by

M_13

Updated on December 03, 2022

Comments

  • M_13
    M_13 over 1 year

    I'm relatively new to machine learning and would like some help in the following:

    I ran a Support Vector Machine Classifier (SVC) on my data with 10-fold cross validation and calculated the accuracy score (which was around 89%). I'm using Python and scikit-learn to perform the task. Here's a code snippet:

    def get_scores(features, target, classifier):
        # Hold out 30% of the data as a test set
        X_train, X_test, y_train, y_test = train_test_split(
            features, target, test_size=0.3)
        # 10-fold cross-validation on the training portion only
        scores = cross_val_score(
            classifier,
            X_train,
            y_train,
            cv=10,
            scoring='accuracy',
            n_jobs=-1)
        return scores

    get_scores(features_from_df, target_from_df, svm.SVC())
    

    Now, how can I use my classifier (after running the 10-fold CV) to test it on X_test and compare the predicted results to y_test? As you may have noticed, I only used X_train and y_train in the cross-validation process.

    I noticed that sklearn has cross_val_predict (http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html). Should I replace cross_val_score with cross_val_predict? Just FYI: my target column is binarized (it has values of 0 and 1).

    If my approach is wrong, please advise me on the best way to proceed.

    Thanks!

  • M_13
    M_13 over 6 years
    Thank you for the answer. This, however, doesn't take cross-validation into consideration. Any alternative suggestions would be great.
  • Vivek Kumar
    Vivek Kumar over 6 years
    @M_13 No model will ever take cross-validation into account. CV is just to check the performance of the model on your data. Please read about cross-validation
  • mrazizi
    mrazizi over 4 years
    And don't forget: from sklearn.model_selection import cross_val_score
  • Peshmerge
    Peshmerge over 2 years
    I wonder how this answer is accepted as an 'answer'. It doesn't address the main point, which is cross-validation.