How to run SVC classifier after running 10-fold cross validation in sklearn?

python machine-learning svm cross-validation

14,225

Solution 1

You're almost there:

from sklearn import svm

# Build your classifier
classifier = svm.SVC()

# Train it on the entire training data set
classifier.fit(X_train, y_train)

# Get predictions on the test set
y_pred = classifier.predict(X_test)

At this point, you can use any metric from the sklearn.metrics module to determine how well you did. For example:

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))
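To put the two steps together, here is a minimal end-to-end sketch. It assumes synthetic data from make_classification in place of your real features and target, and the fixed random_state values are only there to make the run repeatable. It estimates accuracy with 10-fold CV on the training portion, then fits once on the whole training set and scores on the held-out test set:

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary data standing in for the real features/target (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hold out 30% of the data, as in the question
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

classifier = svm.SVC()

# Step 1: estimate performance with 10-fold CV on the training portion only.
# cross_val_score fits clones of the classifier, so `classifier` itself
# is still unfitted after this call.
cv_scores = cross_val_score(
    classifier, X_train, y_train, cv=10, scoring='accuracy')
print("CV accuracy: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))

# Step 2: fit on the whole training set, then score on the held-out test set.
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy: %.3f" % test_accuracy)
```

The CV scores tell you how well this kind of model tends to do on your data; the fitted classifier from step 2 is the one you actually use for predictions.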

Solution 2

You only need to separate your data into features X and labels y. Do not split it into train and test sets yourself.

Then you can pass your classifier (in your case an SVM) to the cross_val_score function to get the accuracy for each fold.

In just 3 lines of code (plus the imports):

from sklearn import svm
from sklearn.model_selection import cross_val_score

clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, X, y, cv=10)
print(scores)
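Since the question also asks about cross_val_predict, here is a rough sketch of how the two functions differ. It again assumes synthetic data from make_classification, and the name oof_pred is just illustrative. cross_val_score returns one score per fold, while cross_val_predict returns one prediction per sample, each made by a model that did not see that sample during training:

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, cross_val_score

# Synthetic binary data standing in for the real X and y (assumption)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

clf = svm.SVC(kernel='linear', C=1)

# One accuracy value per fold; summarise with mean and standard deviation
scores = cross_val_score(clf, X, y, cv=10)
print("Per-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# One out-of-fold prediction per sample
oof_pred = cross_val_predict(clf, X, y, cv=10)
print("Pooled out-of-fold accuracy: %.3f" % accuracy_score(y, oof_pred))
```

Neither call leaves you with a fitted clf, and the pooled accuracy can differ from the mean of the per-fold scores when the folds are not all the same size. To predict on new data you still need to call clf.fit yourself.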


Author by

M_13

Updated on December 03, 2022

Comments

M_13 over 1 year
I'm relatively new to machine learning and would like some help in the following:

I ran a Support Vector Machine Classifier (SVC) on my data with 10-fold cross validation and calculated the accuracy score (which was around 89%). I'm using Python and scikit-learn to perform the task. Here's a code snippet:
```
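from sklearn import svm
from sklearn.model_selection import cross_val_score, train_test_split

def get_scores(features, target, classifier):
    # Hold out 30% of the data as a test set
    X_train, X_test, y_train, y_test = train_test_split(
        features, target, test_size=0.3)
    # Run 10-fold cross-validation on the training portion only
    scores = cross_val_score(
        classifier,
        X_train,
        y_train,
        cv=10,
        scoring='accuracy',
        n_jobs=-1)
    return scores

get_scores(features_from_df, target_from_df, svm.SVC())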
```
Now, how can I use my classifier (after running the 10-fold CV) to test it on X_test and compare the predicted results to y_test? As you may have noticed, I only used X_train and y_train in the cross-validation process.

I noticed that sklearn has cross_val_predict: http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html Should I replace my cross_val_score with cross_val_predict? Just FYI: my target data column is binarized (it has values of 0s and 1s).

If my approach is wrong, please advise me on the best way to proceed.

Thanks!
M_13 over 6 years

Thank you for the answer. This, however, doesn't take cross-validation into consideration. Any alternative suggestions would be great.
Vivek Kumar over 6 years

@M_13 No model will ever take cross-validation into account. CV is just to check the performance of the model on your data. Please read about cross-validation.
mrazizi over 4 years

And don't forget: from sklearn.model_selection import cross_val_score
Peshmerge over 2 years

I wonder how this answer is accepted as an 'answer'. It doesn't address the main point, which is cross-validation.