SVC (support vector classification) with categorical (string) data as labels
Solution 1
Take a look at section 4.3.4, "Encoding categorical features", of the scikit-learn preprocessing documentation: http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features
In particular, look at the OneHotEncoder. It converts each categorical value into a binary indicator vector, a numeric format that SVMs can use.
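As a minimal sketch of that suggestion, assume one input feature is a string-valued column. The color values, the toy labels, and the linear kernel below are assumptions made for illustration, not part of the original answer. The encoder is fitted on the feature column and its output is what gets passed to the classifier:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

# Hypothetical toy data: a single categorical feature (a color name).
colors = np.array([["red"], ["green"], ["blue"], ["red"], ["green"], ["blue"]])
labels = ["A", "B", "C", "A", "B", "C"]

# One-hot encode the string feature; categories are stored in sorted order.
encoder = OneHotEncoder()
X = encoder.fit_transform(colors).toarray()

# Fit the classifier on the numeric indicator columns.
clf = SVC(kernel="linear", C=100.0)
clf.fit(X, labels)

# New samples must go through the same fitted encoder before predict().
new_sample = encoder.transform([["blue"]]).toarray()
prediction = clf.predict(new_sample)
print(prediction)
```

Note that the encoder is applied to the input features only; the string labels are passed to fit() unchanged.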
Solution 2
You can try this code:
from sklearn import svm
X = [[0, 0], [1, 1], [2, 3]]
y = ['A', 'B', 'C']
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X, y)
clf.predict([[2, 3]])
Output: array(['C'], dtype='|S1')
Pass the dependent variable (y) as a plain list of strings; SVC accepts string class labels directly.
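To illustrate why this works, here is a hedged sketch on the same toy data. After fitting, the classifier exposes the distinct labels it saw in clf.classes_, and predict() returns those original strings. If explicit integer codes are wanted instead (the A = 0, B = 1 idea), LabelEncoder produces them and can map predictions back. The linear kernel is an assumption chosen to keep the tiny example easy to reason about; it is not what the answer above uses.

```python
from sklearn import svm
from sklearn.preprocessing import LabelEncoder

X = [[0, 0], [1, 1], [2, 3]]
y = ["A", "B", "C"]

# Option 1: fit on the string labels directly.
clf = svm.SVC(kernel="linear", C=100.0)
clf.fit(X, y)
classes = clf.classes_.tolist()  # the distinct labels, in sorted order

# Option 2: encode the labels as integers explicitly, then decode predictions.
le = LabelEncoder()
y_codes = le.fit_transform(y)  # maps A, B, C to 0, 1, 2
clf_codes = svm.SVC(kernel="linear", C=100.0).fit(X, y_codes)
decoded = le.inverse_transform(clf_codes.predict([[2, 3]]))

print(classes, y_codes, decoded)
```

Both options train the same kind of model; the integer codes are only class identifiers, so the classifier does not treat them as ordered quantities.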
beta
Updated on June 17, 2022
Comments
-
beta almost 2 years
I use scikit-learn to implement a simple supervised learning algorithm. In essence I follow the tutorial here (but with my own data). I try to fit the model:
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(features_training, labels_training)
But at the second line, I get an error:
ValueError: could not convert string to float: 'A'
The error is expected because labels_training contains string values which represent three different categories, such as A, B, C.
So the question is: how do I use SVC (support vector classification) if the labelled data represents categories in the form of strings? One intuitive solution seems to be to simply convert each string to a number, for instance A = 0, B = 1, etc. But is this really the best solution?
-
Martin Thoma almost 8 years
You should at least link directly to the section and mention the OneHotEncoder.
-
gtzinos over 6 years
But how could one-hot encoding help when you try to predict a new color? Maybe in that case you have to retrain the model. Do you have any solution?
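On the unseen-category point, one option (a sketch under assumptions, not the only approach) is OneHotEncoder(handle_unknown="ignore"), which encodes a category that was not present during fitting as an all-zero row instead of raising an error. The color values below are hypothetical. The model has still learned nothing about the new category, so retraining remains necessary for meaningful predictions on it.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training categories for a single color feature.
train_colors = [["red"], ["green"], ["blue"]]

# With handle_unknown="ignore", unknown categories do not raise at transform time.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train_colors)

known = encoder.transform([["green"]]).toarray()    # one indicator column set
unseen = encoder.transform([["purple"]]).toarray()  # all indicator columns zero

print(known, unseen)
```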