Convert sklearn.svm SVC classifier to Keras implementation


Solution 1

If you are building a classifier, you need the squared_hinge loss and a regularizer to get the complete SVM loss function (see the GitHub thread linked at the end of this answer). You will also need to split your last layer in two, so that the regularization parameter is applied before the activation; I have added the code below.

These changes should give you the following:

from keras.regularizers import l2
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(64, activation='relu'))
# the output layer is split from its activation so the L2 penalty
# applies to the pre-activation weights, as in an SVM
model.add(Dense(1, kernel_regularizer=l2(0.01)))
model.add(Activation('softmax'))
model.compile(loss='squared_hinge',
              optimizer='adadelta',
              metrics=['accuracy'])
model.fit(X, Y_labels)

The plain hinge loss is also implemented in Keras and suits binary classification, so if you are working on a binary classification model, use the code below.

from keras.regularizers import l2
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(64, activation='relu'))
model.add(Dense(1, kernel_regularizer=l2(0.01)))
model.add(Activation('linear'))
model.compile(loss='hinge',
              optimizer='adadelta',
              metrics=['accuracy'])
model.fit(X, Y_labels)
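
Note that Keras's hinge loss penalizes 1 - y_true * y_pred, so it assumes targets encoded as -1/+1 rather than 0/1 (recent versions may convert 0/1 labels automatically). If your labels come from sklearn as 0/1, a minimal remapping sketch:

import numpy as np

# hinge expects -1/+1 targets; sklearn-style binary labels are usually 0/1
Y_svm = 2 * np.asarray(Y_labels) - 1   # 0 -> -1, 1 -> +1
model.fit(X, Y_svm)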

If anything in the code is unclear, feel free to comment. I had this same issue a while back, and this GitHub thread helped me understand it; some of the ideas here come directly from it: https://github.com/keras-team/keras/issues/2588
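
For a genuinely multi-class setup, the recipe discussed in that thread uses one output unit per class, a linear activation, and the squared hinge loss. A minimal sketch, assuming integer class labels in Y_labels and a hypothetical num_classes (neither is from the original answer):

from keras.regularizers import l2
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical

num_classes = 3  # hypothetical; set to your number of classes

model = Sequential()
model.add(Dense(64, activation='relu'))
model.add(Dense(num_classes, kernel_regularizer=l2(0.01)))
model.add(Activation('linear'))  # raw margins, like a linear SVM
model.compile(loss='squared_hinge',
              optimizer='adadelta',
              metrics=['accuracy'])

# squared_hinge also expects -1/+1 targets, so rescale the one-hot labels
Y_onehot = to_categorical(Y_labels, num_classes) * 2 - 1
model.fit(X, Y_onehot)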

Solution 2

If you are using Keras 2.0, then you need to change the following parameter name in anand_v.singh's answer:

W_regularizer -> kernel_regularizer


from keras import regularizers
from keras.layers import Dense, Activation

# nb_classes is the number of output classes
model.add(Dense(nb_classes, kernel_regularizer=regularizers.l2(0.0001)))
model.add(Activation('linear'))
model.compile(loss='squared_hinge',
              optimizer='adadelta', metrics=['accuracy'])

Or you can use the functional API as follows:

from keras.regularizers import l2
from keras.layers import Flatten, Dropout, Dense, Activation
from keras.models import Model

top_model = bottom_model.output
top_model = Flatten()(top_model)
top_model = Dropout(0.5)(top_model)
top_model = Dense(64, activation='relu')(top_model)
top_model = Dense(2, kernel_regularizer=l2(0.0001))(top_model)
top_model = Activation('linear')(top_model)

model = Model(bottom_model.input, top_model)
model.compile(loss='squared_hinge',
              optimizer='adadelta', metrics=['accuracy'])
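
For context, bottom_model here is any existing Keras model whose features you want to reuse. A minimal sketch of one way to obtain it, assuming a pretrained VGG16 base (a hypothetical choice, not part of the original answer):

from keras.applications import VGG16

# hypothetical feature extractor; any model exposing .input and .output works
bottom_model = VGG16(weights='imagenet', include_top=False,
                     input_shape=(224, 224, 3))

# optionally freeze the base so only the SVM-style top layers are trained
for layer in bottom_model.layers:
    layer.trainable = False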
  


Comments

  • none32 almost 2 years

    I'm trying to convert some old code from sklearn to a Keras implementation. Since it is crucial to maintain the same way of operation, I want to make sure I'm doing it correctly.

    I've converted most of the code already; however, I'm having trouble with converting the sklearn.svm SVC classifier. Here is how it looks right now:

    from sklearn.svm import SVC
    model = SVC(kernel='linear', probability=True)
    model.fit(X, Y_labels)
    

    Super easy, right? However, I couldn't find an analog of the SVC classifier in Keras. So, what I've tried is this:

    from keras.models import Sequential
    from keras.layers import Dense
    
    model = Sequential()
    model.add(Dense(64, activation='relu'))
    model.add(Dense(1, activation='softmax'))
    model.compile(loss='squared_hinge',
                  optimizer='adadelta',
                  metrics=['accuracy'])
    model.fit(X, Y_labels)
    

    But I think that it is not correct by any means. Could you please help me find an alternative to sklearn's SVC classifier in Keras?

    Thank you.

  • none32 over 5 years
    Thanks a lot, especially for the references; they helped not only with a ready-to-use solution but also with understanding what is going on under the hood. In my case it's a multi-class classifier, so I'm using the squared_hinge loss function. As far as I understand, the only difference between my code and the one you provided is the use of a regularizer, and that is the one part I can't follow yet. I'll dig into it more myself, since I'm not familiar with the L2 regularizer at all.
  • Chhaganlaal about 4 years
    Can you explain why the last dense layer has only 1 node?
  • Chhaganlaal about 4 years
    Also, could you explain W_regularizer? I am getting errors when using it.
  • Nitin1901 over 3 years
    When I try similar code on the MNIST dataset, it gives very poor results, around 10-11% accuracy.
  • Nitin1901 over 3 years
    @Chhaganlaal There is no such parameter. You can use kernel_regularizer or bias_regularizer instead.
  • anand_v.singh over 3 years
    @NitinSai I wrote this answer around 1.5 years ago, and I haven't done machine learning for more than a year now. From the comments it appears that this answer is out of date. Relearning enough ML to fix it myself would take substantial time, so if you can suggest an edit to update the answer, please do, and I will apply it after verifying.
  • Apidcloud almost 3 years
    Do you know of any way of getting a confidence value or outputting probabilities? I detail this question here: stackoverflow.com/questions/67559912/…