How to Interpret the Prediction Results of an SVM in R?


Solution 1

Since your outcome variable is numeric, svm() uses the regression formulation of SVM (eps-regression by default). I think you want the classification formulation. You can get it either by coercing your outcome into a factor or by setting type="C-classification".
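With the matrix (x, y) interface from the question, the factor coercion looks roughly like the sketch below. loadNumerical() is the asker's own function and isn't available here, so the sketch builds a hypothetical stand-in matrix from mtcars with the 0/1 label in the last column; swap in the real data.

```r
library(e1071)

# Hypothetical stand-in for loadNumerical(): numeric matrix, 0/1 label in the last column
data <- as.matrix(mtcars[, c("hp", "mpg", "gear", "vs")])

x <- data[, -ncol(data)]
y <- factor(data[, ncol(data)])  # factor response, so svm() uses C-classification

model <- svm(x, y, gamma = 10)

# Predictions are now class labels (a factor with levels "0" and "1"), not numbers
pred <- predict(model, x[1:20, ])
print(pred)
```

The only change from the question's call is wrapping the last column in factor(); gamma = 10 is kept as in the original.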

Regression:

> model <- svm(vs ~ hp+mpg+gear,data=mtcars)
> predict(model)
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
       0.8529506670        0.8529506670        0.9558654451        0.8423224174 
  Hornet Sportabout             Valiant          Duster 360           Merc 240D 
       0.0747730699        0.6952501964        0.0123405904        0.9966162477 
           Merc 230            Merc 280           Merc 280C          Merc 450SE 
       0.9494836511        0.7297563543        0.6909235343       -0.0327165348 
         Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
      -0.0092851098       -0.0504982402        0.0319974842        0.0504292348 
  Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
      -0.0504750284        0.9769206963        0.9724676874        0.9494910097 
      Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
       0.9496260289        0.1349744908        0.1251344111        0.0395243313 
   Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
       0.0983094417        1.0041732099        0.4348209129        0.6349628695 
     Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
       0.0009258333        0.0607896408        0.0507385269        0.8664157985 

Classification:

> model <- svm(as.factor(vs) ~ hp+mpg+gear,data=mtcars)
> predict(model)
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
                  1                   1                   1                   1 
  Hornet Sportabout             Valiant          Duster 360           Merc 240D 
                  0                   1                   0                   1 
           Merc 230            Merc 280           Merc 280C          Merc 450SE 
                  1                   1                   1                   0 
         Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
                  0                   0                   0                   0 
  Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
                  0                   1                   1                   1 
      Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
                  1                   0                   0                   0 
   Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
                  0                   1                   0                   1 
     Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
                  0                   0                   0                   1 
Levels: 0 1
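The other route mentioned above, setting type = "C-classification" while leaving vs numeric, can be sketched like this (model_c and pred_c are illustrative names). It should return a factor of the same form as the as.factor() version.

```r
library(e1071)

# vs stays numeric; type = "C-classification" requests the classification formulation
model_c <- svm(vs ~ hp + mpg + gear, data = mtcars, type = "C-classification")
pred_c <- predict(model_c)

# Fitted values come back as a factor of class labels
print(levels(pred_c))
print(table(predicted = pred_c, actual = mtcars$vs))
```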

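Also, if you want class probabilities as your prediction rather than just the hard classification, fit the model with probability=TRUE and pass probability=TRUE to predict() as well; the probabilities are returned in the "probabilities" attribute of the result.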

With Probabilities:

> model <- svm(as.factor(vs) ~ hp+mpg+gear,data=mtcars,probability=TRUE)
> predict(model,mtcars,probability=TRUE)
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
                  1                   1                   1                   1 
  Hornet Sportabout             Valiant          Duster 360           Merc 240D 
                  0                   1                   0                   1 
           Merc 230            Merc 280           Merc 280C          Merc 450SE 
                  1                   1                   1                   0 
         Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
                  0                   0                   0                   0 
  Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
                  0                   1                   1                   1 
      Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
                  1                   0                   0                   0 
   Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
                  0                   1                   0                   1 
     Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
                  0                   0                   0                   1 
attr(,"probabilities")
                            0          1
Mazda RX4           0.2393753 0.76062473
Mazda RX4 Wag       0.2393753 0.76062473
Datsun 710          0.1750089 0.82499108
Hornet 4 Drive      0.2370382 0.76296179
Hornet Sportabout   0.8519490 0.14805103
Valiant             0.3696019 0.63039810
Duster 360          0.9236825 0.07631748
Merc 240D           0.1564898 0.84351021
Merc 230            0.1780135 0.82198650
Merc 280            0.3402143 0.65978567
Merc 280C           0.3829336 0.61706640
Merc 450SE          0.9110862 0.08891378
Merc 450SL          0.8979497 0.10205025
Merc 450SLC         0.9223868 0.07761324
Cadillac Fleetwood  0.9187301 0.08126994
Lincoln Continental 0.9153549 0.08464509
Chrysler Imperial   0.9358186 0.06418140
Fiat 128            0.1627969 0.83720313
Honda Civic         0.1649799 0.83502008
Toyota Corolla      0.1781531 0.82184689
Toyota Corona       0.1780519 0.82194807
Dodge Challenger    0.8427087 0.15729129
AMC Javelin         0.8496198 0.15038021
Camaro Z28          0.9190294 0.08097056
Pontiac Firebird    0.8361349 0.16386511
Fiat X1-9           0.1490934 0.85090660
Porsche 914-2       0.5797194 0.42028060
Lotus Europa        0.4169587 0.58304133
Ford Pantera L      0.8731716 0.12682843
Ferrari Dino        0.8392372 0.16076281
Maserati Bora       0.8519422 0.14805785
Volvo 142E          0.2289231 0.77107694
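Because the probabilities come back as an attribute rather than as the return value itself, a small sketch of pulling out the class-1 probabilities as a plain named vector may help (probs and p1 are illustrative names). The probability model is fitted with an internal cross-validation, so the numbers may differ slightly from the output above when rerun.

```r
library(e1071)

model <- svm(as.factor(vs) ~ hp + mpg + gear, data = mtcars, probability = TRUE)
pred <- predict(model, mtcars, probability = TRUE)

# The factor holds the predicted labels; the probability matrix is an attribute
probs <- attr(pred, "probabilities")

# Columns are named after the factor levels, so select class "1" by name
p1 <- probs[, "1"]
print(head(round(p1, 3)))
```

Indexing the column by name rather than by position avoids relying on the column order of the probabilities matrix.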

Solution 2

Very broadly speaking, with classifiers like this the predicted value for a binary response variable can loosely be thought of as the probability that the observation belongs to class 1. Strictly, the regression output is not a true probability: it can fall slightly below 0 or above 1, as in the regression example above. In this case your classes are actually labeled 0/1; in other cases you'd need to know which class the function treats as 1 and which as 0. R usually sorts the levels of a factor alphabetically, so the last level would be class 1.
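To check which level R places last, a small base-R sketch; the "yes"/"no" labels are made up for illustration.

```r
# Factor levels are sorted by default, so "yes" ends up last
y <- factor(c("yes", "no", "yes"))
print(levels(y))   # "no" "yes"

# Set the order explicitly if you want a different class to come last
y2 <- factor(c("yes", "no", "yes"), levels = c("yes", "no"))
print(levels(y2))  # "yes" "no"

# With 0/1 labels the sorted order already matches the numeric convention
print(levels(factor(mtcars$vs)))  # "0" "1"
```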

So the most common thing people do is use 0.5 as a cutoff. But I should warn you that there is plenty of math behind that decision, and the particulars of your modeling circumstances (for example, unequal class sizes or unequal costs for the two kinds of error) can call for a different cutoff value. Using 0.5 is often a reasonable choice, but SVMs are fairly complicated beasts; I would recommend that you do some reading on SVMs and classification theory in general before you start trying to apply them to real data.
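A minimal base-R sketch of both ideas: apply the usual 0.5 cutoff, then search for a cutoff on labeled data. The scores and labels are rows 1-10 from the question's output and data listing. best_cutoff() is a hypothetical helper that maximizes accuracy, which is only one possible criterion (misclassification costs or ROC-based rules are common alternatives), and in practice the search should run on held-out data rather than on the rows the model was trained on.

```r
# Regression-SVM outputs for rows 1-10, copied from the question
scores <- c(0.04906014, 0.88230392, 0.04910760, 0.04910719, 0.87302217,
            0.04898187, 0.04909523, 0.04909199, 0.87224979, 0.04913189)
# True labels for the same rows (last column of the question's data)
labels <- c(0, 1, 0, 0, 1, 0, 0, 0, 1, 0)

# Fixed 0.5 cutoff
pred_05 <- ifelse(scores > 0.5, 1, 0)
print(table(predicted = pred_05, actual = labels))

# Hypothetical helper: try cutoffs halfway between neighbouring scores and
# keep the one with the highest accuracy on the labeled rows
best_cutoff <- function(scores, labels) {
  s <- sort(unique(scores))
  candidates <- (head(s, -1) + tail(s, -1)) / 2
  acc <- vapply(candidates,
                function(cut) mean((scores > cut) == (labels == 1)),
                numeric(1))
  candidates[which.max(acc)]
}

print(best_cutoff(scores, labels))
```

Taking midpoints between neighbouring scores keeps the chosen cutoff away from any single observed value, which is a slightly more stable choice than using the scores themselves as candidates.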

My favorite reference is The Elements of Statistical Learning, by Hastie, Tibshirani and Friedman.

Author: Derrick Zhang

Updated on August 26, 2022

Question

    I'm new to R and I'm using the e1071 package for SVM classification in R.

    I used the following code:

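    library(e1071)
    
    data <- loadNumerical()  # my own loader: numeric matrix, class label in the last column
    
    model <- svm(data[,-ncol(data)], data[,ncol(data)], gamma=10)
    
    # predict the first 20 rows
    print(predict(model, data[c(1:20),-ncol(data)]))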
    

    loadNumerical() loads the data, which has the following form (the first 8 columns are inputs and the last column is the class label):

       [,1] [,2] [,3] [,4] [,5] [,6] [,7]      [,8] [,9]
    1    39    1   -1   43   -1    1    0 0.9050497    0
    2    23   -1   -1   30   -1   -1    0 1.6624974    1
    3    50   -1   -1   49    1    1    2 1.5571429    0
    4    46   -1    1   19   -1   -1    0 1.3523685    0
    5    36    1    1   29   -1    1    1 1.3812029    1
    6    27   -1   -1   19    1    1    0 1.9403649    0
    7    36   -1   -1   25   -1    1    0 2.3360004    0
    8    41    1    1   23    1   -1    1 2.4899738    0
    9    21   -1   -1   18    1   -1    2 1.2989637    1
    10   39   -1    1   21   -1   -1    1 1.6121595    0
    

    The number of rows in the data is 500.

    As shown in the code above, I predicted the first 20 rows, and the output is:

             1          2          3          4          5          6          7 
    0.04906014 0.88230392 0.04910760 0.04910719 0.87302217 0.04898187 0.04909523 
             8          9         10         11         12         13         14 
    0.04909199 0.87224979 0.04913189 0.04893709 0.87812890 0.04909588 0.04910999 
            15         16         17         18         19         20 
    0.89837037 0.04903778 0.04914173 0.04897789 0.87572114 0.87001066 
    

    Intuitively, a result close to 0 means class 0, and a result close to 1 means class 1.

    But how can I interpret the result precisely? Is there a threshold s such that values below s are classified as 0 and values above s are classified as 1?

    If such an s exists, how can I derive it?