SVM and Neural Network


Solution 1

There are two parts to this question. The first part is: what is the form of the function learned by these methods? For a NN and an SVM this is typically the same. For example, a single-hidden-layer neural network uses exactly the same form of model as an SVM. That is:

Given an input vector x, the output is: output(x) = sum_over_all_i weight_i * nonlinear_function_i(x)

Generally the nonlinear functions will also have some parameters. So these methods need to learn how many nonlinear functions should be used, what their parameters are, and what the value of all the weight_i weights should be.
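A minimal NumPy sketch of this shared form (the RBF basis functions, centres, and weights below are arbitrary illustration values of mine, not anything from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared "basis expansion" form: output(x) = sum_i weight_i * f_i(x).
# Here each f_i is an RBF centred at c_i -- the same shape of model an
# RBF-kernel SVM and a single-hidden-layer RBF network both end up with.
centres = rng.normal(size=(5, 2))   # parameters of the nonlinear functions
weights = rng.normal(size=5)        # the weight_i values
gamma = 0.5

def output(x):
    f = np.exp(-gamma * np.sum((centres - x) ** 2, axis=1))  # f_i(x) for all i
    return float(weights @ f)                                # sum_i weight_i * f_i(x)

print(output(np.zeros(2)))
```

The learning methods differ only in how the centres, widths, and weights are chosen, which is the second part of the question.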

Therefore, the difference between an SVM and a NN is in how they decide what these parameters should be set to. Usually when someone says they are using a neural network, they mean they are trying to find the parameters which minimize the mean squared prediction error with respect to a set of training examples, and they will almost always be using the stochastic gradient descent optimization algorithm to do this. SVMs, on the other hand, try to minimize both training error and some measure of "hypothesis complexity", so they will find a set of parameters that fits the data but is also "simple" in some sense. You can think of it like Occam's razor for machine learning. The most common optimization algorithm used with SVMs is sequential minimal optimization (SMO).
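A sketch of the two training styles side by side, using scikit-learn as one concrete implementation (the answer names no particular library, and the dataset is a toy one of mine):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# NN: parameters found by stochastic gradient descent on a training loss.
nn = MLPClassifier(hidden_layer_sizes=(10,), solver="sgd",
                   max_iter=2000, random_state=0).fit(X, y)

# SVM: libsvm's SMO solver minimises training error plus a complexity
# penalty (the margin term), with the trade-off controlled by C.
svm = SVC(kernel="rbf", C=1.0).fit(X, y)

print(nn.score(X, y), svm.score(X, y))
```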

Another big difference between the two methods is that stochastic gradient descent isn't guaranteed to find the optimal set of parameters when used the way NN implementations employ it. However, any decent SVM implementation will find the optimal set of parameters, because the SVM training problem is convex. People like to say that neural networks get stuck in a local minimum while SVMs don't.

Solution 2

NNs are heuristic, while SVMs are theoretically founded. An SVM is guaranteed to converge towards the best solution in the PAC (probably approximately correct) sense. For example, for two linearly separable classes an SVM will draw the separating hyperplane directly halfway between the nearest points of the two classes (these points become the support vectors). A neural network would draw any line which separates the samples, which is correct for the training set but might not have the best generalization properties.
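The halfway-hyperplane behaviour can be checked on a toy example, here with scikit-learn's SVC (an implementation choice of mine; the four points are arbitrary):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes on a line: class 0 at x <= -1, class 1 at x >= 1.
X = np.array([[-3.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
y = np.array([0, 0, 1, 1])

# Large C approximates a hard-margin SVM.
svm = SVC(kernel="linear", C=1e6).fit(X, y)

# The support vectors are the nearest points of the two classes...
print(svm.support_vectors_)

# ...and the separating hyperplane lies halfway between them,
# so the decision function vanishes at the midpoint x = 0.
print(svm.decision_function([[0.0, 0.0]]))
```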

So no, even for linearly separable problems, NNs and SVMs are not the same.

In the case of linearly non-separable classes, both SVMs and NNs apply a non-linear projection into a higher-dimensional space. In the case of NNs this is achieved by introducing additional neurons in the hidden layer(s); for SVMs, a kernel function is used to the same effect. A neat property of the kernel function is that the computational cost doesn't rise with the dimensionality of the implicit feature space, while for NNs it obviously rises with the number of neurons.
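A tiny sketch of that property: the RBF kernel computes an inner product in an infinite-dimensional feature space, yet its cost depends only on the input dimension (the values below are arbitrary):

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Inner product in an (infinite-dimensional) implicit feature space,
    # computed directly from the low-dimensional inputs -- the cost is
    # O(len(x)), independent of the feature-space dimensionality.
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])
print(rbf_kernel(x, z))  # exp(-5.0), since ||x - z||^2 = 1 + 4
```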

Solution 3

Running a simple out-of-the-box comparison between support vector machines and neural networks (without any parameter selection) on several popular regression and classification datasets demonstrates the practical differences: an SVM becomes a very slow predictor if many support vectors are created, while a neural network's prediction is much faster and its model much smaller. On the other hand, training time is much shorter for SVMs. Concerning accuracy/loss, despite the aforementioned theoretical drawbacks of neural networks, both methods are on par; for regression problems in particular, neural networks often outperform support vector machines. Depending on your specific problem, this might help to choose the right model.

Solution 4

SVMs and NNs share the perceptron as a basic building block, but an SVM additionally uses the kernel trick to raise the dimensionality, say from 2-D to 3-D, via a mapping such as φ(x, y) = (x², y², √2·x·y). Classes that are linearly inseparable in the original plane can then be separated by a straight line (a hyperplane) in the mapped space. If you want a demo of this, just ask :)
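A concrete version of the squared-coordinates mapping the answer gestures at (the √2·x·y cross-term is my addition, completing the standard degree-2 polynomial feature map; the circular toy data is also mine):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two classes that no straight line separates in 2-D: one inside the
# unit circle, one outside it.
angles = rng.uniform(0, 2 * np.pi, 100)
inner = np.c_[0.5 * np.cos(angles[:50]), 0.5 * np.sin(angles[:50])]
outer = np.c_[2.0 * np.cos(angles[50:]), 2.0 * np.sin(angles[50:])]

def phi(p):
    # Degree-2 polynomial feature map: (x, y) -> (x^2, y^2, sqrt(2)*x*y)
    x, y = p
    return np.array([x * x, y * y, np.sqrt(2) * x * y])

# In the mapped 3-D space the plane u + v = 1 (i.e. x^2 + y^2 = 1)
# separates the two classes linearly.
w, b = np.array([1.0, 1.0, 0.0]), -1.0
assert all(phi(p) @ w + b < 0 for p in inner)
assert all(phi(p) @ w + b > 0 for p in outer)
print("linearly separable after the mapping")
```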

Solution 5

Both Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs) are supervised machine learning classifiers. An ANN is a parametric classifier whose hyper-parameters are tuned during the training phase. An SVM is a non-parametric classifier that finds a separating hyperplane (a linear boundary, if a linear kernel is used) between classes. In terms of model performance, SVMs are sometimes equivalent to a shallow neural network architecture. Generally, an ANN will outperform an SVM when there is a large number of training instances; however, neither outperforms the other over the full range of problems.

We can summarize the advantages of the ANN over the SVM as follows: ANNs can handle multi-class problems by producing probabilities for each class. In contrast, SVMs handle these problems using independent one-versus-all classifiers where each produces a single binary output. For example, a single ANN can be trained to solve the hand-written digits problem while 10 SVMs (one for each digit) are required.
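A sketch of the hand-written-digits example using scikit-learn (an implementation choice not specified in the answer): one network with ten probabilistic outputs versus ten one-versus-rest binary SVMs.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)

# One network, ten outputs: predict_proba gives a probability per digit.
nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                   random_state=0).fit(X, y)
print(nn.predict_proba(X[:1]).shape)  # (1, 10)

# Ten independent binary SVMs, one per digit, each producing a single
# binary output.
ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)
print(len(ovr.estimators_))  # 10
```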

Another advantage of ANNs, from the perspective of model size, is that the model is fixed in terms of its input nodes, hidden layers, and output nodes; in an SVM, however, the number of support vectors could, in the worst case, reach the number of training instances.

The SVM does not perform well when the number of features is greater than the number of samples. More feature-engineering work is required for an SVM than for a multi-layer neural network.

On the other hand, SVMs are better than ANNs in certain respects:

In comparison to SVMs, ANNs are more prone to becoming trapped in local minima, meaning that they sometimes miss the global picture.

While most machine learning algorithms can overfit if they don’t have enough training samples, ANNs can also overfit if training goes on for too long - a problem that SVMs do not have.

SVM models are easier to understand. There are different kernels that provide different levels of flexibility beyond the classical linear kernel, such as the Radial Basis Function (RBF) kernel. Unlike the linear kernel, the RBF kernel can handle the case where the relation between class labels and attributes is nonlinear.
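The linear-versus-RBF contrast can be seen on scikit-learn's `make_circles` toy data (a dataset and library choice of mine, not the answer's):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: the class boundary is nonlinear by construction.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

# The linear kernel cannot capture the circular boundary; the RBF kernel can.
print("linear:", linear.score(X, y))
print("rbf:", rbf.score(X, y))
```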

Author: CoyBit

Updated on May 18, 2020

Comments

  • CoyBit
    CoyBit almost 4 years

    What is the difference between an SVM and a Neural Network? Is it true that a linear SVM is the same as a NN, and that for non-linearly separable problems a NN handles this by adding hidden layers while an SVM handles it by changing the space's dimensionality?

  • user492238
    user492238 about 12 years
    Could you elaborate a little more about the other part of the question regarding the non-linearly separable problems?
  • Fluchtpunkt
    Fluchtpunkt about 12 years
    In general, both - SVM and NN - can solve non-linear problems. The "degree of non-linearity" is controlled via #hidden-nodes (or layers) in NN and #support-vectors in SVM. The SVM adjusts this automatically during training while for NN the developer has to define the #hidden-units/topology (although there exist several more or less useful heuristics for automatically determining the optimal topology, the best way is to perform parameter selection via cross-validation)
  • Phob
    Phob over 10 years
    Training time isn't necessarily slower for a NN: consider using a very large dataset with n > 10^6 data points, using a cluster to train some sort of system over a period of weeks. A NN can be trained with batch gradient descent, which is O(n). SVM training algorithms are O(n^2) which is unacceptable for such a large dataset.
  • a-Jays
    a-Jays almost 10 years
    Does the complexity not rise only in the learning stage, or is the statement valid for prediction stage of either?
  • Igor F.
    Igor F. almost 10 years
    The statement holds also for the prediction stage. You are basically doing the scalar product between the separation hyperplane's normal vector and the vector you want to classify in the high dimensional space. But, instead of doing it explicitly, you rely on the kernel function, like in the learning stage.
  • a-Jays
    a-Jays almost 10 years
    And what about NNs? Does it rise with the number of neurons (in the prediction stage, of course)?
  • Igor F.
    Igor F. almost 10 years
    Yes, of course. You have to propagate the vector you want to classify through all neurons and over all connections between them.
  • LandonZeKepitelOfGreytBritn
    LandonZeKepitelOfGreytBritn over 6 years
    Based on the explanation you gave, it sounds to me like SVM is usually better for linear problems. Because you say that the SVM will converge in the PAC sense (thanks to the way it works), while a NN may not immediately give the best separating hyperplane and may therefore need more iterations of backpropagation. So an SVM may need less time and autotuning (if I am correct, this means an SVM's training time for linear problems is lower). Right? Feel free to correct me if I'm wrong.
  • Igor F.
    Igor F. over 6 years
    @ trilolil: I'm not sure whether I understand your argument. Generally, SVMs don't care (much) whether your problem is linearly separable or not. With a suitable kernel you can always transform it into a linearly separable one. A soft-margin SVM will work even for non-separable ones.
  • serv-inc
    serv-inc over 6 years
    "they are exactly equivalent to each other" Could you post some reference for that?
  • ZakC
    ZakC over 3 years
    Could you expand on "SVM models are easier to understand" ? I have seen this sentence in a lot of places and I wonder according to what measure are you justifying "understandability" ?
  • Cole
    Cole about 3 years
    I'm not sure I agree with the assertion that SVMs are more PAC. The correctness of the separating hyperplane defined by the support vectors is entirely determined by the training data. In the case where the nearest points between the two classes in the training data are not very representative of the true classes, the boundary drawn by the NN may generalize better than that of the SVM.
  • Dikran Marsupial
    Dikran Marsupial almost 3 years
    A lot of the theory underpinning SVMs assumes a fixed kernel, and as soon as you tune the kernel (e.g. using cross-validation) the theoretical results have been invalidated. Note also a regularised neural network (particularly a linear one) will generate something like a maximum margin classifier, and there are theoretical results for the generalisation of such models as well. So it isn't true to say NN are heuristic and SVMs are theoretically founded, they tend to both have heuristic and theoretically founded aspects.
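Igor F.'s point about the prediction stage can be made concrete: an SVM's decision function is f(x) = Σᵢ αᵢ·K(svᵢ, x) + b, evaluated through the kernel function over the support vectors rather than explicitly in the high-dimensional space. A minimal sketch using scikit-learn's fitted attributes (an implementation choice of mine; the dataset is a toy one):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
svm = SVC(kernel="rbf", gamma=0.5).fit(X, y)

def decision(x):
    # Manual kernel expansion over the support vectors only:
    # f(x) = sum_i alpha_i * K(sv_i, x) + b.  Cost grows with the number
    # of support vectors, never with the implicit feature-space dimension.
    k = np.exp(-0.5 * np.sum((svm.support_vectors_ - x) ** 2, axis=1))
    return svm.dual_coef_[0] @ k + svm.intercept_[0]

x = X[0]
assert np.isclose(decision(x), svm.decision_function([x])[0])
print("manual kernel expansion matches decision_function")
```

This also illustrates the earlier prediction-speed observation: evaluating f(x) requires one kernel call per support vector, so a model with many support vectors predicts slowly.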