softmax and sigmoid function for the output layer


Solution 1

softmax() helps when you want a probability distribution that sums to 1. sigmoid is used when you want each output to range from 0 to 1, but the outputs need not sum to 1.

In your case, you wish to classify and choose between two alternatives. I would recommend using softmax(), as you will get a probability distribution to which you can apply a cross-entropy loss function.
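As a quick illustration (a sketch of my own in NumPy, not part of the original answer; the logit values are made up), a two-class softmax output paired with a cross-entropy loss looks like this:

    import numpy as np

    def softmax(logits):
        # Subtract the max before exponentiating for numerical stability.
        exps = np.exp(logits - np.max(logits))
        return exps / np.sum(exps)

    logits = np.array([2.0, 0.5])      # raw scores for the two alternatives
    probs = softmax(logits)            # a distribution, e.g. [0.818, 0.182]
    target = 0                         # index of the true class
    loss = -np.log(probs[target])      # cross-entropy for this one sample
    print(probs, probs.sum(), loss)    # probs.sum() is exactly 1.0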

Solution 2

The sigmoid and the softmax function have different purposes. For a detailed explanation of when to use sigmoid vs. softmax in neural network design, you can look at this article: "Classification: Sigmoid vs. Softmax."

Short summary:

If you have a multi-label classification problem where there is more than one "right answer" (the outputs are NOT mutually exclusive) then you can use a sigmoid function on each raw output independently. The sigmoid will allow you to have high probability for all of your classes, some of them, or none of them.

If you instead have a multi-class classification problem where there is only one "right answer" (the outputs are mutually exclusive), then use a softmax function. The softmax will enforce that the sum of the probabilities of your output classes is equal to one, so in order to increase the probability of a particular class, your model must correspondingly decrease the probability of at least one of the other classes.
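To make the contrast concrete, here is a small Python sketch of my own (the three logit values are made up, not taken from the linked article):

    import numpy as np

    logits = np.array([3.0, 1.0, 2.5])          # raw network outputs for 3 classes

    # Multi-label: sigmoid per output; each probability is independent.
    sigmoid_probs = 1.0 / (1.0 + np.exp(-logits))
    active_labels = sigmoid_probs > 0.5         # several classes can be "on" at once

    # Multi-class: softmax across outputs; probabilities compete and sum to 1.
    softmax_probs = np.exp(logits - logits.max())
    softmax_probs /= softmax_probs.sum()
    winner = softmax_probs.argmax()             # exactly one winning class

    print(sigmoid_probs.sum())                  # can be far from 1
    print(softmax_probs.sum())                  # exactly 1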

Solution 3

Object detection is object classification applied to a sliding window over the image. In classification it is important to find the correct output in some class space. E.g., you detect 10 different objects and you want to know which object is the most likely one there. Then softmax is good because of its property that the whole layer sums to 1.

Semantic segmentation, on the other hand, assigns a class to each pixel of the image. I have done medical semantic segmentation, where the output is a binary image. This means you can use sigmoid as the output to predict whether a given pixel belongs to the specific class, because sigmoid values lie between 0 and 1 for each output class.
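For example (a minimal sketch of my own with made-up logits, not code from an actual segmentation model), a per-pixel sigmoid followed by a 0.5 threshold produces the binary mask:

    import numpy as np

    # Pretend these are raw per-pixel logits from a segmentation network (2x3 image).
    pixel_logits = np.array([[ 1.2, -0.4,  3.0],
                             [-2.1,  0.1,  0.8]])

    # Sigmoid maps each pixel independently into (0, 1).
    pixel_probs = 1.0 / (1.0 + np.exp(-pixel_logits))

    # Threshold at 0.5: does this pixel belong to the class or not?
    binary_mask = (pixel_probs > 0.5).astype(np.uint8)
    print(binary_mask)                  # [[1 0 1], [0 1 1]]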

Solution 4

In general, softmax is used (as a softmax classifier) when there are n classes. For binary classification (n = 2), either sigmoid or softmax can be used.

Sigmoid: S(x) = 1 / (1 + e^(-x))

Softmax:

    σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k),    for j = 1, ..., K

Softmax is a kind of multi-class sigmoid, but as the formula shows, the softmax units are constrained to sum to 1. With sigmoid that constraint does not apply.

Digging deeper, you can also use sigmoid for multi-class classification. When you use a softmax, you basically get a probability for each class (a joint distribution and a multinomial likelihood) whose sum is bound to be one. If you instead use sigmoid for multi-class classification, you get something like a marginal distribution and a Bernoulli likelihood per class: p(y_0 | x), p(y_1 | x), etc.
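In framework terms, the two views map onto standard loss pairings; here is a minimal PyTorch sketch of my own (the logits and targets are made up): CrossEntropyLoss applies log-softmax internally (the multinomial view), while BCEWithLogitsLoss applies sigmoid internally (the per-class Bernoulli view).

    import torch
    import torch.nn as nn

    logits = torch.tensor([[2.0, 0.5, 1.0]])    # one sample, three classes

    # Softmax view: joint distribution, multinomial likelihood.
    target_class = torch.tensor([0])            # exactly one correct class
    ce = nn.CrossEntropyLoss()(logits, target_class)

    # Sigmoid view: marginal Bernoulli likelihood per class, p(y_i | x).
    targets = torch.tensor([[1.0, 0.0, 1.0]])   # several classes can be "on"
    bce = nn.BCEWithLogitsLoss()(logits, targets)

    print(ce.item(), bce.item())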


Comments

  • user288609
    user288609 almost 2 years

In deep learning implementations of object detection and semantic segmentation, I have seen the output layers use either sigmoid or softmax. I am not clear on when to use which; it seems to me that both can support these tasks. Are there any guidelines for this choice?

  • user288609
    user288609 over 7 years
Hi, I think my question is pretty generic. When I read papers or deep learning implementations, I find that authors use either sigmoid or softmax, without explaining the logic behind the choice. For instance, if we are working on a semantic segmentation problem, each pixel should be predicted as class 1 or class 2 (suppose this semantic segmentation is a two-class labeling). Then I think both softmax and sigmoid can be used. But which one is better, or which one should be given preference?
  • Kulbear
    Kulbear almost 6 years
Isn't this answer copied from this Quora post without reference? quora.com/…
  • Anno
    Anno over 3 years
A bit late, but I think I should answer that comment. In semantic segmentation it matters whether a pixel can belong to exactly one of the two classes or to both classes at the same time. If it can belong to both classes, you want to use sigmoid, because it does not care about dependencies between the classes. If the pixel can belong to exactly one of the two classes, you want to use softmax, because it says which class is more likely for that pixel.