Keras Binary Classification - Sigmoid activation function


Solution 1

The output of a binary classifier is the probability of the sample belonging to the positive class.

How does Keras distinguish between the use of sigmoid in a binary classification problem and in a regression problem?

It does not need to distinguish. It uses the loss function to compute the loss, then computes its derivatives, and updates the weights.

In other words:

  • During training, the framework minimizes the loss. The user must choose a loss function (provided by the framework) or supply their own. The framework only cares about the scalar value this function outputs, and its two arguments: the predicted ŷ and the actual y (see the sketch after this list).
  • Each activation function implements a forward-propagation and a back-propagation function. The framework is only interested in these two functions; it does not care what the function does exactly. The only requirement is that the activation function is non-linear.
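
A minimal sketch of what a single training step boils down to (this is not Keras internals; the one-weight "model" and the names w, x and y_true are purely illustrative): the loss consumes the sigmoid output directly as a probability, and no threshold appears anywhere.

import tensorflow as tf

w = tf.Variable(0.1)                          # stand-in for the network's weights
x = tf.constant([[0.4], [1.7]])               # two input samples
y_true = tf.constant([[0.0], [1.0]])          # actual 0/1 labels y

loss_fn = tf.keras.losses.BinaryCrossentropy()

with tf.GradientTape() as tape:
    y_pred = tf.sigmoid(w * x)                # forward pass: probabilities in (0, 1)
    loss = loss_fn(y_true, y_pred)            # a single scalar; no threshold involved

grad = tape.gradient(loss, w)                 # back-propagation through the sigmoid
w.assign_sub(0.1 * grad)                      # plain SGD weight update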

Solution 2

You can assign the threshold explicitly in compile() by using

tf.keras.metrics.BinaryAccuracy(
    name="binary_accuracy", dtype=None, threshold=0.5
)

like so:

model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.5)])
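
Note that the threshold only affects how the metric scores predictions; model.predict() still returns the raw sigmoid probabilities. A short sketch (assuming the model and x_test from the question below):

import numpy as np
import tensorflow as tf

# The metric thresholds the predicted probabilities before comparing them to the labels.
metric = tf.keras.metrics.BinaryAccuracy(threshold=0.5)
metric.update_state([[1.0], [0.0]], [[0.83], [0.07]])   # (y_true, y_pred probabilities)
print(metric.result().numpy())                          # 1.0 -- both fall on the correct side of 0.5

# predict() ignores the metric's threshold, so apply it yourself for hard 0/1 labels.
probs = model.predict(x_test)
labels = (probs > 0.5).astype(int)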

Author: Daniel Whettam

Updated on June 03, 2022

Comments

  • Daniel Whettam about 2 years

    I've implemented a basic MLP in Keras with TensorFlow and I'm trying to solve a binary classification problem. For binary classification, sigmoid seems to be the recommended activation function, and I don't quite understand why, or how Keras deals with it.

    I understand the sigmoid function produces values in the range between 0 and 1. My understanding is that for classification problems using sigmoid, there will be a certain threshold used to determine the class of an input (typically 0.5). In Keras, I'm not seeing any way to specify this threshold, so I assume it's done implicitly in the back-end? If that is the case, how does Keras distinguish between the use of sigmoid in a binary classification problem and in a regression problem? With binary classification we want a binary value, but with regression we need a continuous value. All I can see that could be indicating this is the loss function. Is that what informs Keras how to handle the data?

    Additionally, assuming Keras is implicitly applying a threshold, why does it output continuous values when I use my model to predict on new data?

    For example:

    y_pred = model.predict(x_test)
    print(y_pred)
    

    gives:

    [7.4706882e-02]
    [8.3481872e-01]
    [2.9314638e-04]
    [5.2297767e-03]
    [2.1608515e-01]
    ...
    [4.4894204e-03]
    [5.1120580e-05]
    [7.0263929e-04]

    I can apply a threshold myself when predicting to get a binary output, but surely Keras must be doing that anyway in order to classify correctly? Perhaps Keras applies a threshold when training the model, but not when I use it to predict new values, since the loss function isn't used in prediction? Or is it not applying a threshold at all, and the continuous values output just happen to work well with my model? I've checked that the same thing happens with the Keras example for binary classification, so I don't think I've made any errors in my code, especially as it's predicting accurately.

    If anyone could explain how this is working, I would greatly appreciate it.

    Here's my model as a point of reference:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Dropout
    from tensorflow.keras.optimizers import SGD

    model = Sequential()
    model.add(Dense(124, activation='relu', input_shape=(2,)))
    model.add(Dropout(0.5))
    model.add(Dense(124, activation='relu'))
    model.add(Dropout(0.1))
    model.add(Dense(1, activation='sigmoid'))   # single sigmoid unit -> probability in [0, 1]
    model.summary()

    model.compile(loss='binary_crossentropy',
                  optimizer=SGD(learning_rate=0.1, momentum=0.003),
                  metrics=['acc'])

    history = model.fit(x_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        verbose=1,
                        validation_data=(x_test, y_test))
    score = model.evaluate(x_test, y_test, verbose=0)
    
  • Daniel Whettam over 6 years
    Thank you! That makes a lot of sense. How would the error be calculated, then, if it gives you a probability of belonging to a class? You would be comparing a probability to a binary value. Does that work? (A worked example follows these comments.)
  • Maxim Egorushkin over 6 years
    @DanielWhettam For tensorflow Keras back-end see github.com/tensorflow/tensorflow/blob/…
  • Maxim Egorushkin over 6 years
    @DanielWhettam Added a few more details for you.
  • Geoffrey Anderson almost 6 years
    Let's say I need binary outputs. Is a softmax layer a good way to get that?
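
On the follow-up question above about comparing a probability to a binary label: that is exactly what binary cross-entropy does during training. A small sketch of the arithmetic (the bce helper is written out here only for illustration; Keras provides its own implementation):

import numpy as np

# Binary cross-entropy: L = -(y*log(y_hat) + (1 - y)*log(1 - y_hat)),
# where y is the 0/1 label and y_hat is the predicted probability.
def bce(y, y_hat, eps=1e-7):
    y_hat = np.clip(y_hat, eps, 1 - eps)   # clip to avoid log(0)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(bce(1, 0.83))   # ~0.19 -- confident and correct -> small loss
print(bce(1, 0.07))   # ~2.66 -- confident and wrong   -> large loss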