Difference between Dense and Activation layer in Keras


Solution 1

Using Dense(activation='softmax') is computationally equivalent to first adding a Dense layer and then adding an Activation('softmax') layer. The second approach has one advantage, though: from such a model you can still retrieve the outputs of the last Dense layer before the activation is applied. With the first approach, that is not possible.
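For illustration, here is a minimal sketch using the Keras functional API (the layer sizes, layer names, and input shape are made up for the example). Because the softmax lives in a separate Activation layer, the pre-activation tensor stays addressable and a sub-model can return it:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(20,))
x = layers.Dense(64, activation='relu')(inputs)
logits = layers.Dense(10, name='logits')(x)    # no activation here
probs = layers.Activation('softmax')(logits)   # activation as its own layer

model = keras.Model(inputs, probs)

# Sub-model exposing the pre-activation outputs of the last Dense layer.
logit_model = keras.Model(inputs, logits)

x_batch = np.random.rand(4, 20).astype('float32')
print(model.predict(x_batch).shape)        # (4, 10) softmax probabilities
print(logit_model.predict(x_batch).shape)  # (4, 10) raw logits
```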

Solution 2

As @MarcinMożejko said, the two are equivalent. I just want to explain why. If you look at the Keras documentation page for Dense, you'll see that the default activation function is None.

A dense layer mathematically is:

a = g(W.T*a_prev+b)

where g is an activation function. When using Dense(units=k, activation='softmax'), all of these quantities are computed in one shot. When doing Dense(units=k) followed by Activation('softmax'), the Dense layer first computes W.T*a_prev+b (because its default activation function is None), and the Activation layer then applies the specified activation function to that result.
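As a quick numerical check, here is a small sketch (the input shape, batch size, and layer size are arbitrary, chosen only for illustration). Once both models share the same W and b, the fused and the split formulations produce identical outputs:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

k = 5
x = np.random.rand(3, 8).astype('float32')

# Approach 1: activation fused into the Dense layer.
fused = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(k, activation='softmax'),
])

# Approach 2: Dense with the default activation (None), then Activation('softmax').
split = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(k),
    layers.Activation('softmax'),
])

# Copy the weights so both models compute g(W.T*a_prev + b) with the same W and b.
split.set_weights(fused.get_weights())

print(np.allclose(fused.predict(x), split.predict(x)))  # True
```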

Author: Pusheen_the_dev

Very interested in neural-nets and generally in machine learning. I know: Python-2 | Python-3 | C | C++ | PHP5

Updated on June 23, 2022

Comments

  • Pusheen_the_dev, almost 2 years ago

    I was wondering what the difference is between the Activation layer and the Dense layer in Keras.

    Since the Activation layer seems to be a fully connected layer, and Dense has a parameter to pass an activation function, what is the best practice?

    Let's imagine a fictional network like this: Input -> Dense -> Dropout -> Final Layer. Should the final layer be Dense(activation='softmax') or Activation('softmax')? Which is the cleanest, and why?

    Thanks everyone!