Does dropout layer go before or after dense layer in TensorFlow?

tensorflow machine-learning neural-network deep-learning conv-neural-network


It is not an either/or situation. Informally speaking, common wisdom says to apply dropout after dense layers, and not so much after convolutional or pooling ones, so at first glance that would depend on what exactly the prev_layer is in your second code snippet.
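
Part of why this is not either/or is that dropout does not care what precedes it. It is an element-wise random mask applied to whatever tensor it receives. Below is a minimal NumPy sketch of the "inverted" dropout that TensorFlow uses: during training, each unit is dropped with probability rate and the survivors are scaled by 1/(1 - rate); otherwise the layer is the identity. The function name, the RNG handling and the example shapes are illustrative assumptions, not TensorFlow's own implementation:

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout (illustrative): zero each element with probability
    `rate` and scale the survivors by 1 / (1 - rate); identity otherwise."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate          # True where the unit is kept
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
dense_out = np.ones((32, 1024))                 # e.g. output of a dense layer
pooled_out = np.ones((32, 12, 12, 64))          # e.g. output of a pooling layer

# The same op applies to either tensor and preserves its shape
print(dropout(dense_out, 0.5, True, rng).shape)    # (32, 1024)
print(dropout(pooled_out, 0.25, True, rng).shape)  # (32, 12, 12, 64)
```

Because of the 1/(1 - rate) scaling, the expected activation is unchanged, so nothing needs rescaling at inference time.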

Nevertheless, this "design principle" is routinely violated nowadays (see some interesting related discussions on Reddit and Cross Validated). Even in the MNIST CNN example included with Keras, dropout is applied both after the max-pooling layer and after the dense one:

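from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

input_shape = (28, 28, 1)  # MNIST images, channels-last
num_classes = 10           # digits 0-9

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))  # <-- dropout here
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))   # <-- and here
model.add(Dense(num_classes, activation='softmax'))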
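
To see what each Dropout in that model actually receives, here is a plain-Python trace of the per-example tensor shapes. It assumes the example's MNIST setup (28 x 28 x 1 inputs) and the Keras defaults of 'valid' padding and stride 1 for Conv2D, with the pooling stride equal to pool_size. The helper name is hypothetical:

```python
def conv_valid(size, kernel):
    """Output size of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

# MNIST input, channels-last
h, w, c = 28, 28, 1

# Conv2D(32, (3, 3)) -> 26 x 26 x 32
h, w, c = conv_valid(h, 3), conv_valid(w, 3), 32
# Conv2D(64, (3, 3)) -> 24 x 24 x 64
h, w, c = conv_valid(h, 3), conv_valid(w, 3), 64
# MaxPooling2D(pool_size=(2, 2)), stride defaults to pool_size -> 12 x 12 x 64
h, w = h // 2, w // 2

# First Dropout(0.25) acts on this feature map
first_dropout_input = (h, w, c)

# Flatten, then Dense(128): one weight per input-unit pair plus one bias per unit
flat = h * w * c
dense_params = flat * 128 + 128

# Second Dropout(0.5) acts on the 128 dense activations
second_dropout_input = (128,)

print(first_dropout_input, flat, dense_params, second_dropout_input)
# (12, 12, 64) 9216 1179776 (128,)
```

So the first mask covers 9,216 feature-map values per example, and the second covers the 128 hidden units feeding the classifier.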

So, both your code snippets are valid, and we can easily imagine a third valid option as well:

dropout = tf.layers.dropout(prev_layer, [...])
dense = tf.layers.dense(dropout, units=1024, activation=tf.nn.relu)
dropout2 = tf.layers.dropout(dense, [...])
logits = tf.layers.dense(dropout2, units=params['output_classes'])
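
A NumPy sketch of the forward pass of that third ordering shows where the two masks fall. The first mask perturbs the inputs to the 1024-unit layer, and the second perturbs its ReLU outputs. The input size, number of classes, dropout rate and random weights are hypothetical stand-ins for prev_layer, params['output_classes'] and the elided [...] arguments:

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(x, rate, training):
    # Inverted dropout; identity when not training
    if not training:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

def dense(x, w, b, relu=False):
    # Fully connected layer with optional ReLU
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y

# Hypothetical sizes: 256 input features, 1024 hidden units, 10 classes
n_in, n_hidden, n_classes = 256, 1024, 10
w1, b1 = rng.normal(0.0, 0.05, (n_in, n_hidden)), np.zeros(n_hidden)
w2, b2 = rng.normal(0.0, 0.05, (n_hidden, n_classes)), np.zeros(n_classes)

def forward(prev_layer, training, rate=0.5):
    dropped_in = dropout(prev_layer, rate, training)   # dropout before the dense layer
    hidden = dense(dropped_in, w1, b1, relu=True)      # dense, 1024 units, ReLU
    dropped_hidden = dropout(hidden, rate, training)   # dropout after it
    return dense(dropped_hidden, w2, b2)               # logits

x = rng.normal(size=(8, n_in))
logits_train = forward(x, training=True)
logits_eval = forward(x, training=False)
print(logits_train.shape, logits_eval.shape)  # (8, 10) (8, 10)
```

At inference both masks disappear, so the three orderings differ only in which activations are perturbed during training.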

As general advice: tutorials such as the one you link to are only trying to familiarize you with the tools and the (very) general principles, so "overinterpreting" the solutions shown is not recommended.


rodrigo-silveira


Updated on May 22, 2022

Question from rodrigo-silveira:

According to A Guide to TF Layers, the dropout layer goes after the last dense layer:

dense = tf.layers.dense(input, units=1024, activation=tf.nn.relu)
dropout = tf.layers.dropout(dense, rate=params['dropout_rate'], 
                            training=mode == tf.estimator.ModeKeys.TRAIN)
logits = tf.layers.dense(dropout, units=params['output_classes'])

Doesn't it make more sense to have it before that dense layer, so it learns the mapping from input to output with the dropout effect?

dropout = tf.layers.dropout(prev_layer, rate=params['dropout_rate'],
                            training=mode == tf.estimator.ModeKeys.TRAIN)
dense = tf.layers.dense(dropout, units=1024, activation=tf.nn.relu)
logits = tf.layers.dense(dense, units=params['output_classes'])
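
Whichever ordering you choose, the training=mode == tf.estimator.ModeKeys.TRAIN argument is what makes dropout a training-only perturbation. tf.layers.dropout defaults to training=False, so leaving the argument out would silently turn the layer into a no-op. Below is a NumPy sketch of that gating. The ModeKeys class and its string values are a hypothetical stand-in for tf.estimator.ModeKeys, and the dropout function is an illustration rather than the TensorFlow API:

```python
import numpy as np

class ModeKeys:
    # Hypothetical stand-in for tf.estimator.ModeKeys
    TRAIN, EVAL, PREDICT = "train", "eval", "infer"

def dropout(x, rate, training, rng):
    # Inverted dropout: random mask and 1 / (1 - rate) scaling only when training
    if not training:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(1)
prev_layer = np.ones((4, 6))

for mode in (ModeKeys.TRAIN, ModeKeys.EVAL, ModeKeys.PREDICT):
    out = dropout(prev_layer, rate=0.5, training=mode == ModeKeys.TRAIN, rng=rng)
    # Fraction of units zeroed: roughly `rate` in training, exactly 0 otherwise
    print(mode, float((out == 0).mean()))
```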