What is the role of TimeDistributed layer in Keras?
In Keras, when building a sequential model, the second dimension (the one after the sample dimension) is usually the time dimension. This means that if, for example, your data is 5-dimensional with shape (sample, time, width, length, channel), you could use TimeDistributed to apply a convolutional layer (which expects 4-dimensional input with shape (sample, width, length, channel)) along the time dimension, applying the same layer to each time slice, to obtain a 5-dimensional output.
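To make "applying the same layer to each time slice" concrete, here is a minimal pure-Python sketch of the idea. It is not how Keras implements TimeDistributed; the helper names (time_distributed, shape, mean_pool_channels) are hypothetical, and the per-frame "layer" is just a stand-in that averages the channels of each pixel.

```python
def time_distributed(layer_fn, batch):
    """Apply layer_fn independently to every time slice of every sample.

    batch is a nested list shaped (sample, time, ...); layer_fn receives one
    time slice and returns the transformed slice. The same function (the
    same "weights") is reused for all time steps.
    """
    return [[layer_fn(time_slice) for time_slice in sample] for sample in batch]


def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def mean_pool_channels(frame):
    """Hypothetical per-frame layer: (width, length, channel) -> (width, length, 1)."""
    return [[[sum(pixel) / len(pixel)] for pixel in row] for row in frame]


# 2 samples, 3 time steps, 4x4 frames, 3 channels: a 5-dimensional input.
video = [
    [[[[1.0, 2.0, 3.0] for _ in range(4)] for _ in range(4)] for _ in range(3)]
    for _ in range(2)
]

out = time_distributed(mean_pool_channels, video)
print(shape(video), shape(out))
```

The per-frame function only ever sees a single (width, length, channel) slice, yet the result keeps the sample and time dimensions, which is the behaviour the answer describes for a convolutional layer wrapped in TimeDistributed.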
The case with Dense is that, from Keras version 2.0, Dense is by default applied only to the last dimension (e.g. if you apply Dense(10) to an input with shape (n, m, o, p), you'll get an output with shape (n, m, o, 10)), so in your case Dense and TimeDistributed(Dense) are equivalent.
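The equivalence can be checked by hand with a small pure-Python sketch. This is an illustration under stated assumptions, not Keras code: dense_last_axis and param_count are hypothetical helpers, the weights are made up, and the shapes mirror the question (10 time steps of 5 features coming out of the LSTM, followed by a single output unit).

```python
def dense_last_axis(x, weights, bias):
    """Apply y = x . W + b to every innermost vector of a nested list."""
    if isinstance(x[0], list):
        # Not at the last axis yet: recurse, reusing the same W and b.
        return [dense_last_axis(sub, weights, bias) for sub in x]
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


def param_count(weights, bias):
    """Number of trainable values: one weight per (input, unit) pair plus biases."""
    return len(weights) * len(weights[0]) + len(bias)


features, units = 5, 1  # LSTM(5) output size, Dense(1)
weights = [[0.1 * (i + 1) for _ in range(units)] for i in range(features)]
bias = [1.0] * units

# Batch of 2 samples, 10 time steps, 5 features per step.
batch = [[[float(t + f) for f in range(features)] for t in range(10)] for _ in range(2)]

# "Dense": applied to the whole (sample, time, feature) input at once.
whole = dense_last_axis(batch, weights, bias)

# "TimeDistributed(Dense)": the same weights applied to each time step separately.
per_step = [[dense_last_axis(step, weights, bias) for step in sample] for sample in batch]

print(param_count(weights, bias))
print(len(whole), len(whole[0]), len(whole[0][0]))
print(whole == per_step)
```

Because the weight matrix only depends on the size of the last axis, the parameter count does not grow with the number of time steps, which is why both models in the question report the same number of parameters.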
Buomsoo Kim
Updated on July 08, 2022
Comments
-
Buomsoo Kim almost 2 years
I am trying to grasp what TimeDistributed wrapper does in Keras.
I get that TimeDistributed "applies a layer to every temporal slice of an input."
But I ran some experiments and got results that I cannot understand.
In short, after an LSTM layer, TimeDistributed(Dense) and a plain Dense layer give the same results.
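from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

# Model 1: Dense wrapped in TimeDistributed
model = Sequential()
model.add(LSTM(5, input_shape=(10, 20), return_sequences=True))
model.add(TimeDistributed(Dense(1)))
print(model.output_shape)

# Model 2: plain Dense
model = Sequential()
model.add(LSTM(5, input_shape=(10, 20), return_sequences=True))
model.add(Dense(1))
print(model.output_shape)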
For both models, I got output shape of (None, 10, 1).
Can anyone explain the difference between TimeDistributed and Dense layer after an RNN layer?
-
gionni over 6 years
There currently seems to be no difference; here is a discussion about it. I think the original intent was to make a distinction between the Dense layer flattening the input and then reshaping, hence connecting different time steps and having more parameters, and TimeDistributed keeping the time steps separated (hence having fewer parameters). In your case Dense should have had 500 parameters, TimeDistributed only 50.
Buomsoo Kim over 6 years
@gionni Nope, it has the same number of parameters (both 6). So there is virtually no difference atm?
-
gionni over 6 years
Yeah, exactly, those are the numbers of parameters they would have if there were a difference. At the moment there isn't.
-
CMCDragonkai almost 6 years
There's an example of using TimeDistributed wrapping the model itself. When this is applied to an Input tensor, is there any difference between this and just doing a map of the model applied to a list that contains each slice of the Input?