How to use multilayered bidirectional LSTM in Tensorflow?
Solution 1
You can use two different approaches to build a multilayer BiLSTM model:
1) Use the output of the previous BiLSTM layer as the input to the next one. First create two lists of forward and backward cells, cell_forw and cell_back, each of length num_layers, and set output to the input tensor. Then loop over the layers:
for n in range(num_layers):
    cell_fw = cell_forw[n]
    cell_bw = cell_back[n]
    state_fw = cell_fw.zero_state(batch_size, tf.float32)
    state_bw = cell_bw.zero_state(batch_size, tf.float32)
    (output_fw, output_bw), last_state = tf.nn.bidirectional_dynamic_rnn(
        cell_fw, cell_bw, output,
        initial_state_fw=state_fw,
        initial_state_bw=state_bw,
        scope='BLSTM_' + str(n),
        dtype=tf.float32)
    output = tf.concat([output_fw, output_bw], axis=2)
2) Another approach worth a look is the stacked BiLSTM, tf.contrib.rnn.stack_bidirectional_dynamic_rnn.
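To see why every layer after the first receives an input of width 2 * rnn_size in approach 1, the shape bookkeeping can be traced without TensorFlow. The sketch below is plain Python. The helper name and the example sizes are assumptions for illustration, and the parameter count assumes the standard LSTM layout (four gates, each with input weights, recurrent weights, and a bias).

```python
def stacked_bilstm_shapes(batch_size, time_steps, input_dim, rnn_size, num_layers):
    """Trace per-layer shapes when each BiLSTM layer feeds the next.

    Hypothetical helper for illustration only; it builds no TensorFlow graph.
    """
    layers = []
    in_dim = input_dim
    for n in range(num_layers):
        # One LSTM direction: 4 gates, each with (in_dim + rnn_size) weights
        # per unit plus one bias per unit.
        params_per_direction = 4 * rnn_size * (in_dim + rnn_size + 1)
        # Forward and backward outputs are concatenated on the feature axis.
        out_dim = 2 * rnn_size
        layers.append({
            "layer": n,
            "input_shape": (batch_size, time_steps, in_dim),
            "output_shape": (batch_size, time_steps, out_dim),
            "params": 2 * params_per_direction,
        })
        in_dim = out_dim  # the concatenated output is the next layer's input
    return layers


for info in stacked_bilstm_shapes(batch_size=32, time_steps=20, input_dim=100,
                                  rnn_size=64, num_layers=3):
    print(info)
```

Only the first layer sees the original input width. Every later layer sees 2 * rnn_size features, so its weights have a different shape from the first layer's.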
Solution 2
This is essentially the same as the first answer, with a small variation in how scope names are used and with added dropout wrappers. It also avoids the variable-scope error that the first answer produces.
def bidirectional_lstm(input_data, num_layers, rnn_size, keep_prob):
    output = input_data
    for layer in range(num_layers):
        with tf.variable_scope('encoder_{}'.format(layer), reuse=tf.AUTO_REUSE):
            # By giving a different variable scope to each layer, I've ensured that
            # the weights are not shared among the layers. If you want to share the
            # weights, you can do that by giving variable_scope as "encoder" but do
            # make sure first that reuse is set to tf.AUTO_REUSE
            cell_fw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
            cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, input_keep_prob=keep_prob)
            cell_bw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
            cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, input_keep_prob=keep_prob)
            outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw,
                                                              cell_bw,
                                                              output,
                                                              dtype=tf.float32)
            # Concat the forward and backward outputs
            output = tf.concat(outputs, 2)
    return output
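The role of the per-layer scope name can be illustrated with a toy model of a name-keyed variable store. This is not TensorFlow's API: ToyVariableStore, build_layers, and the auto_reuse flag are hypothetical stand-ins. The variable names are modeled on the one in the error message quoted in the comments, and the kernel shape is an arbitrary example.

```python
class ToyVariableStore:
    """Toy stand-in for a name-keyed variable store (not TensorFlow's API)."""

    def __init__(self):
        self.variables = {}

    def get_variable(self, name, shape, auto_reuse=False):
        if name in self.variables:
            if not auto_reuse:
                raise ValueError("Variable %s already exists, disallowed." % name)
            return self.variables[name]  # reuse the existing entry
        self.variables[name] = shape     # first use: create the entry
        return shape


def build_layers(store, scopes, auto_reuse=False):
    """Register one forward LSTM kernel per layer under the given scope names."""
    for scope in scopes:
        name = scope + "/fw/lstm_cell/kernel"
        store.get_variable(name, shape=(192, 256), auto_reuse=auto_reuse)
    return sorted(store.variables)


# Distinct scopes per layer: two separate kernels, no collision.
print(build_layers(ToyVariableStore(),
                   ["encoder_0/bidirectional_rnn", "encoder_1/bidirectional_rnn"]))

# Same scope twice without reuse: the second layer collides with the first.
try:
    build_layers(ToyVariableStore(), ["bidirectional_rnn", "bidirectional_rnn"])
except ValueError as err:
    print(err)
```

With one shared scope and auto_reuse=True, the toy store keeps a single kernel for both layers. That mirrors the weight sharing described in the code comment above, and it only makes sense when both layers expect the same kernel shape.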
Solution 3
Building on Taras's answer, here is another example: a 2-layer bidirectional RNN with GRU cells.
embedding_weights = tf.Variable(tf.random_uniform([vocabulary_size, state_size], -1.0, 1.0))
embedding_vectors = tf.nn.embedding_lookup(embedding_weights, tokens)
# First bidirectional layer (GRU cells)
cell = tf.nn.rnn_cell.GRUCell(state_size)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=1-dropout)
(forward_output, backward_output), _ = \
    tf.nn.bidirectional_dynamic_rnn(cell, cell, inputs=embedding_vectors,
                                    sequence_length=lengths, dtype=tf.float32, scope='BLSTM_1')
outputs = tf.concat([forward_output, backward_output], axis=2)
# Second bidirectional layer, using the output of the previous layer as its input
cell2 = tf.nn.rnn_cell.GRUCell(state_size)
cell2 = tf.nn.rnn_cell.DropoutWrapper(cell2, output_keep_prob=1-dropout)
(forward_output, backward_output), _ = \
    tf.nn.bidirectional_dynamic_rnn(cell2, cell2, inputs=outputs,
                                    sequence_length=lengths, dtype=tf.float32, scope='BLSTM_2')
outputs = tf.concat([forward_output, backward_output], axis=2)
BTW, don't forget to give each layer a different scope name. Hope this helps.
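The calls above pass sequence_length, so the backward direction should read only the real tokens of each padded sequence. Below is a minimal plain-Python sketch of that idea: it reverses only the valid prefix of each sequence, which is the behaviour documented for tf.reverse_sequence. The helper name and the toy batch are assumptions for illustration and are not part of this answer's code.

```python
def reverse_valid_prefix(sequence, length):
    """Reverse the first `length` items and leave the padding where it is."""
    return sequence[:length][::-1] + sequence[length:]


# Toy batch: the first sequence has 3 real tokens followed by padding.
batch = [
    ["a", "b", "c", "<pad>", "<pad>"],
    ["d", "e", "f", "g", "h"],
]
lengths = [3, 5]

backward_inputs = [reverse_valid_prefix(seq, n) for seq, n in zip(batch, lengths)]
for original, reversed_seq in zip(batch, backward_inputs):
    print(original, "->", reversed_seq)
```

A plain reversal of the whole padded sequence would instead feed the padding to the backward cell first, which is why the lengths matter for the backward direction.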
Solution 4
As @Taras pointed out, you can use:
(1) tf.nn.bidirectional_dynamic_rnn()
(2) tf.contrib.rnn.stack_bidirectional_dynamic_rnn()
All previous answers cover only (1), so I'll give some details on (2), particularly since it usually outperforms (1). For an intuition about the different connectivities, see here.
Let's say you want to create a stack of 3 BLSTM layers, each with 64 nodes:
num_layers = 3
num_nodes = 64
# Define LSTM cells
enc_fw_cells = [tf.contrib.rnn.LSTMCell(num_nodes) for layer in range(num_layers)]
enc_bw_cells = [tf.contrib.rnn.LSTMCell(num_nodes) for layer in range(num_layers)]
# Connect LSTM cells bidirectionally and stack
(all_states, fw_state, bw_state) = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
    cells_fw=enc_fw_cells, cells_bw=enc_bw_cells, inputs=input_embed, dtype=tf.float32)
# Concatenate results
for k in range(num_layers):
    if k == 0:
        con_c = tf.concat((fw_state[k].c, bw_state[k].c), 1)
        con_h = tf.concat((fw_state[k].h, bw_state[k].h), 1)
    else:
        con_c = tf.concat((con_c, fw_state[k].c, bw_state[k].c), 1)
        con_h = tf.concat((con_h, fw_state[k].h, bw_state[k].h), 1)
output = tf.contrib.rnn.LSTMStateTuple(c=con_c, h=con_h)
In this case, I use the final states of the stacked biRNN rather than the states at all timesteps (saved in all_states), since I was using an encoder-decoder scheme, where the above code was only the encoder.
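The width of the concatenated state is easy to check by hand. The plain-Python sketch below mimics the concatenation loop for a single batch element, with flat lists standing in for the .c or .h vectors. The helper name and the dummy values are assumptions for illustration.

```python
def concat_final_states(fw_states, bw_states):
    """Concatenate per-layer forward/backward state vectors for one example.

    fw_states and bw_states are lists (one entry per layer) of flat lists,
    standing in for the .c or .h vectors of a single batch element.
    """
    combined = []
    for fw, bw in zip(fw_states, bw_states):
        combined.extend(fw)  # forward state of this layer
        combined.extend(bw)  # backward state of this layer
    return combined


num_layers, num_nodes = 3, 64
fw_h = [[0.0] * num_nodes for _ in range(num_layers)]  # dummy forward states
bw_h = [[1.0] * num_nodes for _ in range(num_layers)]  # dummy backward states

con_h = concat_final_states(fw_h, bw_h)
print(len(con_h))  # num_layers * 2 * num_nodes
```

With 3 layers of 64 nodes, each of con_c and con_h has 3 * 2 * 64 = 384 features per example, so anything that consumes this state (a decoder, for example) would presumably need a matching state size.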
Gi Yeon Shin
Updated on June 16, 2022
Comments
-
Gi Yeon Shin almost 2 years
I want to know how to use a multilayered bidirectional LSTM in TensorFlow.
I have already implemented a bidirectional LSTM, but I want to compare it with a multilayer version of the model.
What code should I add to this part?
x = tf.unstack(tf.transpose(x, perm=[1, 0, 2]))
#print(x[0].get_shape())
# Define lstm cells with tensorflow
# Forward direction cell
lstm_fw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
# Backward direction cell
lstm_bw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
# Get lstm cell output
try:
    outputs, _, _ = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                                 dtype=tf.float32)
except Exception:  # Old TensorFlow version only returns outputs not states
    outputs = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                           dtype=tf.float32)
# Linear activation, using rnn inner loop last output
outputs = tf.stack(outputs, axis=1)
outputs = tf.reshape(outputs, (batch_size*n_steps, n_hidden*2))
outputs = tf.matmul(outputs, weights['out']) + biases['out']
outputs = tf.reshape(outputs, (batch_size, n_steps, n_classes))
-
Rahul over 6 years
I tried this and got this error: ValueError: Variable bidirectional_rnn/fw/lstm_cell/kernel already exists, disallowed. Did you mean to set reuse=True in VarScope? Can you provide a working example?
-
ARAT over 5 years
I have a question related to that. I concatenated the outputs and reshaped them using output = tf.reshape(tf.concat(output,1), [-1, 2 * rnn_size]), and the dimension is now (batch_size x timesteps, 2*rnn_size). When I pass it through a dense layer using logits = tf.matmul(output, weight) + bias, my dimension becomes (batch_size x timesteps, num_classes). These are my logits. How can I then compute the loss using tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y)), given that the shape of the Y placeholder is [None, num_classes]?
-
betelgeuse over 5 years
You can't directly. You need to eliminate that timestep dimension. Is there any specific reason to use the output of all timesteps? Generally, we take the output at the last time step only. You can do this by returning output = output[:,-1,:]. Now the logits would be [batch_size, num_classes].
-
ARAT over 5 years
Thank you very much for your quick response. To be honest, this is how I learned LSTM. Like in this example, they flatten the output and use it to compute logits, without eliminating the timesteps. I am a bit confused now.
-
betelgeuse over 5 years
He did that because he's using tf.contrib.seq2seq.sequence_loss, which expects the time_step dimension. Notice that once the logits are calculated, he reshapes them back to the original shape. In your case, you want to use tf.nn.softmax_cross_entropy_with_logits, which won't take that shape. It will require the last time_step only.
-
ARAT over 5 years
Oh, I understand. So you're saying that before the dense layer and softmax, I should select the last time step of each data point and go from there?
-
betelgeuse over 5 years
If you want to use tf.nn.softmax_cross_entropy_with_logits, then yes. Though in that particular problem, you might want to use the seq2seq loss.
-
ARAT over 5 years
For the one in the link? Yeah, I understand.
-
Jaeyoung Lee over 3 years
Thank you for the detailed explanation. Can I ask about the "final states"? When the input sequences have different lengths, do the "final states" hold the actual final state for each input's length, or might they include zero padding?
-
dopexxx over 3 years
This code snippet was written for tf==1.X and, if I remember correctly, it can't handle variable-length sequences out of the box. I always used zero-padding. TensorFlow 2.X may have a better solution for this, though.
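The thread above suggests output[:, -1, :] for taking the last time step and then raises the question of zero padding. The plain-Python sketch below contrasts the two selections on a nested list shaped [batch][time][features]. The helper names, the toy values, and the lengths list are assumptions for illustration, and it is only a sketch of the indexing, not of any TensorFlow call.

```python
def last_timestep(outputs):
    """Plain-Python equivalent of outputs[:, -1, :] for [batch][time][features]."""
    return [example[-1] for example in outputs]


def last_valid_timestep(outputs, lengths):
    """Pick the output at index length - 1 for each example, skipping padding."""
    return [example[n - 1] for example, n in zip(outputs, lengths)]


outputs = [
    [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]],  # real length 2, last step is padding
    [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]],  # real length 3
]
lengths = [2, 3]

print(last_timestep(outputs))
print(last_valid_timestep(outputs, lengths))
```

For the padded example, the naive last step returns the padding row, while indexing by length returns the last real output. Both give one feature vector per example, so the logits computed from them have shape [batch_size, num_classes].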