How to use multilayered bidirectional LSTM in Tensorflow?


Solution 1

You can use two different approaches to build a multilayer BiLSTM model:

1) Use the output of the previous BiLSTM layer as the input to the next one. Begin by creating lists of forward and backward cells, each of length num_layers, and then run:

# Create the lists of forward and backward cells, one per layer.
# rnn_size (units per cell) and inputs (the input tensor) are illustrative names.
cell_forw = [tf.contrib.rnn.LSTMCell(rnn_size) for _ in range(num_layers)]
cell_back = [tf.contrib.rnn.LSTMCell(rnn_size) for _ in range(num_layers)]

output = inputs  # batch of input sequences, shape [batch_size, max_time, depth]
for n in range(num_layers):
    cell_fw = cell_forw[n]
    cell_bw = cell_back[n]

    state_fw = cell_fw.zero_state(batch_size, tf.float32)
    state_bw = cell_bw.zero_state(batch_size, tf.float32)

    # Each layer gets its own scope so that weights are not shared between layers
    (output_fw, output_bw), last_state = tf.nn.bidirectional_dynamic_rnn(
        cell_fw, cell_bw, output,
        initial_state_fw=state_fw,
        initial_state_bw=state_bw,
        scope='BLSTM_' + str(n),
        dtype=tf.float32)

    # The concatenated forward/backward output becomes the next layer's input
    output = tf.concat([output_fw, output_bw], axis=2)

2) Another approach worth a look is the stacked bidirectional RNN, tf.contrib.rnn.stack_bidirectional_dynamic_rnn (see Solution 4 below for an example).

Solution 2

This is essentially the same as the first answer, but with a slightly different use of scope names and with dropout wrappers added. It also takes care of the variable-scope error that the first answer's code raises.

def bidirectional_lstm(input_data, num_layers, rnn_size, keep_prob):

    output = input_data
    for layer in range(num_layers):
        with tf.variable_scope('encoder_{}'.format(layer), reuse=tf.AUTO_REUSE):

            # By giving a different variable scope to each layer, I've ensured that
            # the weights are not shared among the layers. If you want to share the
            # weights, you can do that by giving variable_scope as "encoder" but do
            # make sure first that reuse is set to tf.AUTO_REUSE

            cell_fw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
            cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, input_keep_prob=keep_prob)

            cell_bw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
            cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, input_keep_prob=keep_prob)

            outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw,
                                                              cell_bw,
                                                              output,
                                                              dtype=tf.float32)

            # Concatenate the forward and backward outputs
            output = tf.concat(outputs, 2)

    return output
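
For reference, here is a minimal sketch of how this helper might be wired up; the placeholder and the hyperparameter values below are illustrative, not part of the original answer:

import tensorflow as tf

# Illustrative hyperparameters (not from the original answer)
max_time, embed_dim = 50, 128
num_layers, rnn_size, keep_prob = 2, 64, 0.8

# Batches of embedded sequences, shape [batch_size, max_time, embed_dim]
input_seq = tf.placeholder(tf.float32, [None, max_time, embed_dim], name='input_seq')

# Stacked BiLSTM encoding, shape [batch_size, max_time, 2 * rnn_size]
encoded = bidirectional_lstm(input_seq, num_layers, rnn_size, keep_prob)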

Solution 3

Building on Taras's answer, here is another example using just a 2-layer bidirectional RNN with GRU cells:

    embedding_weights = tf.Variable(tf.random_uniform([vocabulary_size, state_size], -1.0, 1.0))
    embedding_vectors = tf.nn.embedding_lookup(embedding_weights, tokens)

    # First BLSTM
    cell = tf.nn.rnn_cell.GRUCell(state_size)
    cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=1 - dropout)
    (forward_output, backward_output), _ = \
        tf.nn.bidirectional_dynamic_rnn(cell, cell, inputs=embedding_vectors,
                                        sequence_length=lengths, dtype=tf.float32, scope='BLSTM_1')
    outputs = tf.concat([forward_output, backward_output], axis=2)

    # Second BLSTM using the output of the previous layer as its input.
    cell2 = tf.nn.rnn_cell.GRUCell(state_size)
    cell2 = tf.nn.rnn_cell.DropoutWrapper(cell2, output_keep_prob=1 - dropout)
    (forward_output, backward_output), _ = \
        tf.nn.bidirectional_dynamic_rnn(cell2, cell2, inputs=outputs,
                                        sequence_length=lengths, dtype=tf.float32, scope='BLSTM_2')
    outputs = tf.concat([forward_output, backward_output], axis=2)

By the way, don't forget to give each BLSTM a different scope name. Hope this helps.
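
If, as in the question's code, you then need per-timestep class scores, one possible sketch of projecting the concatenated output to logits is shown below; n_classes and the weight variables are illustrative assumptions, not part of the original answer:

    # Illustrative projection of the BLSTM output to per-timestep logits
    n_classes = 10  # hypothetical number of classes
    W_out = tf.get_variable('W_out', [2 * state_size, n_classes])
    b_out = tf.get_variable('b_out', [n_classes], initializer=tf.zeros_initializer())

    # Flatten the batch and time dimensions, apply the dense layer, then restore the shape
    flat = tf.reshape(outputs, [-1, 2 * state_size])
    logits = tf.reshape(tf.matmul(flat, W_out) + b_out,
                        [tf.shape(outputs)[0], -1, n_classes])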

Solution 4

As @Taras pointed out, you can use:

(1) tf.nn.bidirectional_dynamic_rnn()

(2) tf.contrib.rnn.stack_bidirectional_dynamic_rnn().

The previous answers only cover (1), so I'll give some details on (2), particularly since it usually outperforms (1). For an intuition about the different connectivities, see here.

Let's say you want to create a stack of 3 BLSTM layers, each with 64 nodes:

from tensorflow.contrib.rnn import LSTMCell

num_layers = 3
num_nodes = 64

# Define LSTM cells (one forward and one backward cell per layer)
enc_fw_cells = [LSTMCell(num_nodes) for layer in range(num_layers)]
enc_bw_cells = [LSTMCell(num_nodes) for layer in range(num_layers)]

# Connect LSTM cells bidirectionally and stack
(all_states, fw_state, bw_state) = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
        cells_fw=enc_fw_cells, cells_bw=enc_bw_cells, inputs=input_embed, dtype=tf.float32)

# Concatenate results
for k in range(num_layers):
    if k == 0:
        con_c = tf.concat((fw_state[k].c, bw_state[k].c), 1)
        con_h = tf.concat((fw_state[k].h, bw_state[k].h), 1)
    else:
        con_c = tf.concat((con_c, fw_state[k].c, bw_state[k].c), 1)
        con_h = tf.concat((con_h, fw_state[k].h, bw_state[k].h), 1)

output = tf.contrib.rnn.LSTMStateTuple(c=con_c, h=con_h)

In this case, I use the final states of the stacked biRNN rather than the outputs at all timesteps (saved in all_states), since I was using an encoder-decoder scheme, where the above code was only the encoder.
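
As a rough illustration of that last point, the combined state can be fed straight into a decoder whose cell size matches the concatenated encoder state; decoder_inputs and decoder_cell below are hypothetical names, not from the original answer:

# Hypothetical decoder, assuming decoder_inputs of shape [batch_size, dec_time, dec_dim].
# The concatenated encoder state has size num_layers * 2 * num_nodes, so the decoder
# cell must use the same number of units to accept it as its initial state.
decoder_cell = LSTMCell(num_layers * 2 * num_nodes)
decoder_outputs, decoder_state = tf.nn.dynamic_rnn(decoder_cell,
                                                   decoder_inputs,
                                                   initial_state=output,
                                                   dtype=tf.float32,
                                                   scope='decoder')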


Comments

  • Gi Yeon  Shin
    Gi Yeon Shin almost 2 years

    I want to know how to use multilayered bidirectional LSTM in Tensorflow.

    I have already implemented a bidirectional LSTM, but I want to compare this model with a model that adds multiple layers.

    What code should I add to this part?

    x = tf.unstack(tf.transpose(x, perm=[1, 0, 2]))
    #print(x[0].get_shape())
    
    # Define lstm cells with tensorflow
    # Forward direction cell
    lstm_fw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Backward direction cell
    lstm_bw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
    
    # Get lstm cell output
    try:
        outputs, _, _ = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                              dtype=tf.float32)
    except Exception: # Old TensorFlow version only returns outputs not states
        outputs = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                        dtype=tf.float32)
    
    # Linear activation, using rnn inner loop last output
    outputs = tf.stack(outputs, axis=1)
    outputs = tf.reshape(outputs, (batch_size*n_steps, n_hidden*2))
    outputs = tf.matmul(outputs, weights['out']) + biases['out']
    outputs = tf.reshape(outputs, (batch_size, n_steps, n_classes))
    
  • Rahul
    Rahul over 6 years
    I tried this and got this error: ValueError: Variable bidirectional_rnn/fw/lstm_cell/kernel already exists, disallowed. Did you mean to set reuse=True in VarScope? Can you provide a working example?
  • ARAT
    ARAT over 5 years
    I have a question related to that. I concatenated the outputs and reshaped them using output = tf.reshape(tf.concat(output, 1), [-1, 2 * rnn_size]), so the dimension is now (batch_size x timesteps, 2 * rnn_size). When I pass that through a dense layer using logits = tf.matmul(output, weight) + bias, the dimension becomes (batch_size x timesteps, num_classes). These are my logits. How can I then compute the loss with tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y)), given that the shape of the Y placeholder is [None, num_classes]?
  • betelgeuse
    betelgeuse over 5 years
    You can't do that directly. You need to eliminate the timestep dimension. Is there any specific reason to use the output of all timesteps? Generally, we take the output at the last time step only. You can do this with output = output[:, -1, :]. Now logits would be [batch_size, num_classes].
  • ARAT
    ARAT over 5 years
    Thank you very much for your quick response. To be honest, this is how I learned LSTMs: in this example, they flatten the output and use it to compute the logits without eliminating the timesteps. I am a bit confused now.
  • betelgeuse
    betelgeuse over 5 years
    He did that because he's using tf.contrib.seq2seq.sequence_loss, which expects the time_step dimension. Notice that once the logits are calculated, he reshapes them back to the original shape. In your case, you want to use tf.nn.softmax_cross_entropy_with_logits, which won't take that shape; it requires the last time_step only.
  • ARAT
    ARAT over 5 years
    Oh, I understand. So you're saying that, before the dense layer and softmax, I should take the last time step of the data points and go from there?
  • betelgeuse
    betelgeuse over 5 years
    If you want to use tf.nn.softmax_cross_entropy_with_logits then yes. Though in that particular problem, you might want to use seq2seq loss.
  • ARAT
    ARAT over 5 years
    For the one in the link? Yeah, I understand.
  • Jaeyoung Lee
    Jaeyoung Lee over 3 years
    Thank you for the detailed explanation. Can I ask about the "final states"? When the input sequences have different lengths, do the "final states" contain the actual final states corresponding to each different-length input, or might they include zero padding?
  • dopexxx
    dopexxx over 3 years
    This code snippet was written for tf==1.X and, if I remember correctly, it can't handle variable-length sequences out of the box. I always used zero-padding. TensorFlow 2.X may have a better solution for this, though.