How to use multilayered bidirectional LSTM in Tensorflow?


Solution 1

You can use two different approaches to build a multilayer BiLSTM model:

1) Use the output of the previous BiLSTM layer as the input to the next one. Begin by creating lists of forward and backward cells, each of length num_layers, and then run:

# Create the lists of forward and backward cells, one per layer.
# rnn_size (units per cell) and inputs (the input tensor) are illustrative names.
cell_forw = [tf.contrib.rnn.LSTMCell(rnn_size) for _ in range(num_layers)]
cell_back = [tf.contrib.rnn.LSTMCell(rnn_size) for _ in range(num_layers)]

output = inputs  # batch of input sequences, shape [batch_size, max_time, depth]
for n in range(num_layers):
    cell_fw = cell_forw[n]
    cell_bw = cell_back[n]

    state_fw = cell_fw.zero_state(batch_size, tf.float32)
    state_bw = cell_bw.zero_state(batch_size, tf.float32)

    # Each layer gets its own scope so that weights are not shared between layers
    (output_fw, output_bw), last_state = tf.nn.bidirectional_dynamic_rnn(
        cell_fw, cell_bw, output,
        initial_state_fw=state_fw,
        initial_state_bw=state_bw,
        scope='BLSTM_' + str(n),
        dtype=tf.float32)

    # The concatenated forward/backward output becomes the next layer's input
    output = tf.concat([output_fw, output_bw], axis=2)

2) Another approach worth a look is the stacked bidirectional RNN, tf.contrib.rnn.stack_bidirectional_dynamic_rnn (see Solution 4 below for an example).

Solution 2

This is essentially the same as the first answer, but with a slightly different use of scope names and with dropout wrappers added. It also takes care of the variable-scope error that the first answer's code raises.

def bidirectional_lstm(input_data, num_layers, rnn_size, keep_prob):

    output = input_data
    for layer in range(num_layers):
        with tf.variable_scope('encoder_{}'.format(layer), reuse=tf.AUTO_REUSE):

            # By giving a different variable scope to each layer, I've ensured that
            # the weights are not shared among the layers. If you want to share the
            # weights, you can do that by giving variable_scope as "encoder" but do
            # make sure first that reuse is set to tf.AUTO_REUSE

            cell_fw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
            cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, input_keep_prob=keep_prob)

            cell_bw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
            cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, input_keep_prob=keep_prob)

            outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw,
                                                              cell_bw,
                                                              output,
                                                              dtype=tf.float32)

            # Concatenate the forward and backward outputs
            output = tf.concat(outputs, 2)

    return output
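
For reference, here is a minimal sketch of how this helper might be wired up; the placeholder and the hyperparameter values below are illustrative, not part of the original answer:

import tensorflow as tf

# Illustrative hyperparameters (not from the original answer)
max_time, embed_dim = 50, 128
num_layers, rnn_size, keep_prob = 2, 64, 0.8

# Batches of embedded sequences, shape [batch_size, max_time, embed_dim]
input_seq = tf.placeholder(tf.float32, [None, max_time, embed_dim], name='input_seq')

# Stacked BiLSTM encoding, shape [batch_size, max_time, 2 * rnn_size]
encoded = bidirectional_lstm(input_seq, num_layers, rnn_size, keep_prob)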

Solution 3

Building on Taras's answer, here is another example using just a 2-layer bidirectional RNN with GRU cells:

    embedding_weights = tf.Variable(tf.random_uniform([vocabulary_size, state_size], -1.0, 1.0))
    embedding_vectors = tf.nn.embedding_lookup(embedding_weights, tokens)

    # First BLSTM
    cell = tf.nn.rnn_cell.GRUCell(state_size)
    cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=1 - dropout)
    (forward_output, backward_output), _ = \
        tf.nn.bidirectional_dynamic_rnn(cell, cell, inputs=embedding_vectors,
                                        sequence_length=lengths, dtype=tf.float32, scope='BLSTM_1')
    outputs = tf.concat([forward_output, backward_output], axis=2)

    # Second BLSTM using the output of the previous layer as its input.
    cell2 = tf.nn.rnn_cell.GRUCell(state_size)
    cell2 = tf.nn.rnn_cell.DropoutWrapper(cell2, output_keep_prob=1 - dropout)
    (forward_output, backward_output), _ = \
        tf.nn.bidirectional_dynamic_rnn(cell2, cell2, inputs=outputs,
                                        sequence_length=lengths, dtype=tf.float32, scope='BLSTM_2')
    outputs = tf.concat([forward_output, backward_output], axis=2)

By the way, don't forget to give each BLSTM a different scope name. Hope this helps.
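
If, as in the question's code, you then need per-timestep class scores, one possible sketch of projecting the concatenated output to logits is shown below; n_classes and the weight variables are illustrative assumptions, not part of the original answer:

    # Illustrative projection of the BLSTM output to per-timestep logits
    n_classes = 10  # hypothetical number of classes
    W_out = tf.get_variable('W_out', [2 * state_size, n_classes])
    b_out = tf.get_variable('b_out', [n_classes], initializer=tf.zeros_initializer())

    # Flatten the batch and time dimensions, apply the dense layer, then restore the shape
    flat = tf.reshape(outputs, [-1, 2 * state_size])
    logits = tf.reshape(tf.matmul(flat, W_out) + b_out,
                        [tf.shape(outputs)[0], -1, n_classes])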

Solution 4

As @Taras pointed out, you can use:

(1) tf.nn.bidirectional_dynamic_rnn()

(2) tf.contrib.rnn.stack_bidirectional_dynamic_rnn().

The previous answers only cover (1), so I'll give some details on (2), particularly since it usually outperforms (1). For an intuition about the different connectivities, see here.

Let's say you want to create a stack of 3 BLSTM layers, each with 64 nodes:

from tensorflow.contrib.rnn import LSTMCell

num_layers = 3
num_nodes = 64

# Define LSTM cells (one forward and one backward cell per layer)
enc_fw_cells = [LSTMCell(num_nodes) for layer in range(num_layers)]
enc_bw_cells = [LSTMCell(num_nodes) for layer in range(num_layers)]

# Connect LSTM cells bidirectionally and stack
(all_states, fw_state, bw_state) = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
        cells_fw=enc_fw_cells, cells_bw=enc_bw_cells, inputs=input_embed, dtype=tf.float32)

# Concatenate results
for k in range(num_layers):
    if k == 0:
        con_c = tf.concat((fw_state[k].c, bw_state[k].c), 1)
        con_h = tf.concat((fw_state[k].h, bw_state[k].h), 1)
    else:
        con_c = tf.concat((con_c, fw_state[k].c, bw_state[k].c), 1)
        con_h = tf.concat((con_h, fw_state[k].h, bw_state[k].h), 1)

output = tf.contrib.rnn.LSTMStateTuple(c=con_c, h=con_h)

In this case, I use the final states of the stacked biRNN rather than the outputs at all timesteps (saved in all_states), since I was using an encoder-decoder scheme, where the above code was only the encoder.
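
As a rough illustration of that last point, the combined state can be fed straight into a decoder whose cell size matches the concatenated encoder state; decoder_inputs and decoder_cell below are hypothetical names, not from the original answer:

# Hypothetical decoder, assuming decoder_inputs of shape [batch_size, dec_time, dec_dim].
# The concatenated encoder state has size num_layers * 2 * num_nodes, so the decoder
# cell must use the same number of units to accept it as its initial state.
decoder_cell = LSTMCell(num_layers * 2 * num_nodes)
decoder_outputs, decoder_state = tf.nn.dynamic_rnn(decoder_cell,
                                                   decoder_inputs,
                                                   initial_state=output,
                                                   dtype=tf.float32,
                                                   scope='decoder')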


Comments

  • Gi Yeon  Shin
    Gi Yeon Shin almost 2 years

    I want to know how to use multilayered bidirectional LSTM in Tensorflow.

    I have already implemented a bidirectional LSTM, but I want to compare this model with a model that adds multiple layers.

    What code should I add to this part?

    x = tf.unstack(tf.transpose(x, perm=[1, 0, 2]))
    #print(x[0].get_shape())
    
    # Define lstm cells with tensorflow
    # Forward direction cell
    lstm_fw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Backward direction cell
    lstm_bw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
    
    # Get lstm cell output
    try:
        outputs, _, _ = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                              dtype=tf.float32)
    except Exception: # Old TensorFlow version only returns outputs not states
        outputs = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                        dtype=tf.float32)
    
    # Linear activation, using rnn inner loop last output
    outputs = tf.stack(outputs, axis=1)
    outputs = tf.reshape(outputs, (batch_size*n_steps, n_hidden*2))
    outputs = tf.matmul(outputs, weights['out']) + biases['out']
    outputs = tf.reshape(outputs, (batch_size, n_steps, n_classes))
    
  • Rahul
    Rahul over 6 years
    I tried this and got this error: ValueError: Variable bidirectional_rnn/fw/lstm_cell/kernel already exists, disallowed. Did you mean to set reuse=True in VarScope? Can you provide a working example?
  • ARAT
    ARAT over 5 years
    I have a question related to that. I concatenated the outputs and reshaped them using output = tf.reshape(tf.concat(output, 1), [-1, 2 * rnn_size]), so the dimension is now (batch_size x timesteps, 2 * rnn_size). When I pass that through a dense layer using logits = tf.matmul(output, weight) + bias, the dimension becomes (batch_size x timesteps, num_classes). These are my logits. How can I then compute the loss with tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y)), given that the shape of the Y placeholder is [None, num_classes]?
  • betelgeuse
    betelgeuse over 5 years
    You can't do that directly. You need to eliminate the timestep dimension. Is there any specific reason to use the output of all timesteps? Generally, we take the output at the last time step only. You can do this with output = output[:, -1, :]. Now logits would be [batch_size, num_classes].
  • ARAT
    ARAT over 5 years
    Thank you very much for your quick response. To be honest, this is how I learned LSTMs: in this example, they flatten the output and use it to compute the logits without eliminating the timesteps. I am a bit confused now.
  • betelgeuse
    betelgeuse over 5 years
    He did that because he's using tf.contrib.seq2seq.sequence_loss, which expects the time_step dimension. Notice that once the logits are calculated, he reshapes them back to the original shape. In your case, you want to use tf.nn.softmax_cross_entropy_with_logits, which won't take that shape; it requires the last time_step only.
  • ARAT
    ARAT over 5 years
    Oh, I understand. So you're saying that, before the dense layer and softmax, I should take the last time step of the data points and go from there?
  • betelgeuse
    betelgeuse over 5 years
    If you want to use tf.nn.softmax_cross_entropy_with_logits then yes. Though in that particular problem, you might want to use seq2seq loss.
  • ARAT
    ARAT over 5 years
    For the one in the link? Yeah, I understand.
  • Jaeyoung Lee
    Jaeyoung Lee over 3 years
    Thank you for the detailed explanation. Can I ask about the "final states"? When the input sequences have different lengths, do the "final states" contain the actual final states corresponding to each different-length input, or might they include zero padding?
  • dopexxx
    dopexxx over 3 years
    This code snippet was written for tf==1.X and, if I remember correctly, it can't handle variable-length sequences out of the box. I always used zero-padding. TensorFlow 2.X may have a better solution for this, though.