Understanding a simple LSTM in PyTorch


Solution 1

The output of the LSTM is the output of all the hidden nodes in the final layer.
hidden_size - the number of LSTM blocks per layer.
input_size - the number of input features per time-step.
num_layers - the number of hidden layers.
In total there are hidden_size * num_layers LSTM blocks.

The input dimensions are (seq_len, batch, input_size).
seq_len - the number of time steps in each input stream.
batch - the size of each batch of input sequences.

The hidden and cell dimensions are: (num_layers, batch, hidden_size)

output (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the RNN, for each t.

So there will be hidden_size * num_directions outputs. You didn't initialise the RNN to be bidirectional so num_directions is 1. So output_size = hidden_size.
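
As a sanity check, here is a minimal sketch (using the shapes from the question's example; the variable names are illustrative) showing that the output of a non-bidirectional LSTM has shape (seq_len, batch, hidden_size):

import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
x = torch.randn(5, 3, 10)     # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)    # (num_layers, batch, hidden_size)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(x, (h0, c0))
print(output.shape)           # torch.Size([5, 3, 20]), i.e. hidden_size * num_directions = 20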

Edit: You can change the number of outputs by using a linear layer:

out_rnn, hn = rnn(input, (h0, c0))
lin = nn.Linear(hidden_size, output_size)
# flatten to (seq_len * batch, hidden_size) so the linear layer sees a 2D matrix
flat = out_rnn.view(seq_len * batch, hidden_size)
# project each time step to output_size, then restore (seq_len, batch, output_size)
output = lin(flat).view(seq_len, batch, output_size)
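
Aside (not part of the original answer): in current PyTorch, nn.Linear accepts inputs with extra leading dimensions and applies to the last one, so output = lin(out_rnn) produces the (seq_len, batch, output_size) tensor directly, without the explicit view calls.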

Note: for this answer I assumed that we're only talking about non-bidirectional LSTMs.

Source: PyTorch docs.

Solution 2

The answer by cdo256 is almost correct, but he is mistaken about what hidden_size means. He explains it as:

hidden_size - the number of LSTM blocks per layer.

but really, here is a better explanation:

Each sigmoid, tanh or hidden state layer in the cell is actually a set of nodes, whose number is equal to the hidden layer size. Therefore each of the “nodes” in the LSTM cell is actually a cluster of ordinary neural network nodes, as in each layer of a densely connected neural network. Hence, if you set hidden_size = 10, then each of your LSTM blocks, or cells, will have neural networks with 10 nodes in them. The total number of LSTM blocks in your LSTM model will be equal to your sequence length.

This can be seen by analyzing the differences in examples between nn.LSTM and nn.LSTMCell:

https://pytorch.org/docs/stable/nn.html#torch.nn.LSTM

and

https://pytorch.org/docs/stable/nn.html#torch.nn.LSTMCell
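
To make the correspondence concrete, here is a minimal sketch (not taken from either docs page; the shapes are illustrative): looping an nn.LSTMCell over the time steps unrolls one cell into seq_len blocks that share weights, which is what a single-layer nn.LSTM does for you internally.

import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 5, 3, 10, 20

cell = nn.LSTMCell(input_size, hidden_size)   # processes one time step at a time

x = torch.randn(seq_len, batch, input_size)
hx = torch.zeros(batch, hidden_size)          # note: no num_layers dimension for a cell
cx = torch.zeros(batch, hidden_size)

outputs = []
for t in range(seq_len):                      # one "LSTM block" per time step
    hx, cx = cell(x[t], (hx, cx))
    outputs.append(hx)

out = torch.stack(outputs)                    # (seq_len, batch, hidden_size)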

Solution 3

You can set

batch_first = True

if you want the input and output to be provided as

(batch_size, seq, input_size)

I got to know this today, so I am sharing it with you.
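
For example (a small sketch; per the PyTorch docs, batch_first changes only the input/output layout, while h0 and c0 keep the (num_layers, batch, hidden_size) shape):

import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
x = torch.randn(3, 5, 10)      # (batch, seq_len, input_size)
output, (hn, cn) = rnn(x)      # h0 and c0 default to zeros
print(output.shape)            # torch.Size([3, 5, 20]) -- (batch, seq_len, hidden_size)
print(hn.shape)                # torch.Size([2, 3, 20]) -- still (num_layers, batch, hidden_size)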

Author: Abhishek Bhatia

"The purpose of computing is insight, not numbers."- Richard Hamming, 1961 Abhishek has had a unique interdisciplinary research exposure to AI systems. His projects range from designing artificially intelligent autonomous systems that operate in varied setups, all the way to studying common emergent phenomena in natural systems. He has published 5 research papers in the field of complex systems, artificial intelligence and statistical inference. He is currently working on Deep Reinforcement Learning applications for Natural Language Processing and General-game Playing. He is also enthusiastic about open-source tools and frequently contributes to many open-source projects.

Updated on December 21, 2020

Comments

  • Abhishek Bhatia over 3 years
    import torch,ipdb
    import torch.autograd as autograd
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torch.autograd import Variable
    
    rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
    input = Variable(torch.randn(5, 3, 10))   # (seq_len=5, batch=3, input_size=10)
    h0 = Variable(torch.randn(2, 3, 20))      # (num_layers=2, batch=3, hidden_size=20)
    c0 = Variable(torch.randn(2, 3, 20))
    output, hn = rnn(input, (h0, c0))         # hn is the (h_n, c_n) tuple
    

    This is the LSTM example from the docs. I don't understand the following things:

    1. What is the output size and why is it not specified anywhere?
    2. Why does the input have 3 dimensions? What do 5 and 3 represent?
    3. What are 2 and 3 in h0 and c0? What do those represent?

    Edit:

    import torch, ipdb
    import torch.autograd as autograd
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torch.autograd import Variable

    num_layers = 3
    num_hyperparams = 4
    batch = 1
    hidden_size = 20
    rnn = nn.LSTM(input_size=num_hyperparams, hidden_size=hidden_size, num_layers=num_layers)

    input = Variable(torch.randn(1, batch, num_hyperparams))    # (seq_len, batch, input_size)
    h0 = Variable(torch.randn(num_layers, batch, hidden_size))  # (num_layers, batch, hidden_size)
    c0 = Variable(torch.randn(num_layers, batch, hidden_size))
    output, hn = rnn(input, (h0, c0))
    # NB: the RuntimeError below suggests affine1 was applied to the 3D output;
    # older PyTorch required flattening it to 2D first (see Solution 1).
    affine1 = nn.Linear(hidden_size, num_hyperparams)

    ipdb.set_trace()
    print(output.size())
    print(h0.size())
    

    *** RuntimeError: matrices expected, got 3D, 2D tensors at

  • Abhishek Bhatia almost 7 years
    What do you mean by an LSTM block here? Is it a single neuron with an output connection to the next layer and a hidden connection to itself?
  • Abhishek Bhatia almost 7 years
    What is num_directions?
  • cdo256 almost 7 years
    An LSTM block is one of these things. num_directions is just a value that indicates whether the LSTM is bidirectional (either 1 or 2). In most cases it will be 1.
  • Abhishek Bhatia almost 7 years
    Is it possible to have a different output size, i.e., a different number of output units in the last layer? Say I am using classification with a different output size.
  • Abhishek Bhatia almost 7 years
    This doesn't work; it shows me *** RuntimeError: matrices expected, got 3D, 2D tensors.
  • cdo256 almost 7 years
    Yeah, that was my mistake. I think it's fixed now.
  • WillZ over 6 years
    Hi, what do seq_len and batch represent? Could you please refer back to the original 5x3x10 example? Does it mean there are 5 batches, each 3x10 in dimension? Thanks.
  • Vipin Chaudhary over 6 years
    @cdo256 I have a little confusion about the input dimensions. My input is like this: I have 3 batches, each batch has 10 frames, and each frame has 1000 features, so my input size is (3, 10, 1000). But you mentioned that the input has dimensions (seq_len, batch, input_size), where input_size = 1000, so can you please tell me how to convert my input into the required shape?
  • jonnyd42 over 6 years
    How come there is no training step? @cdo256
  • peer over 4 years
    @cdo256 your link is dead.